  1. Mar 2024
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank all the reviewers for their comments on our manuscript. We have attempted to address all the points they raised and are happy to note that the manuscript has been significantly strengthened by the additional experiments we have performed and by substantial restructuring.

      Reviewer #1

      Major Comments

      1. The choice of cells looks confusing. Drosophila are indeed widely used in research on the mechanisms of neurodegeneration, since they reflect the behavioral characteristics of a wide range of brain diseases well, but why did the authors use insect immune cells to study the effect of mHTT on cellular processes? Huntington's disease has a well-established site of origin in the spiny neurons of the striatum, which certainly have a different protein context from that of insect cells.

      Author's response: We thank the reviewer for this comment. Patients with Huntington's disease display a variety of symptoms affecting peripheral, non-neuronal cells, including alterations in the function of immune cells. Hemocytes isolated from Drosophila expressing pathogenic forms of Huntingtin also display altered immune responses. In our manuscript we explore the effect of Huntingtin aggregates on the cellular functions of hemocytes. Additionally, we have now included data showing that we observe similar phenotypes in mammalian cells such as neuronal SH-SY5Y and HEK293T cells (Supp. Fig. 3). This indicates that Huntingtin aggregates exert similar effects across cell types and organisms. Finally, we demonstrate that neurodegeneration in the fly eye can be rescued by overexpressing either Hip1 or components of the Arp2/3 complex (Fig. 4F). This further supports our conclusion that Huntingtin aggregates alter CME in an actin-dependent manner and that this alteration is largely responsible for the toxicity. Together, these results indicate that the effects on CME are independent of cell type and that non-neuronal cells such as hemocytes can also be used to study the effects of pathogenic aggregates.

      The interrelationships between mutant huntingtin, the actin cytoskeleton and clathrin-mediated endocytosis, which have been convincingly demonstrated in earlier studies, are described in the m/s at a rather morphological level. There is no description of the molecular interactions among proteins belonging to the three systems considered: Htt (control vs mutant), the actin cytoskeleton and CME. The lack of these data leaves the morphological observations unsupported.

      Author's response: Previous data (Hosp et al., 2017) indicate that a large number of proteins involved in both actin remodelling and clathrin-mediated endocytosis are sequestered within Huntingtin aggregates. While the mechanism of sequestration remains unknown, it has also been observed that loss of Huntingtin results in altered organization of the actin cytoskeleton. We have now added points discussing this in the Results section.

      The last three of the eight figures demonstrate the effect of proteins responsible for the initiation of certain neurodegenerative pathologies on the activity of clathrin-mediated endocytosis and on the properties of the actin cytoskeletal system. However, neither the abstract nor the introduction mentions these proteins, and in the discussion only a few words are devoted to one of them, TDP-43. When starting the article, did the authors plan to include these data in the manuscript?

      Author's response: We have now addressed this by revising the abstract and the text.

      It is important to work on the style of the manuscript. The article is difficult to read and comes across as a collection of data that do not seem related to each other.

      Author's response: We have reorganized the manuscript and improved its flow to make it easier for the reader. We apologize for the tedious and confusing flow of the previous draft.

      Reviewer #2

      This manuscript endeavours to explore the link between mutant Huntingtin, clathrin-mediated membrane transport and the actin cytoskeleton, covering both its dynamics and its overall mechanics. As I read it, it carries the interesting idea that pathogenic protein aggregates alter actin cytoskeletal dynamics by sequestering the Arp2/3 nucleator. This has two consequences in the authors' experiments: disruption of clathrin-coated vesicle movement and an increase in cellular stiffness. An interesting question is whether these two effects are related. Is the disruption of vesicular movement due to the change in cytoplasmic stiffness, or could both features reflect the underlying change in actin dynamics? This may be hard to tease apart and beyond the purview of this manuscript.

      I have some suggestions that could strengthen the MS.

      Major Comments

      1. Further characterizing Arp2/3 sequestration. The notion seems to be that actin nucleators would be sequestered (and inactivated) by mutant protein aggregates, as supported by co-localization studies. In addition, could the authors:
      a) Test if the dynamics of Arp2/3 are altered, comparing e.g. Arp3-GFP FRAP in the aggregates vs that elsewhere.

      Author's response: We did attempt the FRAP experiment. However, owing to technical difficulties, we were not convinced by the extent of FRAP in the transgenic fly line. The recovery appeared to be an artifact, and we were not comfortable including the data in the manuscript. We have instead provided example files for the reviewer to examine.

      b) Test more directly if actin nucleation is altered in cells that have pathogenic mutant aggregates. This could be done by barbed-end labelling (e.g. measuring incorporation of labelled actin in live cells that are lightly permeabilized with saponin).

      Author's response: We have performed barbed-end labeling for HTT Q15- and HTT Q138-expressing cells. Images and quantification have now been added to the revised manuscript as Figures 2H and 2I. While this was a challenging experiment, it was deeply satisfying to observe such dramatic changes, indicating a change in the state of the actin cytoskeleton.

      Does manipulating actin nucleation alter cellular mechanics as it does clathrin-coated vesicle transport? For example, does inhibition of Arp2/3 (e.g. with CK666) increase cellular stiffness, and would stiffness be ameliorated in mutant cells if Arp3 is overexpressed?

      Author's response: We used LatA to examine whether alteration of the actin cytoskeleton affects cellular stiffness. We found that disruption of the actin cytoskeleton decreases cellular stiffness in WT as well as in HTT Q138-expressing cells (shown in Figure 5 and discussed in the Results section). We have also now performed AFM on CK666-treated cells and found that CK666 treatment decreases cellular stiffness, similar to LatA treatment. This further strengthens our hypothesis that a 'Goldilocks' state of actin remodeling, and consequently of cellular stiffness, is required for CME to proceed. We have not performed AFM on cells overexpressing Arp2/3 in the HTT Q138 background. However, we expect that this would rescue cellular stiffness, because overexpression of Arp2/3 rescues both filopodia formation in HTT Q138-expressing cells (Figure 4E) and neurodegeneration. The AFM data obtained from CK666-treated cells have now been added as Supplementary Figure 8.

      Although it may be difficult to determine if the defect in vesicle transport is due to the change in rheology, I wonder if the authors could reinforce their analysis by showing the overall relationship between the two features. It would be interesting if they could plot CCV velocity against elasticity for all the various conditions that they have tested. Would this cumulative analysis be informative?

      Author's response: These data are already present in the manuscript as parts of different figures. We are not sure whether we can reuse the same data in a new figure that plots the relationship between elasticity and CCS velocity. We would be grateful for advice on whether this is allowed and on how to indicate that the data also appear in other figures.

      Focus of the MS. I think that the MS is a little longer and more discursive than it needs to be. I rather struggled to find the focus of the story (which could well be me). There is a good deal of repetition that could be profitably cut (the reader may actually find it easier to follow). As well, some anticipatory passages and summaries could be shortened. The final paragraph of the introduction largely summarizes the paper; it could be shortened quite considerably, so that the reader can get directly into the Results. Similarly, the final paragraph of the Results is a summary that could work better elsewhere, e.g. at the beginning of the Discussion.

      Author's response: We have now trimmed, rearranged and reorganized the text to improve its flow and make it easier for the reader. We apologize for the tedious and confusing flow of the previous draft and are open to further suggestions to improve the writing style.

      Specific points

      i) Fig 3E. The changes in F-actin flow revealed by PIV are quite dramatic. How reproducible are these changes? (Were the data presented from single cells?)

      Author's response: The PIV analyses of F-actin flow (now Figures 2J and 7E in the MS) were performed on at least 5 cells of each type, and the results were consistent across all of them. The figure shown is a true representative of the data.

      ii) What about TDP-43: does it also affect Arp2/3?

      Author's response: We thank the reviewer for this comment. Unfortunately, we could not perform this experiment, because no fluorescently tagged TDP-43 fly line was available that would have enabled us to visualize whether Arp3 is sequestered within the aggregates.

      Minor points:

      a) Fig 5a, b: why change the order on the x-axis?

      Author's response: We have now fixed this. We have removed Figure 5b, since the revised MS discusses only the stiffness of the cells rather than their viscoelastic properties.

      Reviewer #2

      Overall, I think that the significance of the MS lies in its evidence that sequestration of actin nucleators may be a key effect of mutant protein aggregation, with implications for cellular function. This would provide a useful conceptual framework for understanding the cell biological consequences of creating pathogenic protein aggregates.

      Reviewer #3

      Summary

      In the paper, the authors showed that huntingtin aggregates, which play a critical role in initiating neurodegenerative diseases, impair clathrin-mediated endocytosis (CME). Using live-cell imaging and AFM, the authors demonstrated that CME is affected by alterations in actin cytoskeletal organization and cellular viscosity. Further, the authors concluded that there is a strong link between dynamic actin organization and functional CME in the context of neurodegeneration. While the data are interesting and novel, the study in its current form needs major revision before it can be accepted.

      Major comments:

      1) Figure 2: The authors should show the compromised actin cytoskeleton structure after LatA and CytoD treatment to back up the findings.

      Author's response: We have included representative micrographs in Supplementary Figure 3E. They show the compromised cytoskeleton, assessed by filopodia formation, upon treatment with LatA and CytoD.

      2) Figure 2g and 2h: Quantification data of filopodia must be supported with representative images.

      Author's response: These panels have been renumbered 2D and 2E. A representative image for the quantification of filopodia is now included in Supplementary Figure 3D.

      3) RNAi studies must be performed using control siRNA to check off-target effects.

      Author's response: Luc VAL10 was used as a control for all the RNAi experiments. However, the Luc VAL10 data are not shown, because its phenotype was comparable to WT. We have included Luc VAL10 as a control for Profilin RNAi in the FRAP experiment (Supplementary Figure 4C).

      4) The result section needs to be reorganized to maintain flow. In the current format, the results of a similar set of experiments are spread across different figures, making it a bit difficult to understand.

      Author's response: We apologize for the inconvenience. This issue has now been addressed.

      5) Figure 3d: The expression level and spatial distribution of HTTQ138 in transfected cells were not convincing compared with the httQ15 expression level and distribution.

      Author's response: Figure 3D (Figure 3A in this MS) shows data obtained from hemocytes isolated from third-instar larvae of the same age. These are not transfected cells. They are obtained from Drosophila larvae using the same Gal4 driver, Cg-Gal4, so the expression levels should be the same. However, the distribution may differ because HTT Q138 aggregates, whereas HTT Q15 is non-aggregating and therefore remains diffuse.

      6) Suppl Fig 2a data must be supported using images showing myosin VI distribution in wild-type vs. HTTQ138 transfected cells.

      Author's response: These data (Supplementary Figure 4D in the present MS) were obtained from genetic knockdown of myosin VI. The aim of the experiment was to show that this knockdown affects CCS movement in the same way as disruption of the actin cytoskeleton does.

      7) The supplementary movies are not labeled correctly in the source files. It is not possible to locate them or to know which videos are referred to in the manuscript.

      Author's response: We apologize for the inconvenience. This issue has now been fixed.

      8) Page 8: How do HTT aggregates sequester the actin-binding proteins? An explanation should be provided in the result section.

      Author's response: Previous data (Hosp et al., 2017) indicate that a large number of proteins involved in both actin remodelling and clathrin-mediated endocytosis are sequestered within Huntingtin aggregates. While the mechanism of sequestration remains unknown, the sequestered proteins involved in actin remodelling are diverse and do not belong to specific types or classes. We have now added points discussing this in the Results section.

      9) Page 10: The authors concluded that "increasing the availability of proteins involved in actin reorganization is capable of restoring CME even in the presence of pathogenic aggregates." Since several actin-associated proteins are involved in actin reorganization, which types/classes of proteins are involved in CME restoration? The authors should expand it in the discussion.

      Author's response: As we have investigated only the roles of Hip1 and the Arp2/3 complex, we are confident in reporting only their roles in the context of this manuscript. However, previous data (Hosp et al., 2017) indicate that a large number of proteins involved in both actin remodelling and clathrin-mediated endocytosis are sequestered within Huntingtin aggregates. While the mechanism of sequestration remains unknown, the sequestered proteins involved in actin remodelling are diverse and do not belong to specific types or classes. This indicates that actin remodelling is affected in the presence of Huntingtin aggregates through the sequestration of proteins involved in this process. We have added points detailing this in the Results and Discussion sections.

      10) A schematic of the proposed model depicting the critical steps by which pathogenic proteins inhibit CME is required. It will help readers to understand the molecular mechanism easily.

      Author's response: We have now included a model in the manuscript (Fig. 8B).

      Minor:

      1) Figure panel referencing in the text needs to be more consistent. For example, fig 3e is referred to before fig 3d, and fig 2 panels are referred to before fig. 3 panels.

      Author's response: We have reordered the figures and maintained a consistent order throughout.

      2) The authors should use similar phrasing throughout the manuscript to avoid confusion. For instance, either use 'HTTQ138' or 'htt Q138'.

      Author's response: We apologize for this. We have now used uniform nomenclature throughout the text.

      3) Page 10: The AFM indentation experimental details and their discussion in the Results section are unnecessary. Shift them to the 'Materials and Methods' section.

      Author's response: We have now trimmed this portion and show only elasticity data, not viscoelasticity.

      4) This statement looks a bit exaggerated. There is not sufficient evidence to support the statement: "It can be said that the cells in general behave like a soft glass. The presence of aggregates lowers the effective temperature pushing it nearer to the glass transition, affecting transport."

      Author's response: We have now removed all figures resulting from an analysis that assumes glassy behaviour. Instead, we now provide a more conventional and well-established analysis to obtain the Young's modulus of cells exhibiting different transport properties.
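      The reply does not say which contact model was used to extract Young's modulus. One common choice for AFM indentation with a spherical tip is the Hertz model, $F = \frac{4}{3}\frac{E}{1-\nu^2}\sqrt{R}\,\delta^{3/2}$, where $\delta$ is the indentation depth, $R$ the tip radius and $\nu$ Poisson's ratio. The sketch below assumes that model and an incompressible sample ($\nu = 0.5$). It uses a hypothetical tip radius and modulus only to generate synthetic, noise-free data, and the function names are illustrative. It is not the analysis used in the manuscript. Because $F$ is linear in $\delta^{3/2}$, the prefactor, and hence $E$, follows from a one-parameter least-squares fit.

```python
import math


def hertz_force(E, R, delta, nu=0.5):
    """Hertz force (N) for a rigid sphere of radius R (m) indenting an elastic
    half-space with Young's modulus E (Pa) to depth delta (m).
    nu is Poisson's ratio (0.5 assumes an incompressible sample)."""
    return (4.0 / 3.0) * (E / (1.0 - nu**2)) * math.sqrt(R) * delta**1.5


def fit_young_modulus(deltas, forces, R, nu=0.5):
    """Fit F = C * delta**1.5 by least squares (C = sum(F x) / sum(x^2) with
    x = delta**1.5), then convert the prefactor C back to E."""
    num = sum(f * d**1.5 for d, f in zip(deltas, forces))
    den = sum(d**3 for d in deltas)
    C = num / den
    return C * 3.0 * (1.0 - nu**2) / (4.0 * math.sqrt(R))


if __name__ == "__main__":
    R = 2.5e-6        # hypothetical tip radius: 2.5 micrometres
    E_true = 1000.0   # hypothetical modulus: 1 kPa, used only to make synthetic data
    deltas = [i * 50e-9 for i in range(1, 21)]  # 50 nm to 1 micrometre
    forces = [hertz_force(E_true, R, d) for d in deltas]
    print(f"fitted E = {fit_young_modulus(deltas, forces, R):.1f} Pa")
```

      Real force curves also need contact-point detection and baseline correction, which this sketch omits.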

      5) Page 12: What is the basis for selecting proteins Aβ-42, FUSR521C, αSynA30P, αSynA53T, and TDP-43 over other proteins? An explanatory sentence must be added to support the selection.

      Author's response: We have modified the text to clarify this point.

    1. The social media platform itself is run with computer programs, such as recommendation algorithms (chapter 12).

      As a student majoring in communication, I am very interested in recommendation algorithms. We often find that TikTok and other apps push content we like to watch, and that whatever we have searched for is followed by related content. Sometimes, after browsing some clothing, you even find discounts on the same items in other apps. I think this is a good marketing tool, but it may have some drawbacks. I am looking forward to studying chapter 12.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      We are grateful to the three reviewers for their careful and constructive critiques of our preprint. We will address all of their comments and suggestions, which help to make our paper more precise and understandable. In our replies, we use 'Patterson, eLife (2021)' as shorthand for Patterson, Basu, Rees & Nurse, eLife 2021:10.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Novák and Tyson present a model-based analysis of published data that had claimed to demonstrate bistable activation of CDK at the G2/M transition in fission yeast. They point out that the published data does not distinguish between ultra-sensitive (switch-like, but reversible) and bistable (switch-like, but irreversible) activation. They back up their intuition with robust quantitative modeling. They then point out that, with a simple experimental modification, the published experiments could be repeated in a way that would test between the ultra-sensitive and bistable possibilities.

      This is an accurate and concise summary of our paper.

      Therefore, this is a rare paper that makes a specific modeling-based prediction and proposes a straightforward way to test it. As such, it will be of interest to a broad range of workers in the fields of cell-cycle control and regulatory modeling.

      We agree that our work will be of interest to a broad range of scientists studying cell cycle regulation and mathematical modeling of bistable control systems.

      Nonetheless, attention to the following points would improve the manuscript. The authors should be more careful about how they describe protein abundance. They often refer to protein level. I believe in every case they mean protein concentration, but this is not explicitly stated; it could be interpreted as number of protein molecules per cell. The authors should either explicitly state that level means concentration or, more simply, use concentration instead of level.

      A valid criticism that has been addressed in the revised version.

      The authors should explain why they include stoichiometric inhibition of CDK by Wee1 in their model. Is it required to make the model work in the wild-type case, or only in the CDK-AF case? My intuition is it should only be required in the AF case, but I would like to know for sure. Also, they should state if there is any experimental data for such regulation.

      Bistability of the Tyr-phosphorylation switch requires 'sufficient' nonlinearity, which may come from the phosphorylation and dephosphorylation reactions that interconvert Cdk1, Wee1 and Cdc25. The easiest way to model these interconversion reactions is to use Hill or Goldbeter-Koshland functions for the phosphorylation and dephosphorylation of Wee1 and Cdc25. However, this approach is not appropriate for Gillespie SSA, which assumes elementary reactions. Both Wee1 and Cdc25 are phosphorylated on multiple sites, which we approximate by double phosphorylation, but this level of nonlinearity is not sufficient to make the switch bistable. In addition, stoichiometric inhibition is a well-known source of nonlinearity. In the Wee1:Cdk1 enzyme:substrate complex, Cdk1 is inhibited because Wee1 binds to Cdk1 near its catalytic site. In our model, stoichiometric inhibition of Cdk1 by Wee1 is required for bistability even in the wild-type case, because the regulation of Wee1 and Cdc25 by phosphorylation is not nonlinear enough.

      There is experimental evidence that stoichiometric inhibition of Cdk1 by Wee1 is significant: mik1Δ wee1ts double-mutant cells at the restrictive temperature (Lundgren, Walworth et al. 1991) are less viable than AF-Cdk1 cells (Gould and Nurse 1989). Furthermore, Patterson (eLife, 2021) found weak 'bistability' when they used AF-Cdk1 to induce mitosis. This puzzling observation suggests a residual feedback mechanism in the absence of Tyr-phosphorylation. Our model accounts for this weak bistability by assuming that free Cdk1 can phosphorylate and inactivate the Wee1 'enzyme' in the Wee1:Cdk1 complex, which makes Cdk1 and Wee1 mutual antagonists. This reaction is based on the formation of a trimer, Cdk1:Wee1:Cdk1. Such a trimer is possible because Cdk1 phosphorylates Wee1 in its N-terminal region, which lies outside the C-terminal catalytic domain of Wee1 (Tang, Coleman et al. 1993).
      These ideas have been incorporated into the text in the subsection describing the model (see lines 120-125).
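      To illustrate why stoichiometric inhibition adds nonlinearity, consider only the binding equilibrium Cdk1 + Wee1 ⇌ Cdk1:Wee1 with a small dissociation constant. This is a hypothetical sketch, not the authors' model. It ignores phosphorylation, synthesis and the Cdk1:Wee1:Cdk1 trimer, and the function name and parameter values are illustrative. Free Cdk1 stays close to zero until total Cdk1 exceeds total Wee1 and then rises almost linearly, giving a threshold-like response.

```python
import math


def free_cdk1(total_cdk1, total_wee1, kd):
    """Free Cdk1 at equilibrium for Cdk1 + Wee1 <-> Cdk1:Wee1 (illustrative).

    With C = free Cdk1, T = total Cdk1, W = total Wee1:
        Kd * (T - C) = C * (W - T + C)
    i.e. C**2 + (W - T + Kd) * C - Kd * T = 0; we return the positive root.
    """
    b = total_cdk1 - total_wee1 - kd
    disc = math.sqrt(b * b + 4.0 * kd * total_cdk1)
    if b >= 0:
        return (b + disc) / 2.0
    # Algebraically identical form that avoids cancellation when T < W + Kd.
    return 2.0 * kd * total_cdk1 / (disc - b)


if __name__ == "__main__":
    # Hypothetical values: total Wee1 = 1 (arbitrary units), tight binding.
    for t in (0.25, 0.5, 0.9, 1.0, 1.1, 1.5, 2.0):
        print(f"total Cdk1 {t:4.2f} -> free Cdk1 {free_cdk1(t, 1.0, 1e-3):.4f}")
```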

      The authors should explicitly state, on line 131, that the fact that "the rate of synthesis of C-CDK molecules is directly proportional to cell volume" results in a size-dependent increase in the concentration of C-CDK.

      The accumulation of C-CDK molecules in fission yeast cells is complicated. In general, we may assume that larger cells have more ribosomes and make all proteins faster than do smaller cells. Absent other regulatory effects, the number of protein molecules is proportional to cell volume, and the concentration is constant. But in Patterson's experiments, the number of C-CDK molecules is zero at the start of induction and rises steeply thereafter (see lines 147-148), and the rate of increase (molecules per unit time) is proportional to the size of the growing cell.
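      As a worked example of this point, the following assumes exponential growth, $V(t) = V_0 e^{\mu t}$ (the growth law used in the stochastic simulations), a synthesis rate of $kV(t)$ molecules per unit time with a hypothetical constant $k$, no degradation, and zero molecules at induction ($t = 0$). These are our assumptions for illustration.

$$
N(t) = \int_0^t k V_0 e^{\mu s}\,ds = \frac{k V_0}{\mu}\left(e^{\mu t} - 1\right),
\qquad
c(t) = \frac{N(t)}{V(t)} = \frac{k}{\mu}\left(1 - e^{-\mu t}\right).
$$

      Under these assumptions the concentration rises from zero toward $k/\mu$ as a function of time since induction, independently of the initial volume $V_0$. The number of molecules, by contrast, scales with cell volume.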

      The authors should explain, on line 100, why they are "quite sure the bistable switch is the correct interpretation".

      Lines 105-106: "Although we suspect that the mitotic switch is bistable, ..."

      On line 166, include the units of volume.

      Done

      On lines 152 and 237, "smaller protein-fusion levels" should be replaced with "lower protein-fusion concentrations".

      Done

      Referee cross-commenting: I concur with the other two reviews.

      Reviewer #1 (Significance (Required)): The paper is significant in that it points out an alternative interpretation of an important result in an important paper. Specifically, it points out that the published data are consistent with activation of CDK at the G2/M transition in fission yeast being ultra-sensitive (switch-like, but reversible) rather than bistable (switch-like, but irreversible). The distinction is important because it has been claimed, by the authors of the submitted manuscript among others, that bistability is required for robust cell-cycle directionality.

      We agree with this assessment.

      However, activation of CDK at the G2/M transition in other species has been shown to be bistable and the authors state that they are "quite sure the bistable switch is the correct interpretation". So, the paper is more likely an exercise in rigor than an opportunity to overturn a paradigm.

      We were the first authors to predict that the G2/M switch is bistable (J. Cell Sci., 1993) and among the first to prove it experimentally in frog egg extracts (PNAS, 2004). Our models (Novak and Tyson 1995, Novak, Pataki et al. 2001, Tyson, Csikasz-Nagy et al. 2002, Gerard, Tyson et al. 2015) of fission yeast cell-cycle control rely on bistability of the G2/M transition; so, understandably, we believe that the transition in fission yeast is a bistable switch. But the 'bistable paradigm' has never been directly demonstrated by experimental observations in fission yeast cells. The Patterson paper (eLife, 2021) claims to provide experimental proof, but we demonstrate in our paper that Patterson's experiments are not conclusive evidence of bistability. Furthermore, we suggest that a simple change to Patterson's protocol could provide convincing evidence that the G2/M switch is either monostable or bistable. We are not proposing that the switch is monostable; we would be quite surprised if the experiment, correctly done, were to indicate a reversible switch. Our point is simply that the published experiments are inconclusive. The point we are making is neither a mere 'exercise in rigor' nor a suggestion to 'overturn a paradigm.' Rather it is a precise theoretical analysis of a central question of cell cycle regulation that should be of interest to both experimentalists and mathematical modelers.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Summary: The manuscript asks whether the data reported in Patterson et al. (2021) are consistent with a bistable switch controlling the G2/M transition in fission yeast. Patterson et al. (2021) use an engineered system to decouple a non-degradable version of cyclin-dependent kinase (CDK) from cell growth and concomitantly measure CDK activity (by the nuclear localization of a downstream target, Cut3p). They observe cells with indistinguishable CDK levels but two distinct CDK activities, which they posit shows bistable behavior. In this study, the authors ask whether other models can also explain these data. The authors use both deterministic and Gillespie-based stochastic simulations to generate relationships between CDK activities and protein levels for various cell sizes. They conclude that the experiments performed in Patterson et al. are insufficient to distinguish between a bistable switch and a reversible ultrasensitive switch. They propose additional experiments involving the use of a degradable CDK construct to also measure the inactivation kinetics.

      This is an accurate summary of our paper.

      They propose that a bistable switch will have different forward (OFF->ON) and backward (ON->OFF) switching rates. A reversible ultrasensitive switch will have indistinguishable switching rates.

      Our analysis of Patterson's (2021) experiments is based on the well-known fact that the threshold for turning a bistable switch on is significantly different from the threshold for turning it off (in Patterson's case, the 'threshold' is the level of fusion protein in the cell when CDK is activated), whereas for a reversible, ultrasensitive switch the two thresholds are nearly indistinguishable. The 'rate' at which the switch is made is a different issue, which we do not address explicitly. In the experiments and in our model, the switching rates are fast, whether the switch is bistable or monostable.

      The results are interesting and worth publication in a computational biology specific journal, as they might only appeal to a limited audience.

      We think our results should also be brought to the attention of experimentalists studying cell cycle regulation, because Patterson's paper (eLife, 2021) presents a serious misunderstanding of the existence and implications of 'bistability' of the G2/M transition in fission yeast. Whereas Patterson's work is an elegant and creative application of genetics and molecular biology to an important problem, it is not backed up by quantitative mathematical modeling of the experimental results. In that sense, Patterson's work is incomplete, and its shortcomings need to be addressed in a highly respected journal, so that future cell-cycle experimentalists will not make the same (or similar) mistakes.

      Several ideas need to be clarified, and additional information needs to be provided about the specific parameters used for the simulations.

      Major comments:

      #1 The parameters need to be made more accessible by means of a supplementary table, and appropriate references need to be cited.

      Two new supplementary tables (S1 and S2) summarize the dynamic variables and parameter values.

      It is not clear why Michaelis-Menten kinetics would not be applicable to this system. Has it been demonstrated that the Km values of the enzymes are much greater than the substrate concentrations for all the reactions? If so, please cite.

      MM kinetics are not appropriate for such protein interaction networks, because one protein may be both an enzyme and a substrate for a second protein (e.g., Wee1 and CDK, or Cdc25 and CDK). So, the condition for validity of MM kinetics (enzyme concentration ≪ substrate concentration) cannot be satisfied for both reactions. Indeed, enzyme concentration ≈ substrate concentration is probably true for most reactions in our network. Hence, it is advisable to stick with mass-action rate laws. Furthermore, MM kinetics are a poor choice for 'propensities' in Gillespie SSA calculations, as has been shown by many authors (Agarwal, Adams et al. 2012, Kim, Josic et al. 2014, Kim and Tyson 2020).
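      For readers less familiar with the approach, the sketch below shows the Gillespie direct method with mass-action propensities for a hypothetical reversible binding step A + B ⇌ C (for example, an enzyme binding its substrate). It is not the authors' code, and the species, rate constants and volume are illustrative. Each propensity is a rate constant times molecule counts, with the bimolecular constant divided by the volume. This is the elementary-reaction form that SSA assumes, whereas a Michaelis-Menten rate would lump binding and catalysis into one non-elementary step.

```python
import random


def gillespie_binding(n_a, n_b, n_c, k_on, k_off, volume, t_end, seed=0):
    """Gillespie direct method for A + B <-> C with mass-action propensities.

    Counts are integers; k_on is a second-order constant in concentration
    units, so the binding propensity is k_on * nA * nB / volume.
    Returns a list of (time, nA, nB, nC) after each reaction event.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_a, n_b, n_c)]
    while True:
        a_bind = k_on * n_a * n_b / volume   # A + B -> C
        a_unbind = k_off * n_c               # C -> A + B
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break  # nothing can react
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_bind:
            n_a, n_b, n_c = n_a - 1, n_b - 1, n_c + 1
        else:
            n_a, n_b, n_c = n_a + 1, n_b + 1, n_c - 1
        trajectory.append((t, n_a, n_b, n_c))
    return trajectory


if __name__ == "__main__":
    traj = gillespie_binding(100, 80, 0, k_on=1.0, k_off=0.5, volume=50.0, t_end=20.0)
    t, a, b, c = traj[-1]
    print(f"{len(traj) - 1} events; at t = {t:.2f}: A = {a}, B = {b}, C = {c}")
```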

      It would not be surprising if simulations with Michaelis-Menten kinetics altered the dynamics shown in this study. A reversible switch with two different enzymes (catalyzing the ON->OFF and OFF->ON transitions) having different kinetics can give asymmetric switching rates. This would directly contradict what has been shown in Figure 7A-D.

      We don't follow the reviewer's logic here. The two transitions, off → on and on → off, are already driven by different molecular processes (dephosphorylation of inactive CDK-P by Cdc25 and phosphorylation of active CDK by Wee1, respectively). Positive feedback of CDK activity on Cdc25 and Wee1 (++ and −−, respectively) causes bistability and asymmetric switching thresholds. Switching rates, which are determined by the kinetic rate constants of the up and down processes, are of secondary importance to the primary question of whether the switch is monostable or bistable.
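      The difference between asymmetric and coincident switching thresholds can be illustrated with a deliberately minimal, hypothetical one-variable model, $dx/dt = s + a\,x^4/(1+x^4) - x$, which is not the authors' Cdk1/Wee1/Cdc25 network. Here $x$ stands for an activity, $s$ for a slowly varied input (loosely playing the role of the fusion-protein level) and $a$ for the strength of positive feedback. The Hill exponent 4, the feedback strengths, the crossing level $x = 1$ and the function names are illustrative choices. The sketch sweeps $s$ up and then back down, starting each step from the previous steady state. With strong feedback ($a = 1.5$, bistable, with fold points near $s \approx 0.43$ and $s \approx 0.19$) the on and off thresholds are clearly separated. With weak feedback ($a = 0.9$, monostable) they agree to within the sweep step.

```python
def relax(x, s, a, dt=0.1, steps=2000):
    """Integrate dx/dt = s + a*x**4/(1 + x**4) - x with forward Euler,
    starting from x, and return the (approximately) relaxed value."""
    for _ in range(steps):
        x += dt * (s + a * x**4 / (1.0 + x**4) - x)
    return x


def switching_thresholds(a, s_max=1.0, n=100, x_on=1.0):
    """Sweep s upward from 0 to s_max, then back down.

    Returns (s_up, s_down): the first s at which x exceeds x_on on the way
    up, and the first s at which x drops below x_on on the way down.
    """
    grid = [i * s_max / n for i in range(n + 1)]
    x, s_up = 0.0, None
    for s in grid:
        x = relax(x, s, a)
        if s_up is None and x > x_on:
            s_up = s
    s_down = None
    for s in reversed(grid):
        x = relax(x, s, a)
        if s_down is None and x < x_on:
            s_down = s
    return s_up, s_down


if __name__ == "__main__":
    for a, label in ((1.5, "strong feedback (bistable)"),
                     (0.9, "weak feedback (monostable)")):
        up, down = switching_thresholds(a)
        print(f"{label}: on at s = {up:.2f}, off at s = {down:.2f}")
```

      Forward Euler with a small step is adequate here because the model is one-dimensional and its rates are modest. A numerical continuation tool would locate the fold points more precisely.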

      #2 Line 427: The authors use a half-time of 6 hours in their model because Patterson et al. used a non-degradable construct. It is not clear why dilution due to cell growth has not been considered. The net degradation rate of a protein is the sum of the biochemical degradation rate and the growth dilution rate. The growth dilution rate seems significant (140 min doubling time, or 0.3 h-1 dilution rate) relative to the assumed degradation rate (0.12 h-1). Please clarify why the effect of dilution was neglected in the model, or show by sensitivity analysis that this does not change the predicted CDK activation thresholds.

      The reviewer highlights an important effect, but it is not relevant to our calculations. In the deterministic model used to calculate the bifurcation diagrams, both cell volume and the concentration of the non-degradable Cdc13:Cdk1 dimer are kept constant; therefore, there is no dilution effect. The stochastic model deals with changing numbers of molecules per cell; the dilution effect is taken into account by the appearance of cell volume, V(t), at appropriate places in the propensity functions. In other words: in the deterministic model, which is written for concentration changes, the dilution term, −(x/V)(dV/dt), is zero because V=constant; in the stochastic model, written in terms of numbers of molecules, dilution effects are implicit in the propensity functions.
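      The dilution term quoted above follows from differentiating the concentration x = N/V. Below is a short derivation, together with the rate constants implied by the reviewer's numbers. The dilution rate μ assumes exponential growth.

```latex
x = \frac{N}{V}
\quad\Longrightarrow\quad
\frac{dx}{dt} = \frac{1}{V}\frac{dN}{dt} - \frac{N}{V^{2}}\frac{dV}{dt}
             = \frac{1}{V}\frac{dN}{dt} - \frac{x}{V}\frac{dV}{dt}

% Constant volume (deterministic bifurcation model): dV/dt = 0, so the dilution term vanishes.
% Exponential growth, V(t) = V_0 e^{\mu t}: (1/V)(dV/dt) = \mu, so the dilution term is -\mu x.

k_{\mathrm{deg}} = \frac{\ln 2}{6\,\mathrm{h}} \approx 0.12\,\mathrm{h}^{-1},
\qquad
\mu = \frac{\ln 2}{140\,\mathrm{min}} \approx 0.30\,\mathrm{h}^{-1}
```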

      #3 Line 402: The authors state that the production rate of the Cdk protein is 'assumed' proportional to the cell volume. The word 'assumed' is incorrect here, as a simple conversion of a concentration-based differential equation (with constant production rate) to molecular numbers would show that the production rate is proportional to the volume. This is not an assumption.

      Correct; we modified the text (see lines 450-462). The role of cell volume in production rate is more relevant to the case of Cdc25, where we assume that its production rate, Δconcentration/Δt, is proportional to V, because the concentration of Cdc25 in the cell increases as the cell grows. We added two references (Keifenheim, Sun et al. 2017, Curran, Dey et al. 2022) to justify this assumption. In the stochastic code, the propensity for synthesis of Cdc25 molecules is proportional to V².
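      The V² scaling follows from converting a concentration-based synthesis rate into a molecule-number propensity. Below is a minimal sketch. It assumes, as stated above for Cdc25, that the synthesis rate in concentration units is k_s·V. The function and parameter names are hypothetical and are not taken from the authors' code.

```python
def cdc25_synthesis_propensity(k_s, volume):
    """Synthesis propensity (molecules per unit time) for a protein whose
    concentration-based synthesis rate is k_s * V.

    With N = x * V, dN/dt = V * dx/dt + x * dV/dt. The x * dV/dt term cancels
    the dilution term of dx/dt, leaving V * (k_s * V) = k_s * V**2.
    """
    concentration_rate = k_s * volume      # synthesis term of dx/dt, grows with V
    return concentration_rate * volume     # molecules per unit time


def constant_rate_propensity(k, volume):
    """Synthesis propensity for a protein made at a constant concentration rate k."""
    return k * volume
```

      Doubling the volume doubles the propensity for a protein made at a constant concentration rate, such as the Cdk subunit discussed in comment #3. The same change quadruples the propensity for Cdc25.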

      #4 Line 423 Please cite the appropriate literature that shows that fission yeast growth during cell division is exponential. If the dynamics are more complicated, involving multiple phases of growth during cell division, please state so.

      We now acknowledge that volume growth in fission yeast is not exponential but bilinear, with a brief non-growing phase at mitosis (Mitchison 2003). However, we suggest that our simplifying assumption of exponential growth is appropriate for the purposes of these calculations. See lines 473-476: "In our stochastic simulations, we assume that cell volume is increasing exponentially, V(t) = V0·e^(μt). Although fission yeast cells actually grow in a piecewise linear fashion (Mitchison 2003), the simpler exponential growth law (with doubling time ≈ 140 min) is perfectly adequate for our purposes in this paper."
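      The sketch below gives a rough sense of how much the choice of growth law matters. It compares exponential growth with a single linear segment that has the same initial volume and the same 140 min doubling time. Using one linear segment in place of the piecewise-linear law is a simplifying assumption made only for illustration. The function name is hypothetical.

```python
import math


def max_relative_deviation(doubling_time=140.0, n_points=10001):
    """Largest relative excess of linear over exponential volume during one
    doubling time, when both start at V0 and both end at 2 * V0."""
    mu = math.log(2) / doubling_time           # specific growth rate, per min
    worst = 0.0
    for i in range(n_points):
        t = doubling_time * i / (n_points - 1)
        v_exp = math.exp(mu * t)               # V(t)/V0 for exponential growth
        v_lin = 1.0 + t / doubling_time        # V(t)/V0 for linear growth
        worst = max(worst, (v_lin - v_exp) / v_exp)
    return worst
```

      Under these assumptions, the two growth laws never differ by more than about 6% within a cycle. The largest gap occurs at roughly 44% of the doubling time.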

      #5 Line 250: The authors convert the bistable version of the CDK switch to a reversible sigmoidal one by assuming that Wee1 and Cdc25 phosphorylation is proportional to the CDK level rather than activity, which seems biochemically unrealistic. This invokes an altered circuit architecture where inactive CDK has enough catalytic activity to phosphorylate the two modifying enzymes (Wee1/Cdc25) but not enough to drive mitosis. This might be possible if the Km of CDK for Wee1/Cdc25 is lower relative to other downstream substrates that drive mitosis. The authors can reframe this section of the paper to state this possibility, which might be interesting to experimentalists.

      The reviewer is correct that the molecular biology underlying our 'reversible sigmoidal' model is biochemically unrealistic. But, in our opinion, this is the simplest way to convert our bistable model into a monostable, ultrasensitive switch while maintaining the basic network structure in Fig. 1. Our purpose is to show that a monostable model, only slightly changed from the bistable model, can account for Patterson's experimental data equally well. If Nurse's group modifies the experimental protocol as we suggest and their new results indicate that the G2/M transition in fission yeast is bistable, then our reversible sigmoidal model, having served its purpose, can be forgotten. If they show that the transition is not bistable, then both experimentalists and theoreticians will have to think about biochemically realistic mechanisms that can account for the new data, as well as everything else we already know about the G2/M transition in fission yeast.

      #6 It is difficult to phenomenologically understand a bistable switch just based on differences in activation and inactivation thresholds. For example, a reversible ultrasensitive switch also shows a difference in activation and inactivation thresholds (Figure 7D). How much of a difference should be expected of a bistable switch versus reversible switch?

      We show how much of a difference can be expected by contrasting Fig. 7 to Fig. 8. For the largest cells (panel D of both figures), the difference is small and probably undetectable experimentally. For medium-sized cells (panel C), the difference is larger but probably difficult to distinguish experimentally. Only the smallest cells (panel B) provide an opportunity for clearly distinguishing experimentally between monostable and bistable switching.

      Moreover, as the authors clearly understand (line 275), time delays in activation and inactivation reactions can inflate these differences. In the future, it would be useful if the authors could convert the equations to potential energy space, as done in Acar et al. 2005 (Nature 435:228) in Figure 3c-d. Also, predicting the distribution of switching rates from the Gillespie simulation might be informative, and these predictions could be directly compared to experimental measurements in the future (if the Cut3p levels in nucleus and cytosol equilibrate fast enough or other CDK biosensors are developed).

      The famous paper by Acar et al. (2005) is indeed an elegant experimental and theoretical study of bistability ('cellular memory') in the galactose-signalling network of budding yeast. We have included a comparison of Patterson et al. with Acar et al. in our Conclusions section (lines 353-368):

      "It is instructive, at this point, to compare the work of Patterson et al. (2021) to a study by Acar et al. (Acar, Becskei et al. 2005) of the galactose-signaling network of budding yeast. Combining elegant experiments with sophisticated modeling, Acar et al. provided convincing proof of bistability ('cellular memory') in this nutritional control system. They measured PGAL1-YFP expression (the response) as a function of galactose concentration in the growth medium (the signal), analogous to Patterson's measurements of CDK activity as a function of C-CDK concentration in fission yeast cells. In Acar's experiments, the endogenous GAL80 gene was replaced by PTET-GAL80 in order to maintain Gal80 protein concentration at a constant value determined by doxycycline concentration in the growth medium. The fixed Gal80p concentration in Acar's cells is analogous to cell volume in Patterson's experiments. In Fig. 3b of Acar's paper, the team plotted the regions of monostable-off, monostable-on and bistable signaling in dependence on their two control parameters, external galactose concentration and intracellular Gal80p concentration, analogous to our Fig. 4. Because Acar's experiments explored both the off → on and on → off transitions, they could show that their observed thresholds (the red circles) correspond closely to both saddle-node bifurcation curves predicted by their model. On the other hand, Patterson's experiments (as analyzed in our Fig. 4) probe only the off → on transition."

      The purpose of our paper is to show that Patterson-type experiments can and should be done so as to probe both thresholds, as was done by van Oudenaarden's team. They went further to characterize their bistable switch in terms of 'the concept of energy landscapes'. We think it is premature to pursue this idea in the context of the G2/M transition in fission yeast until there is firm, quantitative data characterizing the nature of the 'presumptive' bistable switch in fission yeast.

      Minor comments: #1 Line 2: Please replace "In most situations" with "In favorable conditions"

      Done.

      **Referee cross-commenting** I agree with Reviewer 1 that this falls more under pointing out an alternative interpretation of a single experiment than challenging widely supported orthodoxy about how the eukaryotic cell cycle leaves mitosis.

      As we said earlier, our 1993 paper in J Cell Sci is the source of this orthodox view, and it is widely supported at present because there is convincing experimental evidence for bistability in frog egg extracts, budding yeast cells and mammalian cells. Patterson's paper is not sound evidence for bistability of the G2/M transition in fission yeast cells. It is important for experimentalists to know why the experiments fail to confirm bistability, and important for someone to do the experiment correctly in order to confirm (or, what would be really interesting, to refute) the expectation of bistability at the G2/M transition in fission yeast cells.

      Reviewer #2 (Significance (Required)): Suitable for a specialist computational biology journal, e.g., PLoS Comp Bio

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The paper by Novak and Tyson revisits a recent paper from the Nurse group on the bistability of the mitotic switch in fission yeast using mathematical modelling. The authors extend their older models of the mitotic entry checkpoint and implement both deterministic and stochastic versions of a new model. They show that this model does indeed possess bistability and that, combined with stochastic fluctuations, the model can show bimodality of cyclin-CDK activity at a particular cell size, consistent with the recent experimental data. However, the authors also show that an alternative model with monostable ultrasensitivity can also explain the data, and they suggest experiments that can prove the existence of hysteresis and therefore bistability.

      Right on.

      While the biological implication of the study is well explained, the authors can improve the presentation of their model and the underlying assumptions. I have the following comments and suggestions for improvement of the paper.

        • The cartoon of the mathematical model is confusing in places. For example, according to the equations, the Wee1-CDK complex either dissociates back to Wee1 and CDK or gives rise to pCDK and Wee1, but the arrow below is confusing, as it implies that the complex can also give rise to Wee1p; CDK phosphorylation of Wee1 is already included in the diagram. Also, PP2A is placed on the arrow for all reactions, but for Wee1p2 to Wee1p its action is shown with a dashed line. Finally, I wondered whether Wee1p and Wee1p2 can also bind CDK and sequester or phosphorylate it.

      We are sorry for the confusion and have improved Fig. 1.
        • The rates and variables in the ODEs are not fully described. It is also sometimes unclear what is a parameter and what is a variable; I had to look at the code.

      We now include tables of variables and parameter values, with explanatory notes.

        • The model has quite a few parameters, but these are not discussed at all in the paper. How did the authors come up with this particular set of parameters? Was there some systematic fitting, or tuning by hand to produce a good fit to the data? I could only see the values of the parameters in the code; a table listing the parameters of the model, what they mean, their values (and perhaps how the values were obtained) is missing.

      The parameters were tuned by hand to fit Patterson's data, based, of course, on our extensive experience fitting mathematical models to myriad data sets on the cell division cycles of fission yeast, budding yeast, and frog egg extracts. We now provide a table of parameter values.

        • The authors are using the Gillespie algorithm with time-varying parameters (as some rates depend on volume, and volume is not constant). The algorithm needs to be modified slightly to handle this (see, for example, Shahrezaei et al., Molecular Systems Biology 2008).

      A valid criticism, but the rate of cell volume increase is very slow compared to the propensities of the biochemical reactions. We write (lines 492-498):

      "In each step of the SSA, the volume of the cell is increasing according to an exponential function, and, consequently, the propensities of the volume-dependent steps are, in principle, changing with time; and this time-dependence could be taken into account explicitly in implementing Gillespie's SSA (Shahrezaei, Ollivier et al. 2008). However, the step-size between SSA updates is less than 1 s compared to the mass-doubling time (140 min) of cell growth. So, it is warranted to neglect the change in V(t) between steps of the SSA, as in our code."
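      The approximation described above can be sketched as follows. At each SSA step, the volume is evaluated at the current time and held fixed while the waiting time and the next reaction are drawn. The model is a hypothetical toy: constitutive synthesis with propensity proportional to V, and first-order degradation. The function name and parameter values are also hypothetical. The sketch records the largest relative volume change within a single step, which is the quantity being neglected.

```python
import math
import random


def ssa_growing_cell(k_syn, k_deg, V0, doubling_time, t_end, seed=0):
    """Gillespie SSA for synthesis/degradation in an exponentially growing cell,
    with V(t) frozen at the start of each step.

    Returns (molecule count at t_end, V(t_end), largest relative change of V
    over any single step).
    """
    mu = math.log(2) / doubling_time
    rng = random.Random(seed)
    t, n, max_rel_dv = 0.0, 0, 0.0
    while True:
        V = V0 * math.exp(mu * t)          # volume held fixed for this step
        a_syn = k_syn * V                  # synthesis propensity, proportional to V
        a_deg = k_deg * n                  # first-order degradation propensity
        a_tot = a_syn + a_deg              # > 0 because a_syn > 0
        tau = rng.expovariate(a_tot)       # waiting time to the next reaction
        if t + tau > t_end:
            break
        # Relative volume change over this step, which the approximation ignores.
        max_rel_dv = max(max_rel_dv, math.expm1(mu * tau))
        t += tau
        if rng.random() * a_tot < a_syn:
            n += 1
        else:
            n -= 1
    return n, V0 * math.exp(mu * t_end), max_rel_dv
```

      For example, ssa_growing_cell(k_syn=1.0, k_deg=math.log(2) / 21600.0, V0=1.0, doubling_time=8400.0, t_end=8400.0) uses time units of seconds, a 140 min doubling time and a 6 h protein half-life. These parameter values are illustrative only. In such a run, the volume changes by only a tiny fraction of a percent during any single step.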

        • As the authors correctly point out, ignoring mRNA results in an underestimation of noise. Another point is that mRNA lifetimes are short, which also affects the timescale of fluctuations; this may be relevant to the switching rates between the bistable states.

      A valid point, but including mRNAs would double the size of the model. Furthermore, we have little or no data about mRNA fluctuations in fission yeast cells, so it would be impossible to estimate the values of all the new parameters introduced into the model. Finally, the switching rates between bistable states (or across the ultrasensitive boundary) are not the primary focus of Patterson's experiments or our theoretical investigations. So, we propose to delay this improvement to the model until the relevant experimental data are available.

        • In the introduction, add "In this study" to the sentence "Intrigued by these results, we investigated their experimental observations with a model of bistability in the activation of cyclin-CDK in fission yeast."

      Done

      Reviewer #3 (Significance (Required)): Overall, this is an interesting study that revisits an old question and some recent experimental data. The use of stochastic modelling in explaining variability and co-existence of cell populations in the context of cell cycle and comparison to experimental data is novel and of interest to the communities of cell cycle researchers, systems biologists and mathematical biologists.

      We agree. Thanks for the endorsement.

      References

      Acar, M., A. Becskei and A. van Oudenaarden (2005). "Enhancement of cellular memory by reducing stochastic transitions." Nature 435(7039): 228-232.

      Agarwal, A., R. Adams, G. C. Castellani and H. Z. Shouval (2012). "On the precision of quasi steady state assumptions in stochastic dynamics." J Chem Phys 137(4): 044105.

      Curran, S., G. Dey, P. Rees and P. Nurse (2022). "A quantitative and spatial analysis of cell cycle regulators during the fission yeast cycle." Proc Natl Acad Sci U S A 119(36): e2206172119.

      Gerard, C., J. J. Tyson, D. Coudreuse and B. Novak (2015). "Cell cycle control by a minimal Cdk network." PLoS Comput Biol 11(2): e1004056.

      Gould, K. L. and P. Nurse (1989). "Tyrosine phosphorylation of the fission yeast cdc2+ protein kinase regulates entry into mitosis." Nature 342(6245): 39-45.

      Keifenheim, D., X. M. Sun, E. D'Souza, M. J. Ohira, M. Magner, M. B. Mayhew, S. Marguerat and N. Rhind (2017). "Size-Dependent Expression of the Mitotic Activator Cdc25 Suggests a Mechanism of Size Control in Fission Yeast." Curr Biol 27(10): 1491-1497.e4.

      Kim, J. K., K. Josic and M. R. Bennett (2014). "The validity of quasi-steady-state approximations in discrete stochastic simulations." Biophys J 107(3): 783-793.

      Kim, J. K. and J. J. Tyson (2020). "Misuse of the Michaelis-Menten rate law for protein interaction networks and its remedy." PLoS Comput Biol 16(10): e1008258.

      Lundgren, K., N. Walworth, R. Booher, M. Dembski, M. Kirschner and D. Beach (1991). "mik1 and wee1 cooperate in the inhibitory tyrosine phosphorylation of cdc2." Cell 64(6): 1111-1122.

      Mitchison, J. M. (2003). "Growth during the cell cycle." Int Rev Cytol 226: 165-258.

      Novak, B., Z. Pataki, A. Ciliberto and J. J. Tyson (2001). "Mathematical model of the cell division cycle of fission yeast." Chaos 11(1): 277-286.

      Novak, B. and J. J. Tyson (1995). "Quantitative Analysis of a Molecular Model of Mitotic Control in Fission Yeast." J Theor Biol 173: 283-305.

      Patterson, J. O., S. Basu, P. Rees and P. Nurse (2021). "CDK control pathways integrate cell size and ploidy information to control cell division." Elife 10.

      Shahrezaei, V., J. F. Ollivier and P. S. Swain (2008). "Colored extrinsic fluctuations and stochastic gene expression." Mol Syst Biol 4: 196.

      Tang, Z., T. R. Coleman and W. G. Dunphy (1993). "Two distinct mechanisms for negative regulation of the Wee1 protein kinase." EMBO J 12(9): 3427-3436.

      Tyson, J. J., A. Csikasz-Nagy and B. Novak (2002). "The dynamics of cell cycle regulation." Bioessays 24(12): 1095-1109.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Dear Editor,

      We have addressed the points and concerns raised by the reviewers and wish to thank them for their effort and time. We agree with all the comments and suggestions, which resulted in a significant improvement of the manuscript. Below, we provide a point-by-point response to all comments.

      Sincerely,

      Anders Hofer, corresponding author


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In their paper, Ranjbarian and colleagues provide a tour de force at characterizing dAK from G. intestinalis using both enzymology and structural biology. G. intestinalis does not have RNR and therefore this organism relies on dAK, which catalyzes formation of dAMP and ADP from deoxyadenosine and ATP (among other substrate pairs). The authors performed a terrific job at testing this reaction in depth using a recombinant dAK and a battery of various co-substrates (both natural as well as synthetic ones, Table 1). Extensive structural information on dAK was obtained using a combination of X-ray crystallography and cryo-EM. Overall, this work will be of paramount aid in better understanding the reaction mechanism, especially in the context of molecules that can be used as inhibitors of such a crucial enzyme (a metabolic vulnerability for this parasite).

      This manuscript, in its current form does not require additional experiments but I would like to have a few aspects corrected/clarified, before it can be accepted for publication:

      Line 30: "whereas the affinities for deoxyguanosine, deoxyinosine and deoxycytidine were 400-2000 times lower." Better not to use term "affinity" when KM or kcat/KM are implied (unless ITC was used to measure true Kds).

      -This is a good point, and we are now using KM values in all instances where actual numbers are implied and have only kept the word affinity in cases where it is discussed in more general terms.
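      The distinction can be made concrete with the standard Briggs-Haldane scheme E + S ⇌ ES → E + P. For this scheme, K_M = (k_off + k_cat)/k_on, whereas the true dissociation constant is K_d = k_off/k_on. Below is a minimal sketch. The function names and rate constants are hypothetical and were chosen only for illustration.

```python
def michaelis_constant(k_on, k_off, k_cat):
    """K_M for E + S <=> ES -> E + P under the steady-state (Briggs-Haldane) assumption."""
    return (k_off + k_cat) / k_on


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_d of the ES complex (a true affinity)."""
    return k_off / k_on


def specificity_constant(k_on, k_off, k_cat):
    """k_cat / K_M, the usual quantity for comparing competing substrates."""
    return k_cat / michaelis_constant(k_on, k_off, k_cat)


# Hypothetical rate constants: k_on in M^-1 s^-1, k_off and k_cat in s^-1.
k_on, k_off, k_cat = 1e6, 10.0, 100.0
km = michaelis_constant(k_on, k_off, k_cat)
kd = dissociation_constant(k_on, k_off)
```

      With these hypothetical values, K_M is 11-fold larger than K_d. The two converge only when k_cat ≪ k_off, which is why K_M alone is not, in general, a measure of affinity.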

      Line 31: "Deoxyadenosine analogues halogenated at the 2- and/or 2´-positions were also potent substrates, with comparable EC50 values as the main drug used today, metronidazole, but with the advantage of being usable on metronidazole-resistant parasites." Not sure this sentence is clear as written.

      -We have now rewritten the sentence as follows: "Deoxyadenosine analogues halogenated at the 2- and/or 2´-positions were also potent substrates with comparable EC50 values on cultured G. intestinalis cells as metronidazole, the first line treatment today, with the additional advantage of being effective against metronidazole-resistant parasites."

      Line 55: "..G. intestinalis (synonymous to G. lamblia and G. duodenalis)..". Very nice that authors provide this information as it is usually a point of confusion i.e. multiple names for the same organism.

      -Thanks a lot, we are happy that you liked it.

      Line 61 and above as well: "Treatment regimes are mainly based on metronidazole and to a lesser extent other 5-nitroimidazoles...". MT is introduced a bit sporadically, and it is not completely clear which enzyme it inhibits or what its mode of action is. Common knowledge is that MT, known as Flagyl, acts against anaerobic parasites/bacteria, where its mode of action was linked to "activation" due to microaerophilic conditions. Maybe MT can be introduced after the text starting from Line 71?

      -The description of metronidazole is adjusted as following: "Metronidazole (Flagyl) is the most commonly used drug to treat giardiasis and selectively kills the parasite and other anaerobic organisms by forming free radicals under oxygen-limited conditions, but it has side effects such as nausea, abdominal pain, diarrhea, and in some cases neurotoxicity reactions."

      Line 70: what is "cyst-wall"?

      -It is a cell wall consisting of three major cyst wall proteins and N-acetylgalactosamine. We have adjusted the sentence to the following to make the term clearer: The trophozoites can also secrete material to form a cyst wall and go through two rounds of DNA replication to form cysts, which contain four nuclei and a 16N genome per cell (4N in each nucleus).

      Line 90: "The reaction is catalyzed by deoxyribonucleoside kinases (dNKs), which are.." I really do not like it when, in order to find the reaction catalyzed by an enzyme in a particular study, one needs to dive into the literature. Sometimes this requires a lot of time, as most recent papers on the subject do not list the reactions catalyzed. Please add a Figure or a panel with the reactions catalyzed by both dNK families.

      -It is a good idea and we have now added a figure (Fig. 1), which compares the deoxyribonucleotide metabolism of G. intestinalis with mammalian cells. The different deoxyribonucleoside kinases in the parasite and mammalian cells are included in the figure.

      Line 96: "..was found to have a ~10-fold higher affinity to thymidine.." as I mentioned above I really do not like the usage of "affinity", when actually low KM is implied.

      -It is corrected now (see above).

      Line 113: "This does not match the current knowledge that there are three dNKs in total whereof one completely specific for thymidine. The lack of knowledge about these essential enzymes in the parasite has hampered the understanding of Giardia deoxyribonucleoside metabolism and hence its exploitation as a target for antiparasitic drugs." Very good rationale. As I mentioned above, I think a Figure needs to be introduced that depicts the different enzymes involved in deoxyribonucleoside metabolism (both TK1 and non-TK1 members) in Giardia, with all known paralogs and corresponding enzymatic reactions clearly labeled.

      -Thanks a lot for the suggestion. Information about the different dNKs in G. intestinalis, together with those of mammalian cells for comparison, is included in the new figure (Fig. 1).

      Line 132: Odd designation of supplementary figures; usually it is "Fig. S1" etc. The legend for Fig. S1 is not adequate; please add a description of the species and the names of the enzymes for all sequences shown. Also, each sequence in the alignment should start with a residue number (amino acid position), as it is not clear whether the full sequence is shown. Overall comment about the multiple sequence alignment (relevant to Fig. S1): with such a small number of sequences, it is very hard to make any substantial predictions about conserved regions etc.

      -Thanks for the suggestions. We have now included more sequences, sequence numbering, and description of species as well as enzyme names. Some other changes are also that we have now used the same G. intestinalis dAK sequence in the alignment as in the experiments (same strain and accession number), and that we have made a realignment using Clustal W instead of Clustal Omega (gives better alignment of the termini). The designation of supplementary figures is according to the style of PLoS journals.

      Fig. 1 and elsewhere: I would prefer that all bar graphs show individual values plus the error bar (if possible).

      -We have now added individual values to the bar graphs.

      I do not have any issues with X-ray data and cryo-EM studies (refinement statistics, particles classification etc).

      **Referees cross-commenting**

      I also agree with all the comments provided by Reviewer 2 and am very pleased to see that our evaluations were very similar.

      Reviewer #1 (Significance (Required)):

      In their paper, Ranjbarian and colleagues provide a tour de force at characterizing dAK from G. intestinalis using both enzymology and structural biology. G. intestinalis does not have RNR and therefore this organism relies on dAK, which catalyzes formation of dAMP and ADP from deoxyadenosine and ATP (among other substrate pairs). The authors performed a terrific job at testing this reaction in depth using a recombinant dAK and a battery of various co-substrates (both natural as well as synthetic ones, Table 1). Extensive structural information on dAK was obtained using a combination of X-ray crystallography and cryo-EM. Overall, this work will be of paramount aid in better understanding the reaction mechanism, especially in the context of molecules that can be used as inhibitors of such a crucial enzyme (a metabolic vulnerability for this parasite).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Ranjbarian et al. investigated a non-TK1-like deoxyribonucleoside kinase (dNK) found in the protozoan parasite Giardia intestinalis. They used enzyme kinetic assays on Gi dNK heterologously expressed in E. coli to determine which deoxyribonucleosides were the most likely physiological substrates for the enzyme. Their characterization revealed that this Gi dNK has a strong affinity for deoxyadenosine. They further investigated the affinity and activity of the dNK on deoxyadenosine analogues, some of which have known pharmaceutical utility. Finally, using a combination of crystallography, cryo-EM, chromatography, and mass photometry, they reveal that, unlike other dNKs, Gi dNK forms a tetramer. They characterize important regions required for tetramerization and postulate that this tetramerization evolved to provide Gi dNK with a heightened affinity for deoxyadenosine.

      Major comments and questions:

      • The claims in this manuscript are well-supported, and I found no major issues with experimental methods.

      • The authors provide a structure of tetrameric dNK and suggest that this tetramer leads to the increased affinity for substrate compared to non-Giardia dNKs. They also show through mutations that removing the novel dimerization regions decreases substrate affinity by 100-fold. However, I was left unclear about why the tetramer would lead to such high affinity for substrate compared to two dimers. This is especially notable, since the authors state that there are no signs of cooperativity, which is a common way that oligomerization may lead to heightened affinity. If the authors have no current evidence explaining this, they could consider adding a short discussion speculating on the mechanism and future directions of study. -Thanks for this suggestion. We have now added a section in the last paragraph of the discussion where we speculate on the subject.

      Minor comments and questions:

      • The authors state that dATP acts as a mixed inhibitor and not a simple competitive inhibitor, and that previous studies have shown that this is because the dNTP competes in two locations (line 163). Is it also possible that competitive inhibition + allosteric regulation could be causing this behavior instead? -It is true that this can be theoretically explained in many ways. In fact, many allosteric regulators affect both the Vmax and Km values. However, in all studied dNKs, the dNTP acts as a dual competitor and no proper allosteric regulation with a separate allosteric site has ever been observed so far. We have rephrased this part as following to make it clear: "Mixed inhibition is often the result of allosteric regulation but studies of other dNKs have shown that this is not the case [17]. Instead, the far-end dNTP product gives a dual inhibition where the deoxyribonucleoside moiety competes with the substrate and the phosphate groups mimic those of ATP but coming from the opposite direction."

      • In the introduction (line 93), non-TK1-like dNKs are described as "not structurally related to TK1-like". This left me unclear, are they still interrelated among themselves? -We have added the following sentence for clarification: "The non-TK1-like dNKs are further subdivided into a monophyletic group of canonical non-TK1-like dNKs and a second group with thymidine kinases from Herpesviridae, which are structurally related to the canonical group but share very little amino acid sequence homology."

      • I was left confused by lines 106-116 in the introduction, where the specificities of dNKs in Giardia are discussed. This is touched upon again in the discussion, but it was not clear here that there are several deoxyribonucleosides unaccounted for. -We think this should be clear now with the added Fig. 1, where the dNKs are shown.

      • When describing enzyme assays (Line 145), the authors say there is no salt dependence, but there looks to be MgCl2 always included in the assays (presumably for the ATP). -This is a good point and something we have overlooked when the sentence was written (Mg2+ is required). We have now corrected the sentence as follows: "Based on initial enzyme activity studies, it was confirmed that the assay did not have any specific requirements regarding K+, Na+, NH4+, acetate or reducing agents, and that it was linear with respect to time (S2 Fig)."

      • I was confused by the y-axis of Fig 2. How is enzyme activity lower when dAdo is added? I think I read "enzyme activity" as total substrate depleted, when it is actually referring exclusively to the given non-dAdo substrate in each column. -This is a very good point that we seem to have overlooked. We have now adjusted the y-axis title to "Indicated enzyme activity" and added the following sentence to the figure legend: "The recorded enzyme activities are for the substrates indicated on the x-axis (excluding the activity with the competing substrate)."

      • Lines 239 - 255 and Figure 3 were a little unclear to me. Specifically, I was having trouble following in the text which dimer is in the ASU, which is symmetry related, and matching those terms with which are canonical and non-canonical. -We agree with the reviewer and thank them for their comment. In order to improve the presentation of these results, we chose to extensively rearrange the figures and accompanying text. We now present the initial X-ray data together with the cryo-EM data in a new Figure 4 that focuses on the overall architecture of the tetramer. We realize that some of the nomenclature previously used in that figure was, as the reviewer pointed out, confusing and superfluous, and we have now simplified and unified it. The structural details of how the extended N- and C-termini interact with the neighboring subunit have been moved to the new figure 6 in order to present them just before the functional analyses of the consequences of truncating the termini. As a consequence of these changes to the figure layout, we made substantial changes in the organization of the text surrounding these figures, which also led to a clearer presentation. Since the changes to the figures and text are quite substantial, we would like to point out that they are only changes to the presentation, not to the data shown.

      • The authors suggest that in the experiment shown in Figure S9 (Line 285), low activity may be caused by minor impurities. I'm not sure why impurities would lower activity significantly. Could there be other differences in experimental conditions that are at play instead? -The sentence refers to a side activity (dATP dephosphorylation) which is not the normal reaction of deoxyadenosine kinase. We have rephrased the sentence to make it clearer: "The dATP-dephosphorylating activity was several orders of magnitude lower than the regular dAK activity (to phosphorylate deoxyadenosine) and was possibly catalyzed by other enzymes present as minor impurities in the protein preparation."

      • (Optional) From looking at the crystallography stats, I think the authors can potentially push the resolution further. At higher resolutions, Rmerge may become high, but depending on the data collection strategy, Eiger detectors can lead to high Rmerge just out of sheer data redundancy. CC1/2 can be a more useful metric in these contexts. -This is a good point and well spotted by the reviewer. Indeed, a CC1/2 of 0.802 suggests that the resolution can be pushed further. However, due to contaminating spots at higher resolutions, the statistics significantly worsen when trying to push the resolution beyond 2.1 Å, which is why we did not process the data to a higher resolution.

      • For Figure S8, the Polder map feature in Phenix is another option for showing ligand occupancy in an unbiased way. Did the authors try this? -We want to thank the reviewer for suggesting this. We have calculated a polder map using the Polder map feature in Phenix, and both the resulting map and the correlation coefficients support the presence of a dADP in the active site of monomer I. We added a section to the relevant paragraph to include these new findings: "To increase our confidence that dADP was correctly placed within active site I, we calculated a polder map for dADP to test whether the β-phosphate density is correctly attributed or if it rather belongs to the bulk solvent. The resulting polder map and statistics support the placement of dADP in active site I with correlation coefficients of CC1,2=0.7627, CC1,3=0.9424, and CC2,3=0.7423 suggesting that the density does belong to dADP as CC1,3 > CC1,2 and CC1,3 > CC2,3 (S8 Fig.)."

      • It's disappointing that the tetramers show so much preferred orientation in the cryo-EM data. That said, while the nominal resolution is 4.8 Å, the streakiness makes the EM structure look as though its resolution is worse than that. -We agree that the streakiness of the map is substantial. This is simply a result of the severe anisotropy of the map, which means that the resolution is probably worse than 4.8 Å in the "bad directions" of the map. The supplementary material (S9 Fig) clearly shows the preferred orientations leading to this problem. In the course of this study, we tried several methods to lessen the preferred-orientation problem, such as using graphene oxide-coated grids and collecting tilted data. However, once we had obtained the crystal structure, we saw no point in continuing these efforts. To address the reviewer's comment, we extended the description of the EM map in the main text as follows:

      "Due to strong preferred orientations, it was not possible to get an isotropic, high-resolution 3D structure of dAK using cryo-EM. The resulting 3D map had a nominal resolution of 4.8 Å, but a clearly anisotropic appearance probably reflecting lower resolution in the poorly resolved direction (S9 Fig)."

      **Referees cross-commenting**

      Overall, I agree with Reviewer #1's evaluation, and don't have any further suggestions or thoughts at this time.

      Reviewer #2 (Significance (Required)):

      Medical relevance: G. intestinalis is a parasite that causes 190 million cases of giardiasis per year. Although giardiasis is treatable, there is evidence that Giardia is developing resistance to the current main treatment, metronidazole. Thus, the authors provide a compelling case for the medical relevance of their investigation of Gi dNK for further pharmaceutical development. They provide further evidence for this by showing that several deoxyadenosine analogs bind the dNK and inhibit Giardia growth. This work represents a very useful first step into a potential avenue for medical development. It's important to note that clinical studies are not within the purview of this research. However, in the discussion, the authors provide several comments on the promise of this avenue for future research.

      Conceptual, technical, and mechanistic relevance: Through biochemical and structural study, the authors provide a compelling framework to understand an enzyme that is very important to the unique lifestyle of Giardia parasites. From an evolutionary standpoint, the authors provide insight into how Giardia can survive even without major components of de novo DNA synthesis. The authors principally use well-established tools and techniques of the enzymology field, but do so to characterize a unique and previously uncharacterized enzyme system. This enzyme proves to be notable not just for its medical significance, but because it is unique among its family (non-TK1-like deoxynucleotide kinases) in its strong affinity for substrate and its tetrameric quaternary structure. One relatively novel technique used in the study is mass photometry, an exciting way to characterize native proteins at very low concentrations. Using this technique helps the authors address a common criticism of structural studies, namely that the high concentrations or crowding conditions of techniques like crystallography and cryo-EM may induce non-physiological oligomers.

      In summary, this work represents a meaningful addition to the protein structure-function literature. While it will principally be of interest to basic/fundamental researchers who study the mechanistic detail of protein function and evolution, it also provides a foundation for future translational work and antiparasitic drug design.

      Reviewer's background: I received my PhD in chemistry studying the structure and function of another enzyme key to DNA metabolism (except in giardia), ribonucleotide reductase. My background is in structural biology and biochemistry. I do not have sufficient expertise to comment on studies performed on G. intestinalis growth and susceptibility to deoxyadenosine analogs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Roget et al. build on their previous work developing a simple theoretical model to examine whether ageing can be under natural selection, challenging the mainstream view that ageing is merely a byproduct of other biological and evolutionary processes. The authors propose an agent-based model to evaluate the adaptive dynamics of a haploid asexual population with two independent traits: fertility timespan and mortality onset. Through computational simulations, their model demonstrates that ageing can give populations an evolutionary advantage. Notably, this observation arises from the model without invoking any explicit energy tradeoffs, commonly used to explain this relationship.

      The model’s results are based on both numerical simulations and formal mathematical analysis.

      Additionally, the theoretical model developed here indicates that mortality onset is generally selected to start before the loss of fertility, irrespective of the initial values in the population. The selected relationship between the fertility timespan and mortality onset depends on the strength of fertility and mortality effects, with larger effects resulting in the loss of fertility and mortality onset being closer together. By allowing for a trans-generational effect on ageing in the model, the authors show that this can be advantageous as well, lowering the risk of collapse in the population despite an apparent fitness disadvantage in individuals. Upon closer examination, the authors reveal that this unexpected outcome is a consequence of the trans-generational effect on ageing increasing the evolvability of the population (i.e., allowing a more effective exploration of the parameter landscape), reaching the optimum state faster.

      The simplicity of the proposed theoretical model represents both the major strength and weakness of this work. On one hand, with an original and rigorous methodology, the logic of their conclusions can be easily grasped and generalised, yielding surprising results. Using just a handful of parameters and relying on direct competition simulations, the model qualitatively recapitulates the negative correlation between lifespan and fertility without requiring energy tradeoffs. This alone makes this work an important milestone for the rapidly growing field of adaptive dynamics, opening many new avenues of research, both theoretically and empirically.

      We thank the reviewers and editor for highlighting the importance of the work presented here.

      On the other hand, the simplicity of the model also makes its relationship with living organisms difficult to gauge, leaving open questions about how much the model represents the reality of actual evolution in a natural context.

      We presented, in both the Results and the Discussion, how the mathematical trade-offs between fertility and survival time give rise to (x_b, x_d) configurations representative of existing ageing modes.

      In particular, a more explicit discussion of how the specifics of the model can impact the results and their interpretation is needed. For example, the lack of mechanistic details on the trans-generational effect on ageing makes the results difficult to interpret.

      We discussed the role played by the transgenerational Lansing effect in terms of its function; no particular mechanism is required beyond the transmission of a negative effect across generations. We reinforce this in the discussion by adding the following sentence: "Regarding the nature of the transgenerational effect, our model is agnostic and the mere transmission of any negative effect would be sufficient to exert the function."

      Even if analytical results are obtained, most of the observations appear derived from simulations as they are currently presented. Also, the choice of parameters for the simulations shown in the paper and how they relate to our biological knowledge are not fully addressed by the authors.

      The long-time limit of the system with and without the Lansing effect is based on analytical results later confirmed using numerical simulations. The choice of parameters is explained in the introduction as the minimal set needed to define a living organism. As for the parameter values, our numerical analysis gives a solution for any I_b, I_d, x_b and x_d in ℝ+, which makes the choice of initial values arbitrary.

      Finally, the conclusions of evolvability are insufficiently supported, as the authors do not show if the wider genotypic variability in populations with the ageing trans-generational effect is, in fact, selected.

      We do not show nor claim that evolvability per se is selected for but that the apparent advantage given by this transgenerational effect seems to be mediated by an increased genotypic/phenotypic variability conferred to the lineage that we interpreted as evolvability.

      Recommendations for the authors

      (1) The authors could use the lineage tracing results for the evolvability aspect. Specifically, within subpopulations featuring the Lansing effect, it would be valuable to explore whether individuals with parental age greater than the mortality onset (a > x_d) demonstrate higher fitness compared to individuals with a < x_d. Additionally, an examination of how this variation evolves over time could provide further insights into the dynamics of the proposed model.

      We thank the reviewer for this suggestion. This is an ongoing work in the group, especially in the context of varying environmental conditions.

      (2) In all simulations, I_b = I_d = 1, resulting in total fertility (x_b * I_b) equating to x_b, while x_d is proportional to life expectancy. Considering an exploration of the implications of this parameter setting, the authors could frame x_d as a 'lifespan cost', potentially allowing for the model to be conceptualised in terms of energetic tradeoffs. This might offer additional perspectives on the dynamics of the model and its alignment with biological principles.

      We discuss how the apparent trade-offs produced by the model, depending on the I_b and I_d values, relate to the interpretation of such trade-offs that has been accepted for most of the past century. Our claim in the discussion is that such an energetic trade-off is not needed for fertility/longevity trade-offs to appear. Such an energetic trade-off is not a "biological principle" but merely an accepted interpretation of a fertility/longevity trade-off, and it is not even a general mechanism.

      (3) Considering the necessity of variation in x_d for the observed patterns, an exploration could be undertaken by the authors to examine a model where x_d is simply variable without inheritance. This could involve centring x_d at some value d with some variance σ_d for all individuals. In such a scenario, it may be observed whether the same convergence of x_b - x_d occurs without requiring x_d to be selected. Furthermore, similar consequences of the Lansing effect could potentially be identified.

      This was done early on during our work and did not show any major changes in the model's behaviour beyond the time of convergence. We did not include it in the final manuscript because of its low added value to an already long and complex manuscript.

      (4) While it may not be necessary to alter the model itself, it is suggested that the authors consider acknowledging the potential consequences of certain modelling decisions that might be perceived as biologically unrealistic. Notable examples include assumptions such as fertility from birth and zero mortality prior to x_d. These assumptions, such as infertility from birth, could be viewed as distinctive features, and it might be worth mentioning that parental care of offspring could have co-evolved with such features. This is particularly relevant considering the energy tradeoff hypothesis that has been postulated.

      Although inspired by results obtained in Drosophila, mice, nematodes and zebrafish, the model is so far haploid and asexual, thus involving individuals likely more similar to unicellular organisms. In these conditions, infertility from birth did not seem relevant to us. However, the model and code are accessible online and we hope that others will use them to address such questions. It is interesting, though, that ageing appears here without such a constraint.

      Additionally, the consideration that all organisms face a non-trivial mortality rate at every age, not solely from physiological causes, reflects the reality within which selection operates.

      We thought this was the best way to reflect an environment with a limited carrying capacity. A more complex model is under construction to take into account the fact that older individuals might be more sensitive to it than younger ones.

      (5) While acknowledging the technical rigour applied by the authors, it is suggested that further attention be given to conducting a comprehensive 'reality check' associated with the chosen parameters, particularly regarding the biological relevance of the results. For instance, the authors argue that offspring of old organisms do not, on average, live similarly to their parents. However, it is noted that studies in the haploid asexual organism yeast, akin to what the authors model (albeit not necessarily yeast), revealed that the average lifespan of yeast progeny born from young or old mothers is very similar.

      We do not claim that progeny of old parents live less long than that of younger parents on average, we say that it happens in the progeny of physiologically old parents, representing at most 10% of the population in our numerical simulations.

      The authors cite experimental evolution in Drosophila progeny conceived later in the life of the parent, indicating that the onset of mortality in these progeny occurs late, sometimes even after the end of the fertility period (Burke et al., 2016; Rose et al., 2002). While the authors report their own previous studies with divergent results, independent experiments have suggested an increase of x_d following an artificial increase of x_b (Luckinbill and Clare, 1985; Sgro et al., 2000). A more in-depth consideration of these contrasting observations and their potential implications for the current model could enhance the overall robustness of the study.

      The increase of x_d following an artificial increase of x_b is predicted by our model, as discussed. Unfortunately, the divergence of observations between studies is hard to assess.

      (6) To enhance readability and maintain consistency, it is suggested that the authors homogenise the description of key parameters, specifically x_b and x_d, throughout the text. This could contribute to improved clarity and rigour. One recommendation is to refer to x_b consistently as the 'fertility span' and x_d as the 'mortality onset' for the sake of uniformity in terminology.

      We have modified the text accordingly.

      (7) At various points in the text, the assertion is made that observations have indicated a tradeoff between fertility and longevity. It is recommended that the authors provide references or data to substantiate this claim. This addition would contribute to the empirical grounding of the mentioned tradeoff and strengthen the overall support for the assertions made in the study.

      We added the following references to the Discussion: Lemaitre et al., 2015; Kirkwood, 2005; and Rodrigues and Flatt, 2016.

      (8) The statement claiming that the model is 'able to describe all types of ageing observed in the wild' should be moderated. As the authors themselves acknowledge, the model is referred to as a 'toy model,' and it is made clear that it cannot capture, nor is intended to capture, the entire diversity observed in life. Adjusting this statement to reflect the limited scope and purpose of the model would enhance precision and accuracy in the presentation of its capabilities.

      Although it is a toy model, its possible configurations encompass all the configurations described so far across the diversity of ageing throughout the tree of life, from negligible senescence with no loss of fertility (x_b and x_d >> 0) to menopause-like configurations (x_b >> x_d), through fast mortality increase post-reproduction (x_b = x_d). Replacing our current square functions would allow age-dependent decreases or increases in fertility and/or in mortality risk.

      (9) To bolster the biological relevance of the study, it is strongly recommended that the authors cross-check the results of their simulations with previously published experimental findings. This approach would serve to strengthen the alignment between the model outcomes and observed biological phenomena. Additionally, placing greater emphasis on the biological relevance aspects throughout the text would contribute to a more robust and comprehensive exploration of the study's implications.

      In the present manuscript, we have tried to cite a number of results from artificial selection experiments on life-history traits in order to strengthen the interpretations of our model's behaviour. There are numerous other studies, pointing in the same direction or not, but we do not think that it would be relevant to add an exhaustive list of them. Nevertheless, we added Stearns et al., 2000, which adds high extrinsic mortality as a factor in the evolution of life-history traits.

      (10) For enhanced clarity, it is suggested that the x-axis in Figure 1 be labelled as 'age.' Considering this adjustment could contribute to clearer visual communication of the data.

      We agree with the reviewer and modified the figure accordingly.

      (11) The addition of graphical legends is recommended for Figures 3-5, as well as the supplementary figures. Including these legends would provide essential context and improve the interpretability of the figures for readers.

      We agree with the reviewer and modified the figures accordingly.

      (12) For improved distinction of the ranges indicated by quantiles in Figure 3, it is suggested that the authors consider enhancing visual clarity. One approach could involve making the middle quantile thicker or using a different line type. Additionally, it is recommended to explore the calculation of the highest density 90% intervals rather than the 1-9 deciles. This adjustment could contribute to a clearer representation of the data distribution in the figure.

      We named the different deciles directly on the figure to improve readability.

      (13) It is observed that the mathematical proofs in Annex 1 are not displaying properly in the PDF. Additionally, there seem to be missing and broken references for the Annex. This issue may be related to LaTeX formatting. The authors could consider revisiting the formatting of Annex 1 to ensure the correct display of mathematical proofs and address the referencing concerns, possibly by checking and rectifying any LaTeX-related issues.

      The LaTeX file of the supplementary material was not compiled correctly. It is now corrected.

      (14) There is inconsistency in the text regarding the reference to the Annex, with both 'Annex' and 'Annexe' being used interchangeably. To maintain uniformity, it is suggested that the authors consistently use either 'Annex' or 'Annexe' throughout the text. This adjustment would contribute to a more polished presentation of the supplementary material.

      We corrected them accordingly.

      (15) There appears to be a typographical error in the name of Supplementary Figure 3.

      We corrected it accordingly.

    1. Discussions of "human rights" fit in the Natural Rights ethics framework. Key figures: Locke: Everyone has a right to life, liberty, and property. Jefferson, in the Declaration of Independence [b30]: "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness."

      I think the natural rights framework provides a way to understand basic human rights, but relying on universal principles may not explain or resolve complex ethical issues in a globalized context. Moreover, treating everyone's rights as indivisible and equal seems unrealistic to me because it may overlook the impact of social and economic inequalities.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Wang et al. investigated the evolution, expression, and function of the X-linked miR-506 miRNA family. They showed that the miR-506 family underwent rapid evolution. They provided evidence that miR-506 appears to have originated from the MER91C DNA transposons. The human MER91C transposon produced mature miRNAs when expressed in cultured cells. A series of mouse mutants lacking individual clusters, a combination of clusters, and the entire X-linked cluster (all 22 miRNAs) were generated and characterized. The mutant mice lacking four or more miRNA clusters showed reduced reproductive fitness (litter size reduction). They further showed that the sperm from these mutants were less competitive in polyandrous mating tests. RNA-seq revealed the impact of deletion of miR-506 on the testicular transcriptome. Bioinformatic analyses examined the relationship among miR-506 binding, transcriptomic changes, and target sequence conservation. The miR-506-deficient mice showed no apparent defects in sperm production, motility, and morphology. Lack of severe phenotypes is typical for miRNA mutants in other species as well. However, the miR-506-deficient males did exhibit reduced litter size; such an effect would be quite significant on an evolutionary time scale. The number of mouse mutants and sequencing analyses represent a tour de force. This study is a comprehensive investigation of the X-linked miR-506 miRNA family. It provides important insights into the evolution and function of the miR-506 family.

      The conclusions of this preprint are mostly supported by the data, except as noted below. Some descriptions need to be revised for accuracy.

      L219-L285: The conclusion that X-linked miR-506 family miRNAs expanded via LINE1 retrotransposition is not supported by the data. LINE1s and SINEs are very abundant, accounting for nearly 30% of the genome. In addition, the LINE1 content of the mammalian X chromosome is twice that of the autosomes, so flanking LINE1/SINE repeats can easily be found. Therefore, the analyses in Fig. 2G, Fig. 2H and Fig. S3 are not informative. In order to claim LINE1-mediated retrotransposition, it is necessary to show the hallmarks of LINE1 retrotransposition, which is only possible for new insertions. The X chromosome is known to be enriched for testis-specific multi-copy genes that are expressed in round spermatids (PMID: 18454149). The conclusion on the LINE1-mediated expansion of the miR-506 family on the X chromosome is not supported by the data and does not add additional insights. I think that the LINE1-related figure panels and description (L219-L285) need to be deleted. In the Discussion (L557-558), "...and subsequently underwent sequence divergence via LINE1-mediated retrotransposition during evolution" should also be deleted. This section (L219-L285) should deal only with the origin of miR-506 from MER91C DNA transposons, which is both convincing and informative.

      Reply: Agreed, the corresponding sentences were deleted.

      Fig. 3A: can you speculate/discuss why the miR-506 expression in sperm is higher than in round spermatids?

      Reply: RNAs are much less abundant in sperm than in somatic or spermatogenic cells (~1/100). Sperm-borne small RNAs represent a small fraction of the total small RNAs expressed in their precursor spermatogenic cells, including spermatocytes and spermatids. Therefore, when the same amount of total/small RNA is used for quantitative analyses, sperm-borne small RNAs (e.g., miR-506 family miRNAs) are proportionally enriched in sperm compared to other spermatogenic cells. We discussed this point in the text (Lines 550-556).

      Reviewer #2 (Public Review):

      In this paper, Wang and collaborators characterize the rapid evolution of the X-linked miR-506 cluster in mammals and characterize the functional consequences of depleting a few or most of the miRNAs in the cluster. The authors show that the cluster originated from the MER91C DNA transposon and provide some evidence that it might have expanded through the retrotransposition of adjacent LINE1s. Although the animals depleted of most miRNAs in the cluster show normal sperm parameters, the authors observed a small but significant reduction in litter size. The authors then speculate that the depletion of most miRNAs in the cluster could impair sperm competitiveness in polyandrous mating. Using a successive mating protocol, they show that, indeed, sperm lacking most X-linked miR-506 family members are outcompeted by wild-type sperm. The authors then analyze the evolution of the miR-506 cluster and its predicted targets. They conclude that the main difference between mice and humans is the expansion of the number of target sites per transcript in humans.

      The conclusions of the paper are, in most cases, supported by the data; however, a more precise and in-depth analysis would have helped build a more convincing argument in most cases.

      (1) In the abstract and throughout the manuscript, the authors claim that "... these X-linked miRNA-506 family miRNA [...] have gained more targets [...] " while comparing the human miRNA-506 family to the mouse. An alternative possibility is that the mouse has lost some targets. A proper analysis would entail determining the number of targets in the mouse and human common ancestor.

      Reply: This question alerted us that we did not describe our conclusion accurately, causing confusion for this reviewer. Our data suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis. In other words, mice never lost any targets compared to humans, but each miR-506 family miRNA tends to target more genes in humans than in mice.

      We revised the text to more accurately report our data. The pertaining text (lines 490-508) now reads: “Furthermore, we analyzed the number of all potential targets of the miR-506 family miRNAs predicted by the aforementioned four algorithms among humans, mice, and rats. The total number of targets for all the X-linked miR-506 family miRNAs among different species did not show significant enrichment in humans (Fig. S9C), suggesting the sheer number of target genes does not increase in humans. We then compared the number of target genes per miRNA. When comparing the number of target genes per miRNA for all the miRNAs (baseline) between humans and mice, we found that on a per miRNA basis, human miRNAs have more targets than murine miRNAs (p<0.05, t-test) (Fig. S9D), consistent with higher biological complexity in humans. This became even more obvious for the X-linked miR-506 family (p<0.05, t-test) (Fig. S9D). In humans, the X-linked miR-506 family, on a per miRNA basis, targets a significantly greater number of genes than the average of all miRNAs combined (p<0.05, t-test) (Fig. S9D). In contrast, in mice, we observed no significant difference in the number of targets per miRNA between X-linked miRNAs and all of the mouse miRNAs combined (mouse baseline) (Fig. S9D). These results suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis.”

      We also changed “have gained” to “have” throughout the text to avoid confusion.

      (2) The authors claim that the miRNA cluster expanded through L1 retrotransposition. However, the possibility of an early expansion of the cluster before the divergence of the species while the MER91C DNA transposon was active was not evaluated. Although L1 likely contributed to the diversity within mammals, the generalization may not apply to all species. For example, SINEs are closer on average than L1s to the miRNAs in the SmiR subcluster in humans and dogs, and the horse SmiR subcluster seems to have expanded by a TE-independent mechanism.

      Reply: Agreed. We deleted the data mentioned by this reviewer.

      (3) Some results are difficult to reconcile and would have benefited from further discussion. The miR-465 sKO has over two thousand differentially expressed transcripts and no apparent phenotype. Also, the authors show a sharp downregulation of CRISP1 at the RNA and protein level in the mouse. However, most miRNAs of the cluster increase the expression of Crisp1 on a reporter assay. The only one with a negative impact has a very mild effect. miRNAs are typically associated with target repression; however, most of the miRNAs analyzed in this study activate transcript expression.

      Reply: Both mRNA and protein levels of Crisp1 were downregulated in KO mice, and these results are consistent with the luciferase data showing overexpression of these miRNAs upregulated the Crisp1 3’UTR luciferase activity. We agree that miRNAs usually repress target gene expression. However, numerous studies have also shown that some miRNAs, such as human miR-369-3, Let-7, and miR-373, mouse miR-34/449 and the miR-506 family, and the synthetic miRNA miRcxcr4, activate gene expression both in vitro (1, 2) and in vivo (3-6). Earlier reports have shown that these miRNAs can upregulate their target gene expression, either by recruiting FXR1, targeting promoters, or sequestering RNA subcellular locations (1, 2, 6). We briefly discussed this in the text (Lines 605-611).

      (4) More information is required to interpret the results of the differential RNA targeting by the murine and human miRNA-506 family. The materials and methods section needs to explain how the authors select their putative targets. In the text, they mention the use of four different prediction programs. Are they considering all sites predicted by any method, all sites predicted simultaneously by all methods, or something in between? Also, what are they considering as a "shared target" between mice and humans? Is it a mRNA that any miR-506 family member is targeting? Is it a mRNA targeted by the same miRNA in both species? Does the targeting need to occur in the same position determined by aligning the different 3'UTRs?

      Reply: Since each prediction method has its merit, we included all putative targets predicted by any of the four methods. The "shared target" refers to a mRNA that any miR-506 family member targets because the miR-506 family is highly divergent among different species. We have added the information to the “Large and small RNA-seq data analysis” section in Materials and Methods (Lines 871-882).
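A minimal sketch of the two selection rules described in this reply, assuming each species' predictions are available as sets of gene identifiers: a putative target is any gene predicted by at least one tool (set union), and a "shared target" is a gene hit by any miR-506 family member in humans whose mouse ortholog is hit by any family member in mice, regardless of which miRNA. The tool names, gene symbols and ortholog table below are hypothetical placeholders, not the authors' pipeline.

```python
def union_of_predictions(predictions_by_tool):
    """Return every gene predicted by at least one tool (union across tools)."""
    combined = set()
    for genes in predictions_by_tool.values():
        combined |= set(genes)
    return combined


def shared_targets(human_by_mirna, mouse_by_mirna, human_to_mouse):
    """Human genes targeted by any family member whose mouse ortholog is also
    targeted by any family member in mice (the miRNA identity need not match)."""
    human_targets = set().union(*human_by_mirna.values())
    mouse_targets = set().union(*mouse_by_mirna.values())
    return {g for g in human_targets if human_to_mouse.get(g) in mouse_targets}


# Hypothetical example data (placeholders only).
human = {
    "miR-A": union_of_predictions({"tool1": {"GENE1"}, "tool2": {"GENE2"}}),
    "miR-B": {"GENE3"},
}
mouse = {"miR-C": {"Gene1"}, "miR-D": {"Gene3"}}
orthologs = {"GENE1": "Gene1", "GENE2": "Gene2", "GENE3": "Gene3"}

print(sorted(shared_targets(human, mouse, orthologs)))  # ['GENE1', 'GENE3']
```

Taking the union favors sensitivity over specificity; requiring agreement among all four tools (an intersection) would be the stricter alternative.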

      (5) The authors highlight the particular evolution of the cluster derived from a transposable element. Given the tendency of transposable elements to be expressed in germ cells, the family might have originated to repress the expression of the elements while still active but then remained to control the expression of the genes where the element had been inserted. The authors did not evaluate the expression of transcripts containing the transposable element or discuss this possibility. The authors proposed an expansion of the target sites in humans. However, whether this expansion was associated with the expansion of the TE in humans was not discussed either. Clarifying whether the transposable element was still active after the divergence of the mouse and human lineages would have been informative to address this outstanding issue.

      Reply: Agreed. The MER91C DNA transposon is denoted as nonautonomous (7); however, whether it was active during the divergence of mouse and human lineages is unknown. To determine whether the expansion of the target sites in humans was due to the expansion of the MER91C DNA transposon, we analyzed the MER91C DNA transposon-containing transcripts and associated them with our DETs. Of interest, 28 human and 3 mouse mRNAs possess 3’UTRs containing MER91C DNA sequences, and only 3 and 0 out of those 28 and 3 genes belonged to DETs in humans and mice, respectively (Fig. S9E), suggesting a minimal effect of MER91C DNA transposon expansion on the number of target sites. We briefly discussed this in the text (Lines 511-518).

      Post-transcriptional regulation is exceptionally complex in male haploid cells, and the functional relevance of many regulatory pathways remains unclear. This manuscript, together with recent findings on the role of piRNA clusters, starts to clarify the nature of the selective pressure that shapes the evolution of small RNA pathways in the male germ line.

      Reply: Agreed. We appreciate your insightful comments.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors conducted a comprehensive study of the X-linked miR-506 family miRNAs in mice on its origin, evolution, expression, and function. They demonstrate that the X-linked miR-506 family, predominantly expressed in the testis, may be derived from MER91C DNA transposons and further expanded by retrotransposition. By genetic deletion of different combinations of 5 major clusters of this miRNA family in mice, they found these miRNAs are not required for spermatogenesis. However, on further examination, the mutant mice show a mild fertility problem and inferior sperm competitiveness. The authors conclude that the X-linked miR-506 miRNAs fine-tune spermatogenesis to enhance sperm competition.

      Strengths:

      This is a comprehensive study with extensive computational and genetic dissection of the X-linked miR-506 family, providing a holistic view of its evolution and function in mice. The finding that miRNAs of this family could enhance sperm competition is interesting and could explain their roles in fine-tuning germ cell gene expression to regulate reproductive fitness.

      Weaknesses:

      The authors specifically addressed the function of 5 clusters of the X-linked miR-506 family containing 19 miRNAs. There is another small cluster containing 3 miRNAs close to the Fmr1 locus. Would this small cluster act in concert with the 5 clusters to regulate spermatogenesis? In addition, any autosomal miR-506-like miRNAs may compensate for the loss of the X-linked miR-506 family. These possibilities should be discussed.

      Reply: The three FmiRs were not deleted in this study because the SmiRs are much more abundant than the FmiRs in WT mice (Author Response image 1, heatmap version of Fig. 5C). Based on small RNA-seq, some FmiRs, e.g., miR-201 and miR-547, were upregulated in the SmiRs KO mice, suggesting that this small cluster may act in concert with the other 5 clusters and is thus worth further investigation. To the best of our knowledge, all the miR-506 family miRNAs are located on the X chromosome; although some other miRNAs were upregulated in the KO mice, they do not belong to the miR-506 family. We briefly discussed this point in the text (Lines 635-638).

      Author response image 1.

      sRNA-seq of WT and miR-506 family KO testis samples.

      Direct molecular link to sperm competitiveness defect remains unclear but is difficult to address.

      Reply: In this study, we identified a target of the miR-506 family, i.e. Crisp1. KO of Crisp1 in mice, or inhibition of CRISP1 in human sperm (7, 8), appears to phenocopy the quinKO mice, displaying largely normal sperm motility but compromised ability to penetrate eggs. The detailed mechanism warrants further investigation in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 84-85: "Several cellular events are unique to the male germ cells, e.g., meiosis, genetic recombination, and haploid male germ cell differentiation (also called spermiogenesis)". This statement is not accurate. Please revise. Meiosis and genetic recombination are common to both male and female germ cells. They are highly conserved in both sexes in many species including mouse.

      Reply: Agreed. We have revised the sentence and it now reads: “Several cellular events are unique to the male germ cells, e.g., postnatal formation of the adult male germline stem cells (i.e., spermatogonial stem cells), pubertal onset of meiosis, and haploid male germ cell differentiation (also called spermiogenesis) (9)” (Lines 83-86).

      Lines 163-164: "we found that Slitrk2 and Fmr1 were syntenically linked to autosomes in zebrafish and birds (Fig. 1A), but had migrated onto the X chromosome in most mammals". This description is not accurate. Chr 4 in zebrafish and birds is syntenic to the X chromosome in mammals. The term "migrated" is not appropriate. Suggestion: Slitrk2 and Fmr1 mapped to Chr 4 (syntenic with mammalian X chromosome) in zebrafish and birds but to the X chromosome in most mammals.

      Reply: Agreed. Revised as suggested.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the significance statement, the authors mention that the mutants are "functionally infertile," although the decrease in competitiveness is partial. I suggest referring to them as "functionally sub-fertile."

      Reply: Agreed. Revised as suggested.

      (2) I will urge the authors to explain in more detail how some figures are generated and what they mean. Some critical information needs to be included in various panels.

      (2a) Figure S1. The phastCons track does not seem to align as expected with the rest of the figure. The highest conservation peak is only present in humans, and the sequence conserved in the sea turtle has the lowest phastCons score. I was expecting the opposite from the explanation.

      Reply: The tracks for phyloP and phastCons are the scores for all 100 species, whereas the tracks with the species names on the left are the corresponding sequences aligned to the human genome. We have revised our figure to make it clearer.

      (2b) Figure 2A and Figure S2C. Although all the functional analysis of the manuscript has been done in mice, the alignments showing sequence conservation do not include the murine miRNAs. Please include the mouse miRNAs in these panels.

      Reply: The mouse has Mir-506-P7 with the conserved miRNA-3P seed region, which was included in the lower panel in Figure S2C. However, mice do not have Mir-506-P6, which may have been lost or become too divergent to be recognized during evolution, and it was therefore not included in Figure 2A or the upper panel in Figure S2C.

      (2c) Figure S7H. The panel could be easier to read.

      Reply: Agreed. We combined all the same groups and turned Figure S7H (now Figure S6H) into a heatmap.

      (2d) The legend of Figure 6G reads, "The number of target sites within individual target mRNAs in both humans and mice." Can the authors explain why the value 1 of the human "Number of target sites" is connected to virtually all the "Number of target sites" values in mice?

      Reply: Sorry for the confusion. For example, for gene 1, we have 1 target site in the human and 1 target site in the mouse; but for gene 2, we have 1 target site in the human and multiple sites in the mouse; therefore, the value 1 is connected to more than one value in the mouse.
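The many-to-one links in Figure 6G follow from grouping genes by their human site count. A minimal sketch, assuming each gene is summarized as a hypothetical (human sites, mouse sites) pair; the counts below are illustrative placeholders that mirror the gene 1 and gene 2 example, with "multiple" arbitrarily set to 3.

```python
# Hypothetical per-gene target-site counts: (human_sites, mouse_sites).
site_counts = {"gene1": (1, 1), "gene2": (1, 3), "gene3": (2, 3)}


def links(counts):
    """Map each human site count to the set of mouse site counts it connects to."""
    result = {}
    for human_sites, mouse_sites in counts.values():
        result.setdefault(human_sites, set()).add(mouse_sites)
    return result


print(links(site_counts))  # {1: {1, 3}, 2: {3}}
```

Because many genes share a single human site, the human value 1 accumulates links to most of the mouse values.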

      Reviewer #3 (Recommendations For The Authors):

      CRISP1 and EGR1 protein localization in WT and mutant sperm by immunostaining would be helpful.

      Reply: Agreed. We performed immunostaining for CRISP1 on WT sperm, and the new results are presented in Figure S8D. CRISP1 appears to be expressed mainly in the principal piece and head of sperm.

      The detailed description of the generation of various mutant lines should be included in the Methods.

      Reply: We added more details on the generation of knockout lines in the Materials and Methods (Lines 686-701).

      References:

      (1) S. Vasudevan, Y. Tong, J. A. Steitz, Switching from repression to activation: microRNAs can upregulate translation. Science 318, 1931-1934 (2007).

      (2) R. F. Place, L. C. Li, D. Pookot, E. J. Noonan, R. Dahiya, MicroRNA-373 induces expression of genes with complementary promoter sequences. Proc Natl Acad Sci U S A 105, 1608-1613 (2008).

      (3) Z. Wang et al., X-linked miR-506 family miRNAs promote FMRP expression in mouse spermatogonia. EMBO Rep 21, e49024 (2020).

      (4) S. Yuan et al., Motile cilia of the male reproductive system require miR-34/miR-449 for development and function to generate luminal turbulence. Proc Natl Acad Sci U S A 116, 3584-3593 (2019).

      (5) S. Yuan et al., Oviductal motile cilia are essential for oocyte pickup but dispensable for sperm and embryo transport. Proc Natl Acad Sci U S A 118 (2021).

      (6) M. Guo et al., Uncoupling transcription and translation through miRNA-dependent poly(A) length control in haploid male germ cells. Development 149 (2022).

      (7) V. G. Da Ros et al., Impaired sperm fertilizing ability in mice lacking Cysteine-RIch Secretory Protein 1 (CRISP1). Dev Biol 320, 12-18 (2008).

      (8) J. A. Maldera et al., Human fertilization: epididymal hCRISP1 mediates sperm-zona pellucida binding through its interaction with ZP3. Mol Hum Reprod 20, 341-349 (2014).

      (9) L. Hermo, R. M. Pelletier, D. G. Cyr, C. E. Smith, Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microsc Res Tech 73, 241-278 (2010).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The present study by Berger et al. analyzes to what extent memory formation depends on available energy reserves. This has been studied extensively for aversive memory formation, but only sparsely for appetitive memory formation. It has long been known that an appetitive memory in flies can only be formed after starvation. However, the authors additionally show that the duration of starvation not only plays a role but also determines which form of memory (short- or long-term memory) is formed. The authors demonstrated that internal glycogen stores play a role in this process and that this is achieved through insulin-like signaling in octopaminergic reward neurons, which integrates internal energy stores into memory formation. Here, the authors suggest that octopamine acts as a negative regulator of different forms of memory.

      The study sheds light on an old question, to what extent the octopaminergic neuronal system plays a role in the formation of appetitive memory, since in recent years only the dopaminergic system has been in focus. Furthermore, the data are an interesting contribution to the ongoing debate whether insulin receptors play a role in neurons themselves or in glial cells. The experiments are very well designed and the authors used a variety of behavioural experiments, genetic tools to manipulate neuronal activity and state-of-the-art imaging techniques. In addition, they not only clearly demonstrated that octopamine is a negative regulator of appetitive memory formation, but also proposed a mechanism by which the insulin receptor in octopaminergic neurons senses the internal energy status and then controls the activity of those neurons. The conclusions are mostly supported by the data, but some aspects related to the experimental design, some explanations and literature references need more clarification and revision.

      (1) Usually, long-term memory (LTM) is tested 24 hours after training. Here, the authors usually refer to LTM as a memory that is tested 6 hours after training. Adding a control experiment showing that the LTM observed here lasts longer would greatly strengthen this study.

      We thank the reviewer for this comment, as it helped greatly to clarify the matter.

      We measured memory of control and mutant flies 24 h after training and included the data in the manuscript (Figure 1B, summarized in a model in Figure 2C). We show that control flies develop an intermediate type of memory that, depending on the length of starvation, is either anesthesia-sensitive or anesthesia-resistant. Mutants lacking octopamine develop either anesthesia-sensitive or anesthesia-resistant long-term memory.

      (2) The authors define here another consolidated memory component as ARM, when they applied a cold shock 2 hours after training. However, some publications showed that LTM is formed after only one training cycle (Krashes et al 2008, Tempel et al 1983). This makes it difficult to determine whether appetitive ARM can be formed. Furthermore, one study showed that appetitive ARM is absent after massed training (Colomb et al 2009). Therefore, the conclusion could also be that different starvation protocols lead to different stabilities of LTM. Additional experiments could help to distinguish between these alternative explanations. From these results, it can then be concluded either that different stable forms of LTM are formed depending on the starvation state, or that two differently consolidated memory phases (LTM, ARM) are formed, as has already been shown for aversive memory. This is also important for other statements in the manuscript, and therefore the authors should address this, for example for the findings on the insulin receptor (are these two opposing memories or LTM of different stability?).

      The flies indeed develop different types of memory depending on the length of starvation and the internal energy supply.

      Reviewer #2 (Public Review):

      How an organism's physiological state modulates the establishment and perdurance of memories is a timely question that the authors aimed to address by studying the interplay between energy homeostasis and food-related conditioning in Drosophila. Specifically, they studied how starvation modulates the establishment of short-term vs long-term memories and clarified the role of the monoamine Octopamine in food-related conditioning, showing that it is not per se involved in the formation of appetitive short-term memories but rather gates memory formation by suppressing LTM when energy levels are high. This work clarifies previously described phenotypes and provides insight into the interconnections between energy levels, feeding and the formation of short-term and long-term food-related memories. In the absence of population-specific manipulation of octopamine signaling, however, it does not reach a circuit-level understanding of how these different processes are integrated.

      Strengths

      • Previous studies have documented the impact of Octopamine on different aspects of food-related behaviors (regulation of energy homeostasis, feeding, sugar sensing, appetitive memory...), but we currently lack a clear understanding of how these different functions are interconnected. The authors have used a variety of experimental approaches to systematically test the impact of internal energy levels in establishment of appetitive memory and the role of Octopamine in this process.

      • The authors have used a range of approaches, performed carefully controlled experiments and produced high quality data.

      Weaknesses

      (1) In the tbh mutant flies, Tyramine-to-Octopamine conversion is inhibited, resulting not only in a lack of Octopamine but also in elevated levels of Tyramine. Whether and how elevated levels of Tyramine contribute to the described phenotypes is unclear. In the current version of the manuscript, only one set of experiments (Figure 2) has been performed using an Octopamine agonist. This is particularly important in light of recently published data showing that starvation modifies Tyramine levels.

      (2) Octopamine (and its precursor Tyramine) have been implicated in numerous processes, complicating the analysis of the phenotypes resulting from a general inhibition of tbh.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be solely explained by loss of octopamine. We included models into the manuscript to illustrate this (Figure 2 C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not result in a change in food intake, rather the increase in octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (3) The manuscript explores various aspects of the impact of energy levels on food-related behaviors and the underlying sensing and effector mechanism, both in wild-type and tbh mutants, making it difficult to follow the flow of the results.

      We included models illustrating the results to clarify the content of the manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Berger et al. study how internal energy storage influences learning and memory. Since, in Drosophila melanogaster, octopamine (OA) is involved in the regulation of energy homeostasis, they focus on the roles of OA. To do so, they use the tyramine-β-hydroxylase (Tbh) mutant, which lacks the neurotransmitter OA, and study short-term memory (STM), long-term memory (LTM) and anesthesia-resistant memory (ARM). They show that the duration of starvation affects the magnitude of both short- and long-term memory. In addition, they show that OA has a suppressive effect on learning and memory. In terms of energy storage, they show that internal glycogen storage influences how long sucrose is remembered and that high glycogen suppresses memory. Finally, they show that insulin-like signaling in octopaminergic neurons, which is also related to internal energy storage, suppresses learning and memory.

      This is an important study that extends our knowledge on OA activity in learning and memory and the effects the metabolic state has on learning and memory. The authors nicely use the genetic tools available in flies to try and unravel the complex circuitry of metabolic state level, OA activity and learning and memory.

      Nevertheless, I do have some comments that I think require attention:

      (1) The authors use RNAi to reduce the level of glycogen synthase or glycogen phosphorylase. These manipulations are expected to affect the level of glycogen. Using specific drivers the authors attempt to manipulate glycogen level at the muscles and fat bodies and examine how this affects learning and memory. The conclusions of the authors arise solely from the manipulation intended (i.e. the genetics). However, the authors also directly measured glycogen levels at these organs and those do not follow the manipulation intended, i.e. the RNAi had very limited effect on the glycogen level. Nevertheless, these results are ignored.

      We agree with the reviewer and repeated the experiments. While we could not detect differences in whole animals, we detected differences in body parts enriched in muscle or fat, e.g., the thorax or abdomen. We have added the data.

      (2) The authors claim in the summary that OA is not required for STM. However, according to one experiment OA is required for STM as Tbh mutants cannot form STM. In another experiment OA is suppressive to STM as wt flies fed with OA cannot form STM. Therefore, it is very difficult to appreciate the actual role of OA on STM.

      During mild starvation, the internal energy supply is greater in Tbh mutants than in control flies. This information is integrated into the reward system via insulin receptor signaling. Therefore, the association between the odorant and sucrose is not meaningful to the mutants and no STM is formed. At the same time there is no release of octopamine and therefore no repression of LTM. In starved animals, octopamine suppresses food intake (we added the data). This is consistent with a function of Octopamine as a signal for the presence of food. Depending on when the signal comes, this might suppress the formation of STM or LTM.

      (3) The authors use t-test and ANOVA for most of the statistics, however, they did not perform normality tests. While I am quite sure that most datasets will pass normality test, nevertheless, this is required.

      Thanks for pointing this out. We have included a description in the “Materials and Methods” section that explains how we tested the data for normal distribution. We corrected the figure legends accordingly.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-Plot chart analysis to determine whether the data were normally distributed.”

      (4) While it is logical to assume that OA neurons are upstream to R15A04 DA neurons, I am not sure this really arises from the experiment that is presented here. It is well established that without activity in R15A04 DA neurons there is no LTM. Since OA acts to decrease LTM, can one really conclude anything about the location of OA effect when there is no learning?

      Control flies normally did not form memory 6 h after training; only Tbh mutants did. We wanted to investigate what kind of memory develops in Tbh mutants. Throughout the experiments in the manuscript, we kept the training procedure constant.

      (5) It is unclear how expression of a dominant negative form of insulin receptor (InR) in OA neurons can rescue the lack of OA due to the Tbh mutation. If OA neurons cannot release anything to the presumably downstream DA neurons, how can changing their internal signaling have any effect?

      Expression of the dominant-negative form of the insulin receptor signals the absence of food or low energy levels, whereas activation of the insulin receptor signals that enough food is available. The reward is a source of food, but its energy content is not high enough to fill the energy stores. Insulin receptor activation can trigger at least three different signaling cascades, one of which might regulate octopamine release.

      While I stressed some comments that need to be addressed, the overall take-home message of the manuscript is supported and the authors do show that the metabolic state of the animal affects learning and memory. I do think though, that some more caution is required for some of the conclusions.

      We added additional data to address the points raised.

      Recommendations for the authors:

      We addressed all points raised by the reviewers, clarified the content or added more data.

      Reviewer #1 (Recommendations For The Authors):

      (1) Throughout the manuscript, the full stop of a sentence is always placed before the references.

      We fixed this.

      (2) I find the English in the manuscript not yet sufficient for publication. I suggest that the authors carefully revise the manuscript. I think if the sentences are structured a little more clearly, this paper has enormous potential to be read by the broad community.

      We agree and revised the manuscript. We hope the manuscript is now clearer.

      (3) Sentences l114 to l117 are misleading. The authors imply that they tested the same flies for changes in odor perception or sucrose sensitivity. I assume that the authors meant that they analyzed different groups of animals.

      We clarified the sentence as follows:

      “To ensure that the observed differences in learning and memory were not due to changes in odorant perception, odorant evaluation or sucrose sensitivity, different fly populations of the same genotypes were tested for their odorant acuity, odorant preference and their sucrose responsiveness (Table S1).”

      (4) In the title as well as in the abstract the influence of octopamine on appetitive memory formation is described in more detail, this is also the main focus of this study. However, in the introduction, the influence of the insulin receptor on memory formation is discussed first. Personally, I would describe this later in the manuscript, ideally in the results section. At this point in the manuscript, this leads to an interruption in the flow of reading.

      Thanks for the suggestion. We changed the order in the introduction.

      (5) The authors could consider, since they only used Drosophila melanogaster, changing "Drosophila melanogaster" to "Drosophila" throughout the manuscript.

      We modified the text accordingly.

      (6) All evaluations and statistical tests are state of the art. However, I have one comment. For each statistical test, a correction should be made depending on the number of tests. However, I could not determine whether this was also done for the parametric or non-parametric one-sample t-test. From the results and the methods section, I would guess not. Here I would recommend a Bonferroni correction or even better a Sidak-Holm correction. Furthermore, the authors could also go into more detail about which non-parametric one-sample t-test they used.
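To make the reviewer's suggestion concrete, here is a minimal plain-Python sketch of Bonferroni and step-down Holm-Šidák adjusted p-values. It is an illustration only, not the procedure implemented by the tool the authors used; the function names and example p-values are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm_sidak(pvals):
    """Step-down Holm-Sidak adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Sidak correction for the (m - rank) hypotheses still under test.
        adj = 1.0 - (1.0 - pvals[i]) ** (m - rank)
        running_max = max(running_max, adj)  # keep adjusted values monotone
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical p-values from three one-sample tests.
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(holm_sidak(raw))
```

Holm-Šidák is uniformly at least as powerful as Bonferroni while still controlling the family-wise error rate under independence, which is presumably why the reviewer calls it the better choice.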

      We have described the statistics used in more detail in the Materials and Methods section.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-Plot chart analysis to determine whether the data were normally distributed. For normally distributed data, we used the Student’s t test to compare differences between two groups and the one-way ANOVA with Tukey’s post hoc HSD test for differences between more than two groups. For nonparametric data, we used the Mann-Whitney U test for differences between two groups and, for more than two groups, the Kruskal-Wallis test with post hoc Dunn analysis and Bonferroni correction. The nonparametric one-sample sign test was used to analyze whether behavior was not based on random choice and differed from zero (P < 0.05). The statistical data analysis was performed using statskingdom (https://www.statskingdom.com).”
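The one-sample sign test described above asks whether performance indices fall above or below zero more often than chance would predict. A minimal sketch of the exact two-sided version in plain Python, assuming that values exactly equal to zero are discarded (one common convention; the analysis tool may handle ties differently). The performance indices in the example are hypothetical.

```python
from math import comb


def sign_test(values, mu=0.0):
    """Exact two-sided one-sample sign test of H0: median == mu.

    Values equal to mu are dropped; under H0 the number of positive
    signs follows Binomial(n, 0.5).
    """
    pos = sum(1 for v in values if v > mu)
    neg = sum(1 for v in values if v < mu)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Probability of a split at least this extreme in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical performance indices from eight groups of flies.
pis = [0.2, 0.35, 0.1, 0.4, 0.25, 0.3, -0.05, 0.15]
print(sign_test(pis))  # 0.0703125 (7 of 8 positive, not significant at 0.05)
```

With eight groups, 7 of 8 positive indices do not reach P < 0.05 (a two-sided P of about 0.07), while 8 of 8 do (P = 0.0078). The sign test therefore needs fairly consistent results before it rejects random choice.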

      (7) In nearly all figure legends the sentence "The letter "a" marks a significant difference from random choice as determined by a one-sample sign test (*P < 0.05; **P < 0.01)" occurs. This is correctly indexed in the figures. However, I do not understand what *P < 0.05; **P < 0.01 means here. The significance level should be described here. I would strongly recommend that the authors make the definition clearer.

      We corrected this in the figure legends (see also above).

      (8) In Fig. 1B the labelling is a bit confusing. I interpret the two right groups as the mutants for octopamine, but there is still w[1118] in front.

      We modified the Figure 1B.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions

      (1) Assessing the contribution of Tyramine (for example, by reducing the levels of Tyramine or of its specific receptor) would help clarify its role in the observed phenotypes.

      See comments above.

      (2) Cell-specific inhibition of octopamine receptors should thus be performed to precisely interpret the observed phenotypes and dissect how interconnected the different phenotypes are, which is the object of this publication.

      We observed that the time point and duration of octopamine application changes the behavioral output. The behavior analyzed depends on pulses of octopamine and differences of the internal energy status. A cell-specific inhibition via RNAi knock down of octopamine receptors might not clarify the issue.

      (3) Defining a streamlined narrative and progressively integrating the different observations into a unifying model would improve the clarity and flow of the manuscript.

      We included models explaining the observed results (Figure 2C and Figure 7E).

      Minor comments

      Line 129: Figure 1B should be mentioned, not 2B.

      Figure 1 legend: E should be replaced by C (after A,B).

      Figure S5: what are the arrows pointing to? Why are the Inr foci visible in A not seen in B? It should be mCD8-GFP and not mCD on top of the images.

      We fixed this.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) Can one really conclude from Figure 2A that OA acts on R15A04 DA neurons? It is well established that without activity in these DA neurons there is no LTM. Since OA acts to decrease learning, how one can conclude anything about the location of OA effect when there is no learning? With STM the situation was opposite, OA supported learning and this was abolished when DA neurons were silenced. I think some supporting experiment are required, i.e. how OA affects DA neurons activity or, alternatively, tone down a bit the writing.

      Control flies normally did not form memory 6 h after training; only Tbh mutants did. We wanted to investigate what kind of memory develops in Tbh mutants. Throughout the experiments in the manuscript, we kept the training procedure constant. Inhibition of the dopaminergic neurons blocks the memory of Tbh mutants. Taken together, the duration of the memory, the cold-shock experiments and the inhibition of the dopaminergic neurons show that Tbh mutants develop LTM after this training, which does not evoke memory in controls.

      The loss of STM in mildly starved Tbh mutants depends on the integration of the high internal energy levels via InR signaling. Reducing the internal energy levels further by extending starvation results in STM, supporting the conclusion that OA is not directly involved in the formation of STM.

      (2) Figure 4 requires some clarifications. In Supplementary Figure S2 the authors show that they could not manipulate glycogen levels in muscles. However, in Figure 4B they show that "Increasing glycogen levels in the muscles did not change short-term memory in 16 h starved flies, but the reduction in glycogen significantly improved memory strength (Figure 4B)" (lines 231-233). How can this be reconciled?

      While we could not detect differences in whole animals, we detected differences in glycogen content in body parts enriched with muscles or fat, e.g. thorax or abdomen when using UAS-GlyP-RNAi or UAS-GlyS-RNAi under the control of the respective Gal4 drivers.

      We added the data.

      Likewise, the authors write that "Increasing or decreasing glycogen levels in the fat bodies had no effect on memory performance (Figure 4C)" (lines 233-234). However, in Figure S2 they show that they can only increase glycogen levels, not decrease them.

      As explained above, the conclusion of Figure 4, "Thus, low levels of glycogen in the muscles upon starvation positively influence appetitive short-term memory, while high levels of glycogen in the muscles and fat body reduce short-term memory" (lines 245-246), is not supported by the direct measurements of glycogen presented in Figure S2.

      We added the data showing that the reduction or increase can be measured when analyzing specific body parts enriched in muscle or fat tissue.

      (3) In cases where mutant flies do not display learning, a control should be done to see if they ate the sugar (with dye). Especially since the genetic manipulation affects metabolism.

      We analyzed how much sucrose the animals consumed in the behavioral test. Both Tbh mutants and controls fed, and there was no difference in feeding behavior between the mutants and the controls.

      “We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral setup. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min, and there was no difference between w1118 and TβhnM18 flies.”

      (4) The use of a t-test requires the data to be normally distributed. If I am not mistaken, this was not demonstrated for any of the datasets used. I did a quick check on one of the datasets provided in the Excel sheet, and it is normally distributed. Nevertheless, please add a normality test for all data sets. If some do not pass the normality test, please use a suitable non-parametric test.

      We added normality tests for all data sets and used non-parametric tests for data that were not normally distributed. We clarify this in the Materials and Methods section and in the figure legends.

      (5) The authors show that OA also suppresses STM. This result contradicts previously published results. This by itself is not a problem. However, this result also seems to me to contradict the authors' own results. According to Figure 1B, OA is required for STM, as its absence in the tbh mutant results in loss of STM. According to Figure 2C, OA reduces STM, as wt flies fed with OA just prior to learning do not form STM. This appears in other places in the manuscript as well.

      In addition, in the text lines 178-180, the authors write "A short pulse of octopamine before the training inhibits the STM. Thus, octopamine is a negative regulator of appetitive dopaminergic neuron-dependent long-term memory and can block STM." But in the summary they write "Octopamine is not required for short-term memory, since octopamine deficient mutants form appetitive short-term memory to sucrose and to other nutrients depending on the internal energy status." So, the take-home message regarding OA and STM is unclear.

      The authors need to better clarify this point.

      We clarified these points (see the comments above). The loss of memory in Tbh mutants is not due to the loss of octopamine, but to increased energy levels that change the reward properties of sucrose.

      (6) The manuscript is very difficult to follow. The authors constantly switch between 16 and 40 hours of starvation, and between short-term memory, 3-hour memory and 6-hour memory. I think it would have been better to have a more focused manuscript. However, if this is not possible, I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other. Also, perhaps add to each figure a panel describing exactly the experimental conditions. I also think that simplifying the text and adding more conclusions throughout the Results section will help readers to follow. Finally, I think it would help readers understand the conclusions if the authors added a diagram of the flow that they think occurs. For example, the authors show that glycogen suppresses learning, as its reduction increases learning. They also show that InR receptor activity suppresses learning, as its KD also increases learning. If I am not mistaken, the link between the two is not straightforward (but I may be wrong here). A diagram of the flow would be very helpful.

      We prepared diagrams summarizing and explaining the results.

      Minor

      (1) I may not have understood correctly as I am not sure that I found Table S1.

      Also, there was no legend for Table S1.

      Nevertheless, if I understood correctly, the authors write that "Before the experiments, flies were tested to determine whether they perceived the odorants, preferred one odorant over the other and responded to the reward similarly to ensure that the observed differences in behavior were not due to changes in odorant perception or sucrose sensitivity (Table S1)." However, according to the table that I found, it seems that after 40 h of starvation wt flies show a preference for OCT, whereas this does not occur for the mutant. Also, it seems that at 16 h the mutant has a much higher preference for the odors than after 40 h. This is a bit odd. I am also not sure what the balance value refers to. Finally, the mutant shows a really low 2 M sucrose preference after 40 h. In general, this set of experiments requires a bit more explanation.

      I think it is better to show these experiments using graphs and add this to the supplementary figures.

      We clarified the experiments in the Results section as follows and added an explanation to the Materials and Methods section. We tested the odorant acuity and sucrose preference for all genotypes used in the manuscript and added the data to Table S1.

      “The flies of the different genotypes sensed the odorants and evaluated them as similarly salient. This is important to avoid a bias in the situation where flies have to choose between the two odorants after training. They also sensed sucrose. We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral setup. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min, and there was no difference between w1118 and TβhnM18 flies.”

      (2) Line 129 should be Figure 1B

      This is corrected.

      (3) Line 133, Figure 1C, how can one explain the negative reinforcement? I can understand no reinforcement, but negative?

      The effect of glucose might be dose-dependent. A concentration of 0.15 M sucrose is much closer to realistic concentrations found in fruits than 2 M sucrose and might therefore elicit aversion. When animals are starved enough, they might find any food source attractive, even when the sucrose concentration is unrealistic.

      (4) Figure 1, why are the graphs different between panel B and C?

      This is corrected.

      (5) In Figure S1, do the TβhnM18 groups differ significantly from zero? I think they do, so it would be better to state this somewhere. If not, the claims in lines 134-135 are not supported by the data.

      We added the significance and added the data to Figure 1.

      Figure S1 legend: there is no A panel. Also "below box blots" should be box plots.

      Thanks for pointing that out. We corrected it.

      (6) It is not clear what duration of starvation was used in Figure 2A. I assume that 16 h of starvation and 2 M sucrose were used, but I would state that explicitly.

      We added the information to the figure legends.

      (7) Figure 2A is missing a control of flies with both the driver and UAS shibirets at the permissive temperature.

      We added the controls to the supplement (Figure S1).

      (8) It seems to me that Figure 3B, in which the authors state that "Only after 40 h of starvation did TβhnM18 mutants show a similar preference to control sucrose consumption" (line 198), is somewhat in contradiction to Table S1, in which I see a sucrose preference of 0.36 for wt and 0.17 for tbh. I think this comment arises because I did not understand Table S1 correctly, so please explain it better.

      We rewrote this section.

      (9) In Figure 3C, consider not using std as this stands for standard deviation and may be confusing.

      We now use the term “food” instead of “std” and explained in the legend that food means standard fly food.

      We fixed this.

      (10) Please check the Supplementary Figures. I think Figures S2 and S3 are switched.

      We fixed this.

      (11) There is a mistake in Figure S3A. The right column should have another "+" sign.

      Thanks, we fixed this.

      (12) I am somewhat puzzled by Figures 4 and 5. If I understand correctly, the w1118 mef2-G4 group in Figure 4B is exactly the same experiment as the w1118 mef2-G4 group in Figure 5A, and yet the performance index is 0.2 in Figure 4B and about 0.4 in Figure 5A. Based on other comparisons, it seems to me that these would be significantly different, and yet it is the same experiment.

      They are two independent experiments done at different times. The controls were independently repeated.

      (13) Line 273 should be Figure 5C.

      This is corrected.

      (14) I don't think this is a correct sentence "Virgin females remembered sucrose significantly better than mated females." Line 274.

      Reads now:

      “Virgin females remembered the odorant paired with sucrose significantly better than mated females.”

      (15) Line 340 there is no Figure 1E

      This is fixed (now Figure 1C).

      (16) The data excel file is difficult to follow. In Figure 2 there are references to Figure 5. The graphs are pointing to other files. Text is not always in English. It is not clear what W stands for. I recommend making it more accessible.

      We corrected the data excel files.

      (17) The manuscript is difficult to follow. I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other.

      We improved the data presentation by

      a) adding a model showing the kinetics of memory formation in controls and mutants (Figure 2C)

      b) a model explaining how the internal state is integrated into the formation of memory (Figure 7D).

    1. It is therefore to be expected that the initial cost of the card system is not a fair criterion of its cost when in working order.

      Setting up and learning a note taking or card index system has a reasonably large up-front cost, but learning it well and being able to rely on it over long periods of time will eventually reap larger and cheaper long-term outcomes and benefits.

      Unless changing systems creates dramatically larger improvements, the cost of change will surely swamp the benefits, making the switch useless. This advice from Kaiser is as true today as it was in 1908; however, we tend not to think about efficiency as much now as he may have then, and we fall prey to shiny object syndrome.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study reports important evidence that infants' internal factors guide children's attention and that caregivers respond to infants' attentional shifts during caregiver-infant interactions. The authors analyzed EEG data and multiple types of behaviors using solid methodologies that can guide future studies of neural responses during social interaction in infants. However, the analysis is incomplete, as several methodological choices need more adequate justification.

      Reviewer #1

      Public Review:

      The authors bring together multiple study methods (brain recordings with EEG and behavioral coding of infant and caregiver looking, and caregiver vocal changes) to understand social processes involved in infant attention. They test different hypotheses on whether caregivers scaffold attention by structuring a child's behavior, versus whether the child's attention is guided by internal factors and caregivers then respond to infants' attentional shifts. They conclude that internal processes (as measured by brain activation preceding looking) control infants' attention, and that caregivers rapidly modify their behaviors in response to changes in infant attention.

      The study is meticulously documented, with cutting-edge analytic approaches to testing alternative models; this type of work provides a careful and well-documented guide for how to conduct studies and process and analyze data for researchers in the relatively new area of neural response in infants in social contexts.

      We are very pleased that R1 considers our work an important contribution to this developing field, and we hope that we have now addressed their concerns below.

      Some concerns arise around the use of terms (for example, an infant may "look" at an object, but that does not mean the infant is actually "attending"); the collapsing of different types of looks (to people and objects); and the averaging of data across infants, which may mask some of the individual patterns.

      We thank the reviewer for this feedback and their related comments below, and we feel that our manuscript is much stronger as a result of the changes we have made. Please see below for a detailed description of our rationale for defining and analysing the attention data, as well as the textual changes made in response to the reviewer’s comments.

      Recommendations For The Authors

      This paper is rigorous in method, theoretically grounded, and makes an important contribution to understanding processes of infant attention, brain activity, and the reciprocal temporal features of caregiver-infant interactions. The alternative hypothesis approach sets up the questions well (although authors should temper any wording that suggests attention processes are one or the other. That is, certain bouts of infant attention can be guided by exogenous factors such as social input, and others be endogenous; so averaging across all bouts can actually mask the variation in these patterns). I appreciated the focus on multiple types of behavior (e.g., gaze, vocal fluctuations in maternal speech); the emphasis on contingent responding; and the very clear summaries of takeaways after each section. Furthermore, methods and analyses are well described, details on data processing and so on are very thorough, and visualizations aptly facilitate data interpretation. However, I am not an expert on infant neural responses in EEG and assume that a reviewer with such expertise will weigh in on the treatment and quality of the data; therefore, my comments should be interpreted in light of this lack of knowledge.

      We thank R1 for these very positive and insightful comments on our analyses which are the result of a number of years of methodological and technical developmental work.

      We do agree with R1 that we should more carefully word parts of our argument in the Introduction to make clear the fact that shifts in infant attention could be driven by a combination of interactive and endogenous influences. As a result of this comment, we have made direct changes to parts of the Introduction; removing any wording that suggests that these processes are ‘alternative’ or ‘separate’, and our overall aim states: ‘Here, recording EEG from infants during naturalistic interactions with their caregiver, we examined the (inter)-dependent influences of infants’ endogenous oscillatory neural activity, and inter-dyadic behavioural contingencies in organising infant attention’.

      Examining variability between infant attention episodes in the factors that influence the length and timing of the attention episode is an important area for future investigation. We now include a discussion on this on page 38 of the Discussion section, with suggestions for how this could be examined. Investigating different subtypes of infant attention is methodologically challenging, given the number of infant behaviours that would need to inform such an analysis, all of which are time-consuming to code. Developing automated methods for performing these kinds of analyses is an important avenue for future work.

      Here, I review various issues that require revision or elaboration based on my reading of what I consider to otherwise be a solid and important research paper.

      Problem in the use of the term attention scaffolding. Although there may be literature precedent in the use of this term, it is problematic to narrowly define scaffolding as mother-initiated guidance of attention. A mother who responds to infant behaviors, but expands on the topic or supports continued attention, and so on, is scaffolding learning to a higher level. I would think about a different term because it currently implies a caregiver as either scaffolding OR responding contingently. It is not an either-or situation in conceptual meaning. In fact, research on social contingency (or contingent responsiveness), often views the follow-in responding as a way to scaffold learning in an infant.

      Yes, we agree with R1 that the term ‘attention scaffolding’ could be confusing given the use of this term in previous work conducted with children and their caregivers in problem-solving tasks, that emphasise modulations in caregiver behaviour as a function of infant behaviour. As a result of this suggestion, we have made direct edits to the text throughout, replacing the term attentional scaffold with terms such as ‘organise’ and ‘structure’ in relation to the caregiver-leading or ‘didactic’ perspective, and terms such as ‘contingent responding’ and ‘dynamic modulation’ in relation to the caregiver-following perspective. We feel that this has much improved the clarity of the argument in the Introduction and Discussion sections.

      Do individual data support the group average trends? My concern with unobservable (by definition) is that EEG data averages may mask what's going on in individual brain response. Effects appear to be small as well, which occurs in such conditions of averaging across perhaps very variable response patterns. In the interest of full transparency and open science, how many infants show the type of pattern revealed by the average graph (e.g., do neural markers of infant engagement forward predict attention for all babies? Majority?). Non-parametric tests on how many babies show a claimed pattern would offer the litmus test of significance on whether the phenomenon is robust across infants or pulled by a few infants with certain patterns of data. Ditto for all data. This would bolster my confidence in the summaries of what is going on in the infant brain. (The same applies as I suggest to attention bouts. To what extent does the forward-predict or backward-predict pattern work for all bouts, only some bouts, etc.?). I recognize that to obtain power, summaries are needed across infants and bouts, but I want to know if what's being observed is systematic.
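      The reviewer's proposed litmus test, counting how many infants show the group-level pattern, can be made concrete as an exact two-sided sign test. The sketch below is illustrative only: the function name `sign_test_p` and the per-infant values are hypothetical (they are not data from this study), and it assumes each infant contributes one summary value (for example, a peak cross-correlation coefficient) whose sign gives the direction of the effect, with exact zeros dropped beforehand.

```python
from math import comb


def sign_test_p(n_positive: int, n_total: int) -> float:
    """Exact two-sided sign test.

    Under the null hypothesis each infant is equally likely to show the
    effect in either direction, so the number of positive infants follows
    Binomial(n_total, 0.5). Returns the probability of a split at least as
    lopsided as the one observed, in either direction.
    """
    if not 0 <= n_positive <= n_total:
        raise ValueError("n_positive must lie between 0 and n_total")
    k = max(n_positive, n_total - n_positive)  # size of the larger group
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2**n_total
    return min(1.0, 2 * upper_tail)


# Hypothetical per-infant summary values (illustration only, not study data).
per_infant_r = [0.12, 0.08, -0.03, 0.15, 0.05, 0.09, 0.11, -0.01, 0.07, 0.04]
nonzero = [r for r in per_infant_r if r != 0]  # a sign test discards ties
n_pos = sum(r > 0 for r in nonzero)
print(f"{n_pos}/{len(nonzero)} infants positive, p = {sign_test_p(n_pos, len(nonzero)):.3f}")
# prints: 8/10 infants positive, p = 0.109
```

      The sign test ignores effect magnitudes, so it is a coarse robustness check that would complement, rather than replace, a group-level analysis.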

      We thank R1 for this comment and understand their concern that the overall pattern of findings reported in relation to the infants’ EEG data might obscure inter-individual variability in the associations between attention and theta power. Averaging across individual participant EEG responses is, however, the gold-standard approach for both the event-locked (Jones et al., 2020) and continuous (Attaheri et al., 2020) EEG analyses reported in the current manuscript. EEG data, and naturalistic EEG data in particular, are inherently noisy, and averaging across participants increases the signal-to-noise ratio (i.e. inconsistent, and therefore non-task-related, activity is averaged out of the response; Cohen, 2014; Noreika et al., 2020). Examining individual EEG responses is unlikely to be informative: if a response is not found for a particular participant, it could be that the response is absent for that participant, or that it is present but the EEG recording for that participant is too noisy to show the effect. Computing group-level effects, as is most common in neuroimaging analyses, is therefore the most appropriate way to address our main research questions.

      The findings reported in this analysis also replicate previous work from our lab showing that infant attention to objects significantly forward-predicted increases in infant theta activity during joint table-top play with their caregiver involving one toy object (compared with three in our paradigm; Wass et al., 2018). More recent work conducted by our lab has also shown continuous and time-locked associations between infant look durations and infant theta activity when infants play with objects on their own (Perapoch Amadó et al., 2023). To reassure readers of the replicability of the current findings, we now reference the Wass et al. (2018) study at the beginning of the Discussion section.

      Could activity artifacts lead to certain reported trends? Babies typically look at an object before they touch or manipulate the object, and so longer bouts of attention likely involve a look and then a touch for lengthier time frames. If active involvement with an object (touching for example) amplifies theta activity, that may explain why attention duration forward predicts theta power. That is, baby looks, then touches, then theta activates, and coding would show visual gaze preceding the theta activation. Careful alignment of infants' touches and other such behaviors with the theta peak might help address this question, again to lend confidence to the robustness of the interpretation.

      Yes, again this is a very important point, and the removal of movement-related artifact is something we have given careful attention to in the analysis of our naturalistic EEG data (Georgieva et al., 2020; Marriott Haresign et al., 2021). As a result of this comment we have made direct changes to the Results section on page 18 to more clearly signal the reader to our EEG pre-processing section before presenting the results of the cross-correlation analyses.

      As we describe in the Methods section of the main text, movement-related artifacts are removed from the data with ICA decomposition, utilising an automatic-rejection algorithm, specially designed for work with our naturalistic EEG data (Marriott Haresign et al., 2021). Given that ICA rejection does not remove all artifact introduced to the EEG signal, additional analysis steps were taken to reduce the possibility that movement artifacts influenced the results of the reported analyses. As explained in the Methods section, rather than absolute theta power, relative theta was used in all EEG analyses, computed by dividing the power at each theta frequency by the summed power across all frequencies. Eye and head movement-related artifacts most often associate with broadband increases in power in the EEG signal (Cohen, 2014): computing relative theta activity therefore further reduces the potential influence of artifact on the EEG signal.
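      The relative-power step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary input format, the 3-6 Hz theta range and the averaging of per-bin relative values into one theta score are all assumptions. It also shows the intuition behind normalising: a broadband artifact modelled as a gain that scales power equally at every frequency cancels exactly out of the ratio (other forms of contamination need not cancel).

```python
from __future__ import annotations


def relative_theta(power_by_freq: dict[float, float],
                   theta_band: tuple[float, float] = (3.0, 6.0)) -> float:
    """Mean relative power across the theta bins of one spectrum.

    Each bin's power is divided by the summed power across all frequencies
    in the spectrum; the resulting relative values inside theta_band are then
    averaged (the averaging step is an assumption of this sketch).
    """
    total = sum(power_by_freq.values())
    if total <= 0:
        raise ValueError("total power must be positive")
    lo, hi = theta_band
    theta_rel = [p / total for f, p in power_by_freq.items() if lo <= f <= hi]
    if not theta_rel:
        raise ValueError("no frequency bins fall inside the theta band")
    return sum(theta_rel) / len(theta_rel)


# Hypothetical 1/f-shaped spectrum from 1 to 20 Hz (illustration only).
spectrum = {float(f): 1.0 / f for f in range(1, 21)}
# The same spectrum with a broadband artifact modelled as a uniform 3x gain.
contaminated = {f: 3.0 * p for f, p in spectrum.items()}

print(relative_theta(spectrum))
print(relative_theta(contaminated))  # same value as above, up to rounding
```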

      It is also important to highlight that previous work examining movement artifacts in controlled paradigms with infants has shown that limb movements actually associate with a decrease in power at theta frequencies, compared to rest (Georgieva et al., 2020). It is therefore unlikely that limb movement artifacts explain the pattern of association observed between theta power and infant attention in the current study.

      That said, examining the association between body movements and fluctuations in EEG activity during naturalistic interactions is an important next step, and something our lab is currently working on. Given that touching an object is most often the end-state of a larger body movement, aligning the EEG signal to the onset of infant touch is not all that informative to understanding how body movements associate with increases and decreases in power in the EEG signal. Our lab is currently working on developing new methods using motion tracking software and arousal composites to understand how data-derived behavioural sub-types associate with differential patterns of EEG activity.

      The term attention may be misleading. The behavior being examined is infant gaze or looks, with the assumption that gaze is a marker of "attention". The authors are aware that gaze can be a blank stare that doesn't reflect underlying true "attention". I recommend substitution of a conservative, more precise term that captures the variable being measured (gaze); it would then be fine to state that in their interpretation, gaze taken as a marker for attention or something like that. At minimum, using term "visual attention" can be a solution if authors do not want to use the precise term gaze. As an example, the sentence "An attention episode was defined as a discrete period of attention towards one of the play objects on the table, or to the partner" should be modified to defined as looking at a play object or partner.

      We thank the reviewer for this comment, and we understand their concern with the use of the term ‘attention’ where we are referring to shifts in infant eye gaze. However, this term is widely used in the literature to describe patterns of infant gaze, irrespective of whether infants are ‘actually attending’, in both interactive (e.g. Yu et al., 2021) and screen-based experiments examining infant attention (Richards, 2010). We therefore feel that its use in our current manuscript is acceptable and consistent with the reporting of similar interaction findings. On page 39 of the Discussion we now also include a discussion of how future research might further investigate differential subtypes of infant looks to distinguish between moments where infants are attending vs. just looking.

      Why collapse across gaze to object vs. other? Conceptually, it's unclear why the same hypotheses and research questions on neural-attention (i.e., gaze in actuality) links would apply to looks to a mom's face or to an object. Some rationale would be useful to the reader as to why these two distinct behaviors are taken as following the same principles in ordering of brain and behavior. Perhaps I missed something, however, because later in the Discussion the authors state that "fluctuations in neural markers of infants' engagement or interest forward-predict their attentiveness towards objects", which suggests there was an object-focused variable only? Please clarify. (Again, sorry if I missed something).

      This is a really important point, and we agree with R1 that it could have been more clearly expressed in our original submission, for which we apologise. In the cross-correlation analyses conducted in parts 2 and 3, which examine forward-predictive associations between infant attention durations and infant endogenous oscillatory activity (part 2) and caregiver behaviour (part 3), we include, as R1 describes, all infant looks towards objects and towards their partner. Including all infant look types is necessary to produce a continuous variable to cross-correlate with the other continuous variables (e.g. theta activity, caregiver vocal behaviours); the analysis therefore does not concentrate only on infant attention episodes towards objects.

      We take the reviewers’ point that different attention and neural mechanisms may be associated with looks towards objects vs. the partner, which we now acknowledge directly on page 10 of the Introduction. However, our focus here is on the endogenous and interactive mechanisms that drive fluctuations in infant engagement with the ongoing, free-flowing interaction. Indeed, previous work has shown increases in theta activity during sustained episodes of infant attention to a range of different stimuli, including cartoon videos (Xie et al., 2018), real-life screen-based interactions (Jones et al., 2020), as well as objects (Begus et al., 2016). In the second half of part 2, we go on to address the endogenous processes that support infant attention episodes specifically towards objects.

      As a result of this comment, we have made direct changes to the Introduction on page 10 to more clearly explain the looking behaviours included in the cross-correlation analysis, and the rationale behind conducting the analysis in this way, which differs from the reactive analyses conducted in the second half of parts one and three, which examine infant object looks only. Direct edits to the text have also been made throughout the Results and Methods sections as a result of this comment, to more clearly specify the types of looks included in each analysis. Now, where we discuss the cross-correlation analyses, we refer only to infant ‘attention durations’ or infant ‘attention’, whilst ‘object-directed attention’ and ‘looks towards objects’ are clearly specified in sections discussing the reactive analyses conducted in parts 2 and 3. We have also amended the Discussion on page 31 so that the cross-correlation analyses are interpreted relative to infants’ overall attention, rather than their attention towards objects only.

      Why are mothers' gazes shorter than infants' gazes? This was the flip of what I'd expect, so some interpretation would be useful to understanding the data.

      This is a really interesting observation. Our findings on the looking behaviour of caregivers and infants in our joint play interactions actually correspond to many previous micro-dynamic analyses of caregiver and infant looking behaviour during early table-top interactions (Abney et al., 2017; Perapoch Amadó et al., 2023; Yu & Smith, 2013, 2016). The shorter look durations in the adults arise because the caregivers alternate their gaze between their infant and the objects (i.e. they spend much of the interaction monitoring their infants’ behaviours). This can be seen in Figure 2 (see main text), which shows that caregiver looks are divided between looks to their infants and looks towards objects. In comparison, infants spend most of their time focussing on objects (see Figure 2, main text), with relatively infrequent looks to their caregiver. As a result, infant looks are, overall, longer than their caregivers’.

      Minor points

      Use the term association or relation (relationships is for interpersonal relationships, not in statistics).

      This has now been amended throughout.

      I'm unsure I'd call the interactions "naturalistic" when they occur at a table, with select toys, EEG caps on partners, and so on. The term seems more appropriate for studies with fewer constraints that occur (for example) in a home environment, etc.

      We understand R1’s concern with our use of the term ‘naturalistic’ to refer to the joint play interactions that we analyse in the current study. However, we feel the term is appropriate, given that the interactions are unstructured: the only instruction given to caregivers at the beginning of the interaction is to play with their infants in the way that they might do at home. The interactions, therefore, measure free-flowing caregiver and infant behaviours, where modulations in each individual’s behaviour are the result of the intra- and inter-individual dynamics of the social exchange. This contrasts with previous work on early infant attention development, which has used more structured designs in which modulations in infant behaviour occur as a result of the parameters of the experimental task.

      Reviewer #2

      Public Review

      Summary:

      This paper acknowledges that most development occurs in social contexts, with other social partners. The authors put forth two main frameworks of how development occurs within a social interaction with a caregiver. The first is that although social interaction with mature partners is somewhat bi-directional, mature social partners exogenously influence infant behaviors and attention through "attentional scaffolding", and that in this case infant attention is reactive to caregiver behavior. The second framework posits that caregivers support and guide infant attention by contingently responding to reorientations in infant behavior, so that caregiver behaviors are reactive to infant behavior. The aim of this paper is to use moment-to-moment analysis techniques to understand the directionality of dyadic interaction. It is difficult to determine whether the authors prove their point, as the results are not clearly explained, nor is the motivation for the chosen methods.

      Strengths

      The question driving this study is interesting and a genuine gap in the literature. Almost all development occurs in the presence of a mature social partner. While it is known that these interactions are critical for development, the directionality of how these interactions unfold in real-time is less known.

      The analyses largely seem to be appropriate for the question at hand, capturing small moment-to-moment dynamics in both infant and caregiver behavior, and their relationships with themselves and each other. Autocorrelations and cross-correlations are powerful tools that can uncover small but meaningful patterns in data that may not be uncovered with other, more discretized analyses (e.g. regression).

      We are pleased that R2 finds our work to be an interesting contribution to the field, which utilises appropriate analysis techniques.

      Weaknesses

      The major weakness of this paper is that the reader is assumed to understand why these results lead to their claimed findings. The authors need to describe more carefully their reasoning and justification for their analyses and what they hope to show. While a handful of experts would understand why autocorrelations and cross-correlations should be used, they are by no means basic analyses. It would also be helpful to use simulated data or even a simple figure to help the reader more easily understand what a significant result looks like versus an insignificant result.

      We thank the reviewer for this comment, and we agree that much more detail should be added to the Introduction. As a result, we have made direct changes to the Introduction on pages 9-11 to describe these analysis methods more clearly, along with our rationale for using them and how we expect the results to further our understanding of the drivers of infant attention in naturalistic social interactions.

      We also provide a figure in the SM (Fig. S6) to help the reader understand the permutation method used in our statistical analyses (described in the Methods, page 51). The figure depicts significant vs. non-significant patterns of results against their permutation distributions.
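
As a rough illustration of what these two measures compute (a minimal sketch, not the authors' analysis code): the autocorrelation of a series at lag k is its correlation with a copy of itself shifted by k samples, and the cross-correlation of two series at lag k is the correlation between one series and the other shifted by k samples. The function names and the convention that a positive lag means the first series leads are our own assumptions. Both series are assumed to be sampled at the same rate and already aligned in time.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def cross_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for lag in -max_lag..max_lag.

    A positive lag means that changes in x precede changes in y.
    """
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        result[lag] = pearson(a, b)
    return result


def autocorrelation(x, max_lag):
    """Correlation of x with itself shifted by 0..max_lag samples."""
    return {lag: r for lag, r in cross_correlation(x, x, max_lag).items() if lag >= 0}


random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] * 3 + x[:-3]  # y is x delayed by three samples
cc = cross_correlation(x, y, max_lag=10)
print(max(cc, key=cc.get))  # lag with the strongest correlation: 3
```

Because the second series is a delayed copy of the first, the cross-correlation peaks at lag +3. In real interaction data the peak would be much lower than 1, and its significance has to be judged against a null distribution.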

      While the overall question is interesting, the introduction does not properly set up the rest of the paper. The authors spend a lot of time talking about oscillatory patterns in general but devote very little discussion to the fact that they are using EEG to measure these patterns. The justification for using EEG is also not well developed. Why did the authors single out fronto-temporal channels instead of using whole-brain techniques, which are more standard in the field? This is idiosyncratic and not common.

      We very much agree with R2 that the rationale and justification for using EEG to understand the processes that influence infants’ attention patterns is under-developed in the current manuscript. As a result of this comment we have made direct edits to the Introduction section of the main text on pages 7-8 to more clearly describe the rationale for examining the relationship between infant EEG activity and their attention during the play interactions with their caregivers.

      As we describe in the Introduction, previous behavioural work with infants has suggested that endogenous cognitive processes (i.e. fluctuations in top-down cognitive control) might be important in explaining how infants allocate their attention during free-flowing, naturalistic interactions towards the end of the first year. Oscillatory neural activity at theta frequencies (3-6Hz), which can be measured with EEG, has previously been associated with top-down, intrinsically guided attentional processes in both adulthood and infancy (Jones et al., 2020; Orekhova, 1999; Xie et al., 2018). Measuring fluctuations in infant theta activity therefore provides a way to examine how endogenous cognitive processes, which might otherwise be unobservable behaviourally, structure infant attention in naturalistic social interactions.

      It is important to note that the Introduction distinguishes between two different oscillatory mechanisms that could explain the organisation of infant attention over the course of the interaction. The first refers to oscillatory patterns of attention: consistent attention durations produced by infants that likely reflect automatic regulatory functions related to fluctuations in infant arousal. The second is oscillatory neural activity at theta frequencies, recorded with EEG, which, as mentioned above, is thought to reflect fluctuations in intrinsically guided attention in early infancy. We have amended the Introduction to make the distinction between the two clearer.
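
The sketch below shows one way to quantify moment-to-moment fluctuations in theta activity. It computes the proportion of spectral power falling in the 3-6Hz band within sliding windows of a single EEG channel. The example is simplified and hypothetical: the window length, step size, the use of relative rather than absolute power, and the direct (slow) DFT are assumptions made to keep the illustration self-contained. None of them reflect the preprocessing or channel selection used in the study.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided power spectrum of a demeaned window via a direct DFT (slow but dependency-free)."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coef) ** 2)
    return freqs, power


def relative_theta(x, fs, lo=3.0, hi=6.0):
    """Fraction of (non-DC) power that falls in the lo-hi Hz band."""
    freqs, power = power_spectrum(x, fs)
    band = sum(p for f, p in zip(freqs, power) if lo <= f <= hi)
    return band / sum(power)


def theta_timecourse(signal, fs, win_s=2.0, step_s=0.5):
    """Relative theta power in sliding windows, giving a moment-to-moment time series."""
    n, step = int(win_s * fs), int(step_s * fs)
    return [relative_theta(signal[i:i + n], fs) for i in range(0, len(signal) - n + 1, step)]


fs = 100  # hypothetical sampling rate (Hz)
t = [i / fs for i in range(400)]
# Synthetic channel: a 4 Hz (theta) rhythm for 2 s, then a 20 Hz rhythm for 2 s.
eeg = [math.sin(2 * math.pi * (4 if ti < 2 else 20) * ti) for ti in t]
print([round(v, 2) for v in theta_timecourse(eeg, fs)])
```

In practice an FFT or wavelet routine would replace the direct DFT. The resulting time series can then be related to look onsets and durations in the same way as the behavioural variables.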

      A worrisome weakness is that the figures are not consistently formatted. The y-axes are not consistent within figures making the data difficult to compare and interpret. Labels are also not consistent and very often the text size is way too small making reading the axes difficult. This is a noticeable lack of attention to detail.

      This has now been adjusted throughout, where appropriate.

      No data is provided to reproduce the figures. This does not need to include the original videos but rather the processed and de-identified data used to generate the figures. Providing the data to support reproducibility is increasingly common in the field of developmental science and the authors are greatly encouraged to do so.

      This will be provided with the final manuscript.

      Minor Weaknesses

      In Figure 4, how is the pattern in panel a not significant while a very similar pattern, with the same magnitude of change, in panel b is? This seems like a spurious result.

      The statistical analysis conducted for all cross-correlation analyses reported follows a rigorous and stringent permutation-based temporal clustering method which controls for family-wise error rate using a non-parametric Monte Carlo method (see Methods in the main text for more detail). Permutations are created by shuffling data sets between participants and, therefore, patterns of significance identified by the cluster-based permutation analysis will depend on the mean and standard deviation of the cross-correlations in the permutation distribution. Fig. S6 now depicts the cross-correlations against their permutation distributions which should help readers to understand the patterns of significance reported in the main text.
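
The shuffling logic can be sketched as follows. This is a simplified, hypothetical version built on several assumptions:

- the observed statistic is the mean cross-correlation across dyads;
- the null distribution comes from re-pairing each series in `xs` with a different participant's series in `ys`;
- a per-lag z threshold of 2 defines clusters of adjacent lags;
- cluster mass (summed |z|) is compared against the distribution of the largest cluster mass in each permuted data set.

The thresholds, number of permutations and cluster statistic used in the study are described in the Methods and may differ from these.

```python
import math
import random
import statistics


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def cross_correlation(x, y, max_lag):
    """r(x[t], y[t + lag]) for lag = -max_lag..max_lag; positive lag: x leads y."""
    n = len(x)
    out = []
    for lag in range(-max_lag, max_lag + 1):
        a, b = (x[: n - lag], y[lag:]) if lag >= 0 else (x[-lag:], y[: n + lag])
        out.append(pearson(a, b))
    return out


def mean_curve(pairs, max_lag):
    """Average cross-correlation curve across dyads."""
    curves = [cross_correlation(x, y, max_lag) for x, y in pairs]
    return [statistics.fmean(c[k] for c in curves) for k in range(2 * max_lag + 1)]


def find_clusters(z, thresh):
    """Runs of adjacent lags with |z| > thresh and the same sign -> (start, end, mass)."""
    found, start = [], None
    for i, v in enumerate(z + [0.0]):  # the 0.0 sentinel closes a run at the last lag
        above = abs(v) > thresh
        if start is not None and (not above or (v > 0) != (z[start] > 0)):
            found.append((start, i - 1, sum(abs(t) for t in z[start:i])))
            start = None
        if above and start is None:
            start = i
    return found


def derangement(n, rng):
    """Random re-pairing in which no participant keeps their own partner."""
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if all(i != j for i, j in enumerate(perm)):
            return perm


def cluster_permutation_test(xs, ys, max_lag, n_perm=100, thresh=2.0, seed=0):
    """Return (first_lag, last_lag, mass, p) for each cluster in the observed curve."""
    rng = random.Random(seed)
    n_lags = 2 * max_lag + 1
    observed = mean_curve(list(zip(xs, ys)), max_lag)
    null = []
    for _ in range(n_perm):
        perm = derangement(len(ys), rng)
        null.append(mean_curve([(xs[i], ys[j]) for i, j in enumerate(perm)], max_lag))
    mu = [statistics.fmean(c[k] for c in null) for k in range(n_lags)]
    sd = [statistics.stdev([c[k] for c in null]) for k in range(n_lags)]

    def zscore(curve):
        return [(curve[k] - mu[k]) / sd[k] for k in range(n_lags)]

    # The largest cluster mass per permuted data set controls the family-wise error rate.
    null_max = [max((m for _, _, m in find_clusters(zscore(c), thresh)), default=0.0)
                for c in null]
    results = []
    for start, end, mass in find_clusters(zscore(observed), thresh):
        p = (1 + sum(nm >= mass for nm in null_max)) / (1 + n_perm)
        results.append((start - max_lag, end - max_lag, mass, p))
    return results


# Simulated dyads in which y follows x after two samples.
rng = random.Random(1)
xs, ys = [], []
for _ in range(8):
    x = [rng.gauss(0, 1) for _ in range(60)]
    xs.append(x)
    ys.append([v + rng.gauss(0, 1) for v in [0.0, 0.0] + x[:-2]])
res = cluster_permutation_test(xs, ys, max_lag=5)
print(res)
```

The permutation approach means a cluster is significant only relative to how large cross-correlations of that shape get when partners are mismatched. Two visually similar curves can therefore differ in significance if their permutation distributions differ in spread.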

      The correlations appear very weak in Figures 3b, 5a, 7e. Despite a linear mixed effects model showing a relationship, it is difficult to believe looking at the data. Both the Spearman and Pearson correlations for these plots should be clearly included in the text, figure, or figure legend.

      We thank the reviewer for this comment, and agree that reporting the correlations for these plots strengthens the findings of the linear mixed effects models reported in the text. We have therefore added both Spearman and Pearson correlations to the legends of Figures 3b, 5a and 7e, corresponding to the statistically significant relationships examined in the linear mixed effects models. The strength of these relationships is entirely consistent with that documented in previous research using similar methods (e.g. Piazza et al., 2018). How strong a relationship looks to the observer depends entirely on the graphical representation chosen. We have chosen to present the data in this way because we feel that it is the most honest way to represent the statistically significant, and very carefully analysed, effects that we have observed in our data.
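
For reference, the two coefficients added to the legends can be computed as in the minimal sketch below; the function names are ours. Pearson's r measures linear association. Spearman's rho is Pearson's r applied to ranks, with tied values given their average rank, so it measures monotonic association. Both are computed on pooled observations. Unlike the mixed effects models, they ignore the fact that several looks come from the same participant, so they answer a slightly different question from the models.

```python
import math


def pearson(a, b):
    """Pearson's r: strength of the linear association."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson's r computed on ranks (monotonic association)."""
    return pearson(average_ranks(a), average_ranks(b))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic but not linear
print(round(pearson(x, y), 3), spearman(x, y))  # 0.981 1.0
```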

      Linear mixed effects models need more detail. Why were they built the way they were built? I would have appreciated seeing multiple models in the supplementary methods and a reasoning to have landed on one. There are multiple ways I can see this model being built (especially with the addition of a random intercept). Also, there are methods to test significance between models and aid in selection. That being said, although participant identity is a very common random effect, its use should be clearly stated in the main text.

      We very much agree with R2 that the reporting of the linear mixed effects models needs more detail, and this has now been added to the Method section (page 54). There are indeed multiple ways in which these models could be built. However, our research questions concern specifically the reactive changes in infant theta activity and caregiver behaviours that occur after infant look onsets towards objects (see pages 9-11 of the Introduction), so we took a hypothesis-driven approach to building them. Random intercepts are specified for participants, as well as uncorrelated by-participant random slopes (Brown, 2021; Gelman & Hill, 2006; Suarez-Rivera et al., 2019). In this way, infant look durations are predicted from caregiver behaviours (or infant theta activity). The models account for between-participant variability both in look durations and in the strength of the effect of caregiver behaviours (or infant theta activity) on infant look durations.
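
For readers less familiar with this specification, the model described above can be written as follows. This sketch rests on stated assumptions. The symbols are ours, and i indexes participants while j indexes looks. X_ij stands for whichever predictor is being tested (a caregiver behaviour or infant theta activity). Any transformation of look durations or additional covariates used in the study is not shown. "Uncorrelated" random slopes corresponds to fixing the covariance between the random intercept and slope at zero; in R's lme4 this structure is usually written with the double-bar syntax, e.g. `LookDuration ~ X + (1 + X || participant)`.

```latex
\begin{aligned}
\text{LookDuration}_{ij} &= (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,X_{ij} + \varepsilon_{ij},\\
u_{0i} &\sim \mathcal{N}(0, \sigma_{u_0}^2), \qquad u_{1i} \sim \mathcal{N}(0, \sigma_{u_1}^2), \qquad \operatorname{Cov}(u_{0i}, u_{1i}) = 0,\\
\varepsilon_{ij} &\sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```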

      Some parentheses aren't closed; a more careful re-reading focusing on these minor textual issues is warranted.

      This has now been corrected.

      Analysis of F0 seems unnecessarily complex. Is there a reason for this?

      Computation of the continuous caregiver F0 variable may seem complex, but we feel that all analysis steps are necessary to compute this variable accurately and reliably in our naturalistic, noisy and free-flowing interaction data. For example, we retain F0 values only within segments of the interaction identified as the mum speaking, so that background noises and infant vocalisations are not included in the continuous variable. We then interpolate through unvoiced segments (similar to Räsänen et al., 2018), and compute the derivative in 1000ms intervals as a measure of the rate of change. Each of these steps was carefully chosen from among the many ways in which a continuous rate-of-change variable could be computed (cf. Piazza et al., 2018; Räsänen et al., 2018).
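
The three steps described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' code, and it makes three assumptions. First, the F0 track is a list of values sampled at a fixed rate, with NaN where no F0 was found. Second, caregiver-speech segments are given as (start, end) times in seconds. Third, "the derivative in 1000ms intervals" means the change in mean F0 between successive non-overlapping 1 s windows, which is only one possible reading. Function names are illustrative.

```python
import bisect
import math


def mask_to_speech(f0, times, segments):
    """Keep F0 only inside caregiver-speech segments [(start_s, end_s), ...]; NaN elsewhere."""
    return [v if any(s <= t < e for s, e in segments) else math.nan
            for v, t in zip(f0, times)]


def interpolate_gaps(values):
    """Linearly interpolate across NaN frames; edge gaps take the nearest valid value."""
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    out = list(values)
    if not known:
        return out
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        pos = bisect.bisect_left(known, i)  # index of the first valid frame after i
        if pos == 0:
            out[i] = values[known[0]]
        elif pos == len(known):
            out[i] = values[known[-1]]
        else:
            left, right = known[pos - 1], known[pos]
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def rate_of_change(values, fs, window_s=1.0):
    """Change in mean F0 between successive non-overlapping windows, in Hz per second."""
    n = int(round(window_s * fs))
    means = [sum(values[k:k + n]) / n for k in range(0, len(values) - n + 1, n)]
    return [(b - a) / window_s for a, b in zip(means, means[1:])]


fs = 10  # hypothetical F0 frame rate (frames per second), kept low for readability
times = [i / fs for i in range(40)]
f0 = [200 + 10 * t for t in times]  # pitch rising by 10 Hz per second
for i in (5, 6, 7, 22, 23):  # unvoiced frames with no F0 estimate
    f0[i] = math.nan
contour = interpolate_gaps(mask_to_speech(f0, times, [(0.0, 4.0)]))
print(rate_of_change(contour, fs))  # three values, each close to 10.0
```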

      The choice of a 20 Hz filter seems odd when toy clacks are given as an example. Toy clacks contain energy at frequencies much higher than 20 Hz, and a 20 Hz filter probably wouldn't do anything against them, given that the authors already set floor and ceiling parameters of 75-600 Hz in their F0 extraction.

      We thank the reviewer for this comment; we can see that this part of the description of the F0 computation is confusing. A 20Hz low-pass filter is applied to the F0 data stream after the F0 has been extracted with floor and ceiling parameters of 75-600Hz. The filter therefore removes modulations in the caregiver's F0 contour that occur at modulation frequencies above 20Hz; it does not refer to spectral filtering of the speech signal itself. The description of this variable has been rephrased on page 48 of the main text.
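
A short sketch can illustrate the difference between filtering the F0 contour and filtering the audio. It assumes, hypothetically, an F0 contour sampled every 10 ms (i.e. at 100 Hz). It applies a second-order Butterworth low-pass filter forwards and backwards for zero phase; the filter order and design used in the study are not stated here. Because the filter operates on the sequence of F0 values, "20Hz" refers to how quickly the pitch contour changes, not to the acoustic frequencies (75-600Hz) of the voice.

```python
import math


def butter2_lowpass(cutoff_hz, fs):
    """Second-order Butterworth low-pass biquad (bilinear transform, Q = 1/sqrt(2))."""
    w0 = 2 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / math.sqrt(2)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    cosw = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 - cosw) / (2 * a0), (1 - cosw) / a0, (1 - cosw) / (2 * a0)]
    a = [1.0, -2 * cosw / a0, (1 - alpha) / a0]
    return b, a


def lfilter(b, a, x):
    """Direct-form I difference equation for one biquad section."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x1, x2 = xn, x1
        y1, y2 = yn, y1
        y.append(yn)
    return y


def filtfilt(b, a, x):
    """Zero-phase filtering: filter forwards, then filter the reversed result again."""
    forward = lfilter(b, a, x)
    return lfilter(b, a, forward[::-1])[::-1]


fs = 100  # hypothetical F0 frame rate: one F0 value every 10 ms
t = [i / fs for i in range(300)]
slow = [10 * math.sin(2 * math.pi * 2 * ti) for ti in t]  # 2 Hz pitch movement (+/-10 Hz)
contour = [200 + s + 5 * math.sin(2 * math.pi * 40 * ti) for s, ti in zip(slow, t)]  # + 40 Hz jitter
b, a = butter2_lowpass(20.0, fs)
smooth = filtfilt(b, a, contour)
err = max(abs(smooth[i] - (200 + slow[i])) for i in range(50, 250))
print(err)  # small: the 40 Hz jitter is removed, the 2 Hz movement is kept
```

The slow pitch movement survives filtering while the fast jitter in the contour is largely removed. Edge samples are excluded from the comparison because forward-backward filtering without padding produces start-up transients.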

      Linear interpolation is a choice I would not have made. Where there is no data, there is no data. It feels inappropriate to assume that the data in between is simply a linear interpolation of surrounding points.

      We considered the choice to interpolate where there was no data in a lot of detail, given the many options for dealing with missing data points in this analysis and the difficulty of extracting a continuous F0 variable from our naturalistic data sets. As R2 points out, one option would be to set data points to NaN values where no F0 is detected and/or the mum is not vocalising. A second option would be to set the continuous variable to 0 where no F0 is detected and/or the mum is not vocalising. When the mum is not producing sound there is no F0, so setting the variable to 0, rather than treating these points as missing, makes the most objective sense.

      Either of these options (setting parts where no F0 is detected to NaN or to 0) makes it difficult to compute the rate of change in F0 meaningfully. Inserting NaN values reduces the number of data points in each time window, whereas inserting 0s creates large, artificial changes in F0. Inserting NaN values into the continuous variable would also reduce the number of data points included in the cross-correlation and event-locked analyses. It is important to note that, in our naturalistic interactions, caregivers’ vocal patterns are characterised by many short vocalisations interspersed with short pauses (Phillips et al., in prep), similar to previous findings in naturalistic settings (Gratier et al., 2015). Interpolation therefore largely bridges the short pauses between the caregiver’s vocalisations.
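
The trade-off between the three options can be seen on a toy contour with one short pause. The values below are invented for illustration, and "change" is taken simply as the difference between consecutive samples. Leading or trailing gaps are left as NaN by this interpolation, which is a simplifying assumption.

```python
import math


def fill(values, mode):
    """Handle NaN gaps by leaving them ("nan"), zero-filling ("zero") or interpolating ("interp")."""
    if mode == "nan":
        return list(values)
    if mode == "zero":
        return [0.0 if math.isnan(v) else v for v in values]
    if mode != "interp":
        raise ValueError(f"unknown mode: {mode}")
    out = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):  # only interior gaps are filled
        for i in range(left + 1, right):
            out[i] = values[left] + (i - left) / (right - left) * (values[right] - values[left])
    return out


def changes(values):
    """Sample-to-sample changes, skipping any pair that involves a NaN."""
    return [b - a for a, b in zip(values, values[1:]) if not (math.isnan(a) or math.isnan(b))]


contour = [220.0, 222.0, 224.0, math.nan, math.nan, 230.0, 232.0]  # toy F0 (Hz) with a 2-frame pause
for mode in ("nan", "zero", "interp"):
    print(mode, changes(fill(contour, mode)))
```

With NaN gaps only three of the six changes survive. Zero-filling turns the pause into spurious jumps of more than 200 Hz. Interpolation keeps every change, and each one reflects the pitch movement of the surrounding speech.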

      The only limitation listed related to the demographics of the sample, namely that it consisted of middle-class mothers in East London. Given that the demographics of London, even East London, are quite varied, it is disappointing that the sample does not reflect the community it was drawn from.

      We very much agree with R2 that the lack of caregivers from wider demographic backgrounds is disappointing; this is often a problem in developmental research. Our lab is currently collecting similar data from infants with a family history of ADHD as part of an ongoing longitudinal project. This project involves families from across the UK with much more varied demographic backgrounds. We hope that the findings reported here will feed directly into that work.

      That said, a demographic table of the participants included in this study should be added.

      This is now included in the SM, and referenced in the main text.

      References

      Abney, D. H., Warlaumont, A. S., Oller, D. K., Wallot, S., & Kello, C. T. (2017). Multiple Coordination Patterns in Infant and Adult Vocalizations. Infancy, 22(4), 514–539. https://doi.org/10.1111/infa.12165

      Attaheri, A., Choisdealbha, Á. N., Di Liberto, G. M., Rocha, S., Brusini, P., Mead, N., Olawole-Scott, H., Boutris, P., Gibbon, S., Williams, I., Grey, C., Flanagan, S., & Goswami, U. (2020). Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants [Preprint]. Neuroscience. https://doi.org/10.1101/2020.10.12.329326

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants’ preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397–12402. https://doi.org/10.1073/pnas.1603261113

      Brown, V. A. (2021). An Introduction to Linear Mixed-Effects Modeling in R.

      Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. The MIT Press.

      Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

      Georgieva, S., Lester, S., Noreika, V., Yilmaz, M. N., Wass, S., & Leong, V. (2020). Toward the Understanding of Topographical and Spectral Signatures of Infant Movement Artifacts in Naturalistic EEG. Frontiers in Neuroscience, 14, 352. https://doi.org/10.3389/fnins.2020.00352

      Gratier, M., Devouche, E., Guellai, B., Infanti, R., Yilmaz, E., & Parlato-Oliveira, E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01167

      Jones, E. J. H., Goodwin, A., Orekhova, E., Charman, T., Dawson, G., Webb, S. J., & Johnson, M. H. (2020). Infant EEG theta modulation predicts childhood intelligence. Scientific Reports, 10(1), 11232. https://doi.org/10.1038/s41598-020-67687-y

      Marriott Haresign, I., Phillips, E., Whitehorn, M., Noreika, V., Jones, E. J. H., Leong, V., & Wass, S. V. (2021). Automatic classification of ICA components from infant EEG using MARA. Developmental Cognitive Neuroscience, 52, 101024. https://doi.org/10.1016/j.dcn.2021.101024

      Noreika, V., Georgieva, S., Wass, S., & Leong, V. (2020). 14 challenges and their solutions for conducting social neuroscience and longitudinal EEG research with infants. Infant Behavior and Development, 58, 101393. https://doi.org/10.1016/j.infbeh.2019.101393

      Orekhova, E. (1999). Theta synchronization during sustained anticipatory attention in infants over the second half of the first year of life. International Journal of Psychophysiology, 32(2), 151–172. https://doi.org/10.1016/S0167-8760(99)00011-2

      Perapoch Amadó, M., Greenwood, E., James, Labendzki, P., Haresign, I. M., Northrop, T., Phillips, E., Viswanathan, N., Whitehorn, M., Jones, E. J. H., & Wass, S. (2023). Naturalistic attention transitions from subcortical to cortical control during infancy. [Preprint]. Open Science Framework. https://doi.org/10.31219/osf.io/6z27a

      Piazza, E. A., Hasenfratz, L., Hasson, U., & Lew-Williams, C. (2018). Infant and adult brains are coupled to the dynamics of natural communication [Preprint]. Neuroscience. https://doi.org/10.1101/359810

      Räsänen, O., Kakouros, S., & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? – Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206. https://doi.org/10.1016/j.cognition.2018.05.015

      Richards, J. E. (2010). The development of attention to simple and complex visual stimuli in infants: Behavioral and psychophysiological measures. Developmental Review, 30(2), 203–219. https://doi.org/10.1016/j.dr.2010.03.005

      Suarez-Rivera, C., Smith, L. B., & Yu, C. (2019). Multimodal parent behaviors within joint attention support sustained attention in infants. Developmental Psychology, 55(1), 96–109. https://doi.org/10.1037/dev0000628

      Wass, S. V., Noreika, V., Georgieva, S., Clackson, K., Brightman, L., Nutbrown, R., Covarrubias, L. S., & Leong, V. (2018). Parental neural responsivity to infants’ visual attention: How mature brains influence immature brains during social interaction. PLOS Biology, 16(12), e2006328. https://doi.org/10.1371/journal.pbio.2006328

      Xie, W., Mallin, B. M., & Richards, J. E. (2018). Development of infant sustained attention and its relation to EEG oscillations: An EEG and cortical source analysis study. Developmental Science, 21(3), e12562. https://doi.org/10.1111/desc.12562

      Yu, C., & Smith, L. B. (2013). Joint Attention without Gaze Following: Human Infants and Their Parents Coordinate Visual Attention to Objects through Eye-Hand Coordination. PLoS ONE, 8(11), e79659. https://doi.org/10.1371/journal.pone.0079659

      Yu, C., & Smith, L. B. (2016). The Social Origins of Sustained Attention in One-Year-Old Human Infants. Current Biology, 26(9), 1235–1240. https://doi.org/10.1016/j.cub.2016.03.026

      Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s view redefines the problem of referential uncertainty in early word learning. Proceedings of the National Academy of Sciences, 118(52), e2107019118. https://doi.org/10.1073/pnas.2107019118

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their appreciation of our study and their thoughtful comments. All reviewers raised a main concern: that external noise factors, such as external disturbances or imperfect observations, might influence intuitive inference. In response, we have conducted three new experiments suggested by the reviewers. These experiments were designed to: (1) assess the influence of external forces on humans’ judgments by placing a wall to block wind disturbances from one direction, (2) examine human accuracy in predicting the landing position of a falling ball when its trajectory is obscured, and (3) evaluate the effect of object geometry on human judgments of stability. The findings from these experiments consistently support our proposal of a stochastic world model on gravity embedded in the human mind. We have also addressed the remaining comments from the reviewers point by point.

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review, I did not find it entirely convincing that the study shows evidence for a Gaussian understanding of gravity. There are two studies that would bolster this claim: (1) Replicate experiment 1, but also ask people to infer whether there was a hidden force. If people are truly representing gravity as proposed in the paper, you should get no force inferences. However, if the reason the Gaussian gravity model works is that people infer unseen forces, this should come out clearly in this study.

      Author response image 1.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall on one side. Left: the overall experimental scene; right: the scene shown to participants. (b) Human behaviors. Three participants took part in this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments of stability were not affected by potential concerns regarding external forces.

      R1: We thank the reviewer for this suggestion. To directly test whether participants’ judgments were influenced by implicit assumptions about external forces, we replicated the original experimental setup with the addition of a wall on one side (Supplementary Figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, so that any potential wind forces from the direction of the wall would not influence the collapse. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution for all three participants tested (1 female; ages: 24-30), similar to the experiment without the wall (Supplementary Figure 4B). Therefore, the stochastic nature of intuitive inference about objects’ stability is embedded in the mind, not shaped by external forces or explicit instructions.

      This new experiment has been added to the revised manuscript:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”
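
If participants were implicitly attributing collapses to wind from the open side, the distribution of judged angles would be expected to become asymmetric, i.e. skewed. One simple way to quantify this is the adjusted sample skewness and its approximate standard error. The sketch below is a hypothetical illustration of such a check with invented toy values. It is not the analysis reported in Supplementary Figure 4, and a formal test (for example D'Agostino's skewness test) would be a natural alternative.

```python
import math


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (0 for a symmetric sample)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def skewness_z(xs):
    """G1 divided by its standard error under normality (large-sample approximation)."""
    n = len(xs)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return sample_skewness(xs) / se


symmetric = [-2, -1, 0, 1, 2]  # toy judged angles (degrees) around the vertical
right_skewed = [0, 0, 0, 0, 10]
print(sample_skewness(symmetric), round(sample_skewness(right_skewed), 2))  # 0.0 2.24
```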

      (2) Similarly, you can imagine a simple study where you drop an object behind a floating occluder and you check where people produce an anticipatory fixation (i.e., where do they think the object will come out?). If people have a stochastic representation of gravity, this should be reflected in their fixations. But my guess is that everyone will look straight down.

      Author response image 2.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results support the notion of an inherent stochastic property of gravity as represented in the mind.

      R2: We thank the reviewer for suggesting this thought experiment. However, when predicting the landing point of a falling object, participants may rely more on learned knowledge that an unimpeded object continues to fall in a straight line, rather than drawing on their intuitive physics. To avoid this potential confounding factor, we designed a similar experiment where participants were asked to predict the landing point of a parabolic trajectory, obscured by an occluder (Author response image 2A). In each trial, participants used a mouse (clicking the left button) to predict the landing point of each parabolic trajectory, and there were 100 trials in total. This design not only limits the impact of direct visual cues but also actively engages the mental simulation of intuitive physics. All three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment confirms the stochastic nature of intuitive physics.
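
Under gravity alone, the landing point of a parabolic trajectory is fully determined by its launch position and velocity, so the stimulus itself contributes no variability. The spread of the predictive errors is what the Gaussian fit summarises. The sketch below uses hypothetical units (metres, seconds, g = 9.8), launch parameters and click positions, because the values used in the experiment appear only in the figure. It computes the true landing point and summarises click errors by their mean and standard deviation, the two parameters of a fitted Gaussian.

```python
import math
import statistics


def landing_x(x0, y0, vx, vy, g=9.8):
    """x at which a projectile launched from (x0, y0) with velocity (vx, vy) reaches y = 0.

    Solves y0 + vy*t - g*t**2/2 = 0 for the positive root; no drag or other forces.
    """
    t_land = (vy + math.sqrt(vy ** 2 + 2 * g * y0)) / g
    return x0 + vx * t_land


def error_summary(clicks, true_x):
    """Mean and SD of predictive errors: the parameters of a fitted Gaussian."""
    errors = [c - true_x for c in clicks]
    return statistics.fmean(errors), statistics.stdev(errors)


true_x = landing_x(x0=0.0, y0=4.9, vx=1.0, vy=0.0)  # hypothetical launch: lands at x of about 1.0
clicks = [0.9, 1.0, 1.1]  # invented responses for illustration
print(true_x, error_summary(clicks, true_x))
```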

      (3) I believe the correct alternative model should be the one that has uncertainty over unseen forces, which better captures current proposals in the field, and controls for the amount of uncertainty in the models.

      R3: We thank the reviewers for the above-mentioned suggestions, and the findings from these two new experiments reinforce our proposal regarding the inherent stochastic characteristic of how the mind represents gravity.

      (4) I was not convinced that the RL framework was set up correctly to tackle the questions it claims to tackle. What this shows is that you can evolve a world model with Gaussian gravity in a setup that has no external perturbations. That does not imply that that is how humans evolved their intuitive physics, particularly when creatures have evolved in a world full of external perturbations. Showing that when (1) there are hidden perturbations, and (2) these perturbations are learnable, but (3) the model nonetheless just learns stochastic gravity, would be a more convincing result.

      R4: We completely agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated by a simple simulation such as RL. The purpose of incorporating the RL framework in our study is therefore to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity. In fact, introducing additional external noise into the RL framework would likely heighten the uncertainty in learning gravity’s direction, potentially amplifying, rather than diminishing, the stochastic nature of mental gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) Some comments on the writing:

      The word 'normality' is used to refer to people's judgments about whether a tower's collapse looked 'normal'. I was a bit confused by this because normality can also mean 'Gaussian' and the experiments are also sampling from Gaussian distributions. There were several points where it took me a second to figure out which sense of 'normality' the paper was using. I would recommend using a different term.

      R5: We are sorry for the confusion. In revision, the term “normality” has been replaced with “confidence level about normal trajectory”.

      (6) One small comment is that Newton's laws are not a faithful replica of the "physical laws of the world" they are a useful simplification that only works at certain timescales. I believe some people propose Newtonian physics as a model of intuitive physics in part because it is a rapid and useful approximation of complex physical systems, and not because it is an untested assumption of perfect correspondence.

      R6: We are sorry for the inaccurate expression. We have revised our statements in the manuscript Line 15-16: “We found that the world model on gravity was not a faithful replica of the physical laws, but instead encoded gravity’s vertical direction as a Gaussian distribution.”

      (7) Line 49-50: Based on Fig 1d, lower bound of possible configurations for 10 blocks is ~17 in log-space, which is about 2.5e7. But the line here says it's 3.72e19, which is much larger. Sorry if I am missing something.

      R7: We thank the reviewer for pointing out this error. We recalculated the number of possible configurations using formula (3) in the appendix, and the number of configurations with 10 blocks is:

      Thus,

      This estimated number is much larger than that in our previous calculation, which has been corrected in the revised text.

      Line 827-829: “d) The lower bound of configurations’ possible number and the number of blocks in a stack followed an exponential relationship with a base of 10. The procedure can create at least 1.14×10^50 configurations for stacks consisting of 10 blocks.”

      Line 49-50: “… but the universal cardinality of possible configurations is at least 1.14×10^50 (Supplementary Figure 1), …”

      Line 1017-1018: “… the number of configurations can be estimated with formula (9), which is 1.14×10^50.”
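
The orders of magnitude quoted in this exchange can be cross-checked with simple arithmetic. The snippet below does not reproduce formula (3) from the appendix, which is not shown here. It only converts between counts and logarithms. The reviewer's conversion of a reading of about 17 into about 2.5×10^7 corresponds to a natural logarithm, whereas the revised caption describes a base-10 relationship. On a log10 scale the revised bound of 1.14×10^50 lies at about 50.

```python
import math


def magnitude(count):
    """Return (log10, natural log) of a positive count."""
    return math.log10(count), math.log(count)


print(magnitude(1.14e50))  # revised lower bound for 10 blocks: about (50.06, 115.26)
print(magnitude(3.72e19))  # value originally given in lines 49-50: about (19.57, 45.06)
print(math.exp(17))        # a reading of ~17 on a natural-log axis: about 2.4e7
```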

      (8) Lines 77-78: "A widely adopted but not rigorously tested assumption is that the world model in the brain is a faithful replica of the physical laws of the world." This risks sounding like you are asserting that colleagues in the field do not rigorously test their models. I think you meant to say that they did not 'directly test', rather than 'rigorously test'. If you meant rigorous, you might want to say more to justify why you think past work was not rigorous.

      R8: We apologize for the inappropriate wording. The sentence has been revised, and we explain the motivation more comprehensively in the revised text:

      Line 76-92: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), thereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that the distinction between these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach.”

      (9) Lines 79-84 States that past models encode gravity downward. It then says that alternatively there is consensus that the brain uses data from sensory organs and adds meaning to them. I think there might be a grammatical error here because I did not follow why saying there is 'consensus' on something is a theoretical alternative. I also had trouble following why those two statements are in opposition. Is any work on physics engines claiming the brain does not take data from sensory organs and add meaning to them?

      R9: We are sorry for the confusion. Here we intend to contrast the deterministic model (i.e., the uncertainty comes from outside the model) with the stochastic model (i.e., the uncertainty is inherently built into the model). In revision, we have clarified the intention. For details, please see R8.

      (10) Lines 85-88: Following on the sentence above, you then conclude that the representation of the world may therefore not be the same as reality. I did not understand why this followed. It seems you are saying that, because the brain takes data from sensory organs, therefore its representations may differ from reality.

      R10: Again, we are sorry about the confusion. Please see the revised text in R8.

      (11) Lines 190-191: I had trouble understanding this sentence. I believe you are missing an adjective to clarify that participants were more inclined to judge taller stacks as more likely to collapse.

      R11: We are sorry for the confusion. What we intended to state here is that participants’ judgment was biased, showing a tendency to predict a collapse for stacks regardless of their actual stability. We have revised this confusing sentence in the revision. Line 202–204: “However, the participants showed an obvious bias towards predicting a collapse for stacks regardless of their actual stability, as the dots in Fig 2b are more concentrated on the lower side of the diagonal line.”

      (12) Line 201: I don't think it's accurate to say that MGS "perfectly captured participants' judgments" unless the results are actually perfect.

      R12: We agree, and in the revision we have toned down the statement. Line 213–214: “…, the MGS, in contrast to the NGS, more precisely reflected participants’ judgments of stability …”

      Reviewer #2 (Recommendations For The Authors):

      I think this is an impressive set of experiments and modeling work. The paper is nicely written and I appreciate the poetic license the authors took at places in the manuscript. I only have clarification points and suggest a simple experiment that could lend further support to their conclusions. 1. In my opinion, the impact of this work is twofold. First, the suggestion that gravity is represented as a distribution of the world and not a result of (inferred) external perturbations. Second, that the distribution is advantageous as it balances speed and accuracy, and lessens computational processing demands (i.e., number of simulations). The second point here is contingent on the first point, which is really only supported by the RL model and potentially the inverted scene condition. I am somewhat surprised that the RL model does not converge on a width much smaller than ~20 degrees after 100,000 simulations. From my understanding, it was provided feedback with collapses based on natural gravity (deterministically downward). Why is learning so slow and the width so large? Could it be the density of the simulated world model distribution? If the model distribution of Qs was too dense, then Q-learning would take forever. If the model distribution was too sparse, then its final estimate would hit a floor of precision. Could the authors provide more details on the distribution of the Qs for the RL model?

      Author response image 3.

      RL learning curves as a function of θ angle with different sampling densities and learning rates. Learning rates were adjusted to low (a), intermediate (b) and high (c) settings, while sampling densities were chosen at four levels: 5x5, 11x11, 31x31, and 61x61 shown from the left to the right. Two key observations emerged from the simulations as the reviewer predicted. First, higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances. Second, increased sampling density necessitated more iterations for convergence. Note that in all simulations, we limited the iterations to 1,000 times (as opposed to 100,000 times reported in the manuscript) to demonstrate the trend without excessive computational demands.

      R1: To illustrate the distribution of the Q-values for the RL model, we re-ran the RL model with various learning rates and sampling densities (Author response image 3). These results support the reviewer’s prediction that higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances, and increased sampling density requires more iterations for convergence.

      This simulation also explains the slower learning observed in the experiment described in the text, where the force sphere was divided into 61x61 angle pairs and the learning rate was set to 0.15. This set of parameters ensured convergence within a reasonably brief timeframe while maintaining high-resolution force assessments.

      In addition, the width of the Gaussian distribution is mainly determined by the complexity of the stacks. As shown in Figure 3c and Supplementary Figure 9, stacks with fewer blocks (i.e., less complex stacks) produced a wider distribution, whereas stacks with more blocks produced a narrower one. In the study, we used a collection of stacks ranging from 2 to 15 blocks to simulate the range of stacks humans typically encounter in daily life.

      In revision, we have incorporated these insights suggested by the reviewer to clarify the performance of the RL framework:

      Line 634-639: “The angle density and learning rate are two factors that affect the learning speed. A larger angle density prolongs the time to reach convergence but enables a more detailed force space; a higher learning rate accelerates convergence but incurs larger variance during training. To balance speed and convergence, we utilized 100,000 configurations for the training.”
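      The trade-off between angle density and learning rate can be illustrated with a minimal sketch. This is not the authors' RL model. It rests on three assumptions:
      - a hypothetical `agreement_prob` in which the chance that a simulation under tilted gravity matches the true outcome decays with tilt;
      - an n×n grid of (θ, φ) cells;
      - a constant step-size value update Q ← Q + α(r − Q), applied to one randomly sampled cell per iteration.

      Because a fixed iteration budget is spread over n² cells, denser grids leave more cells under-trained. A larger α moves each visited cell faster but leaves more trial-to-trial noise.

```python
import numpy as np

def agreement_prob(theta_deg, scale_deg=20.0):
    # Hypothetical: chance that a simulation with gravity tilted by theta
    # predicts the same outcome as true (downward) gravity.
    return np.exp(-(np.asarray(theta_deg) / scale_deg) ** 2)

def train(n_angles=11, alpha=0.15, n_iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, 90.0, n_angles)  # tilt from vertical (deg)
    # Targets depend only on tilt here; the second grid axis stands in for phi.
    target = np.repeat(agreement_prob(thetas)[:, None], n_angles, axis=1)
    q = np.zeros((n_angles, n_angles))
    for _ in range(n_iters):
        i, j = rng.integers(n_angles, size=2)        # sample one (theta, phi) cell
        reward = float(rng.random() < target[i, j])  # 1 if prediction matched outcome
        q[i, j] += alpha * (reward - q[i, j])        # constant step-size update
    return q, target

# Mean absolute error after the same 1,000 iterations at different densities.
for n in (5, 11, 31, 61):
    q, target = train(n_angles=n)
    print(n, round(float(np.abs(q - target).mean()), 3))
```

      With the iteration budget held fixed, the error grows with grid density because most cells of a 61x61 grid are visited rarely or never. This mirrors, in a simplified form, the authors' observation that denser sampling needs more iterations.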

      Line 618-619: “…, separately divided them into 61 sampling angles across the spherical force space (i.e., the angle density).”

      (2) Along similar lines, the authors discuss the results of the inverted scene condition as reflecting cognitive impenetrability. However, do they also interpret it as support for an intrinsically noisy distribution of gravity? I would be more convinced if they created a different scene that could have the possibility of affecting the direction of an (inferred) external perturbation - a previously held explanation of the noisy world model. For example, a relatively simple experiment would be to have a wall on one side of the scene such that an external perturbation would be unlikely to be inferred from that direction. In the external perturbation account, phi would then be affected resulting in a skewed distribution of angle pairs. However, in the authors' stochastic world model phi would remain unaffected resulting in the same uniform distribution of phi the authors observed. In my opinion, this would provide more compelling evidence for the stochastic world model.

      Author response image 4.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right: the scene shown to participants. (b) Human behaviors. Three participants completed this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R2: We thank the reviewer for this suggestion. To address the reviewer’s concern, we designed an experiment with a wall implemented on one side of the scene (Supplementary Figure 4A). Before the experiment started, we explicitly informed the participants that the wall was designed to block wind, so that no wind from the direction of the wall could influence the collapse trajectory of the configurations. Participants judged whether each trajectory was normal. If participants’ judgments were influenced by external noise, we would expect to observe a skewed angle distribution. However, our results still showed a normal distribution for all participants tested, consistent with the experiment without the wall (Supplementary Figure 4B). This experiment suggests that the stochastic nature of intuitive inference about objects’ stability is embedded in the mind, rather than shaped by external forces or explicit instructions.

      We have revised the manuscript to include this new experiment:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”
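      One standard way to quantify the skew the reviewer predicted is a circular statistic on the azimuth φ. The mean resultant length R is near 0 when directions are spread around the circle and approaches 1 when they cluster. The Rayleigh statistic z = nR² grows when a preferred direction exists.

      The sketch below uses synthetic angles only, under two hypothetical accounts. It does not use the participants' data.
      - Under the stochastic-gravity account, the wall leaves φ unaffected.
      - Under the perturbation account, inferred pushes come only from the open half of the scene.

```python
import numpy as np

def mean_resultant_length(phi_deg):
    # Length of the average unit vector of the angles (0 = no preferred direction).
    phi = np.deg2rad(np.asarray(phi_deg, dtype=float))
    return float(np.hypot(np.cos(phi).mean(), np.sin(phi).mean()))

def rayleigh_z(phi_deg):
    # Rayleigh statistic z = n * R^2; large z indicates a non-uniform direction.
    return len(phi_deg) * mean_resultant_length(phi_deg) ** 2

rng = np.random.default_rng(1)
n = 500
# Hypothetical stochastic-gravity account: phi spread over the full circle.
phi_stochastic = rng.uniform(0.0, 360.0, n)
# Hypothetical perturbation account: no pushes inferred from the walled side,
# so phi is confined to the open half of the circle (-90 deg, 90 deg).
phi_perturbation = rng.uniform(-90.0, 90.0, n)

for label, phi in (("stochastic", phi_stochastic), ("perturbation", phi_perturbation)):
    print(label, round(mean_resultant_length(phi), 3), round(rayleigh_z(phi), 1))
```
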

      (3) I didn't completely follow the authors' explanation for the taller objects illusion. On lines 229-232, the authors state that deviations from gravity's veridical direction are likely to accumulate with the height of the objects. Is this because, in the stochastic world model account, each block gets its own gravity vector that is sampled from the distribution? The authors should clarify this more explicitly. If this is indeed the author's claim, then it would seem that it could be manipulated by varying the dimensions of the blocks (or whatever constitutes an object).

      R3: We are sorry for the confusion caused by the use of the term ‘accumulate’. In the study, there is only one gravity vector sampled from the distribution for the entire structure, rather than each block having a unique gravity vector. The height illusion is attributed to the fact that the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction. This is especially true for objects consisting of multiple blocks stacked atop one another. In revision, we have removed the confusing term ‘accumulate’ for clarification.

      Line 242-244: “…, because the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction during humans’ internal simulations.”
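      The geometric intuition in R3 can be made concrete with a simplified case. This is an assumption for illustration, not the authors' simulation: a single uniform rigid block of base width b and height H, on a flat surface in 2D, with sliding ignored.

      If the whole structure is subject to one gravity vector tilted by θ from vertical, the block tips once the line of action through its center of mass (at height H/2) passes the base edge. That happens when (H/2)·tan|θ| > b/2, which gives a critical tilt θ_c = arctan(b/H). If θ is drawn from a zero-mean Gaussian with standard deviation σ, the predicted collapse probability is P(|θ| > θ_c) = erfc(θ_c / (σ√2)). The value σ = 20° below is purely illustrative.

```python
import math

def critical_tilt_deg(base_width, height):
    # Uniform block: center of mass at height/2, base half-width base_width/2.
    # Tipping starts when (height/2) * tan(theta) > base_width/2.
    return math.degrees(math.atan(base_width / height))

def prob_collapse(base_width, height, sigma_deg=20.0):
    # P(|theta| > theta_c) for theta ~ Normal(0, sigma^2), one tilt per structure.
    theta_c = critical_tilt_deg(base_width, height)
    return math.erfc(theta_c / (sigma_deg * math.sqrt(2.0)))

for h in (0.5, 1.0, 2.0, 4.0):
    print(h, round(critical_tilt_deg(1.0, h), 1), round(prob_collapse(1.0, h), 3))
```

      Taller blocks on the same base have a smaller θ_c, so a larger share of the tilt distribution predicts a collapse. This matches the height effect described in R3 without requiring a separate gravity vector for each block.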

      (4) The authors refer to the RL simulations as agent-environment interactions, but in reality, the RL model does not interact with the blocks. Would experience-dependent or observation be more apropos?

      R4: We completely agree. Indeed, the RL model did not manipulate stacks; rather, it updated its knowledge of natural gravity based on the discrepancies between the RL model’s predictions and observed outcomes. In revision, we have removed the confusing term ‘agent-environment interactions’ and clarified its intended meaning.

      Line 19-22: “Furthermore, a computational model with reinforcement learning revealed that the stochastic characteristic likely originated from experience-dependent comparisons between predictions formed by internal simulations and the realities observed in the external world, …”

      Reviewer #3 (Public Review):

      (1) In spite of the fact that the Mental Gravity Simulation (MGS) seems to predict the data of the two experiments, it is an untenable hypothesis. I give the main reason for this conclusion by illustrating a simple thought experiment. Suppose you ask subjects to determine whether a single block (like those used in the simulations) is about to fall. We can think of blocks of varying heights. No matter how tall a block is, if it is standing on a horizontal surface it will not fall until some external perturbation disturbs its equilibrium. I am confident that most human observers would predict this outcome as well. However, the MGS simulation would not produce this outcome. Instead, it would predict a non-zero probability that the block will tip over. A gravitational field that is not perpendicular to the base has the equivalent effect of a horizontal force applied on the block at the height corresponding to the vertical position of the center of gravity. Depending on the friction determined by the contact between the base of the block and the surface where it stands there is a critical height where any horizontal force being applied would cause the block to fall while pivoting about one of the edges at the base (the one opposite to where the force has been applied). This critical height depends on both the size of the base and the friction coefficient. For short objects this critical height is larger than the height of the object, so that object would not fall. But for taller blocks, this is not the case. Indeed, the taller the block the smaller the deviation from a vertical gravitational field is needed for a fall to be expected. The discrepancy between this prediction and the most likely outcome of the simple experiment I have just outlined makes the MGS model implausible. Note also that a gravitational field that is not perpendicular to the ground surface is equivalent to the force field experienced by the block while standing on an inclined plane. For small friction values, the block is expected to slide down the incline, therefore another prediction of this MGS model is that when we observe an object on a surface exerting negligible friction (think of a puck on ice) we should expect that object to spontaneously move. But of course, we don't, as we do not expect tall objects that are standing to suddenly fall if left unperturbed. In summary, a stochastic world model cannot explain these simple observations.

      Author response image 5.

      Differentiating Subjectivity from Objectivity. In both Experiment 1 (a) and Experiment 2 (b), participants were instructed to determine which shape appeared most stable. Objectively, in the absence of external forces, all shapes possess equal stability. Yet, participants typically perceived the shape on the left as the most stable because of its larger base area. The discrepancy between objective realities and subjective feelings, as we propose, is attributed to the human mind representing gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      R1: We agree with the reviewer that objects will remain stable until disturbed by external forces. However, in many cases there is a clear discrepancy between objective reality and subjective perception. For example, violet and red light lie at opposite ends of the visible spectrum, yet violet and red appear close to each other on the perceptual hue circle. Similarly, as shown in Author response image 5, all shapes possess equal stability in the absence of external forces, yet humans typically perceive the shape on the left as more stable because of its larger base area. In this study, we explored the mechanism underlying this discrepancy by proposing that the human mind represents gravity’s direction as a Gaussian distribution, rather than as a single value pointing straight downward.

      In the revision, we have clarified the rationale of this study:

      Line 76-98: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), thereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that the distinction between these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach. Here, we investigated these two alternative hypotheses regarding the construction of the world model in the brain by examining how gravity’s direction is represented in the world model when participants judged object stability.”

      (2) The question remains as to how we can interpret the empirical data from the two experiments and their agreement with the predictions of the stochastic world model if we assume that the brain has internalized a vertical gravitational field. First, we need to look more closely at the questions posed to the subjects in the two experiments. In the first experiment, subjects are asked about how "normal" a fall of a block construction looks. Subjects seem to accept 50% of the time a fall is normal when the gravitational field is about 20 deg away from the vertical direction. The authors conclude that according to the brain, such an unusual gravitational field is possible. However, there are alternative explanations for these findings that do not require a perceptual error in the estimation of the direction of gravity. There are several aspects of the scene that may be misjudged by the observer. First, the 3D interpretation of the scene and the 3D motion of the objects can be inaccurate. Indeed, the simulation of a normal fall uploaded by the authors seems to show objects falling in a much weaker gravitational field than the one on Earth since the blocks seem to fall in "slow motion". This is probably because the perceived height of the structure is much smaller than the simulated height. In general, there are even more severe biases affecting the perception of 3D structures that depend on many factors, for instance, the viewpoint.

      R2: We thank the reviewer for highlighting several potential confounding factors in our study. We address each of these concerns point-by-point:

      (a) Misinterpretation of the 3D scene and motion. In Author response image 5 shown above, there is no 3D structure, yet participants’ judgments of stability still deviated from objective reality. In addition, 3D motion was introduced only to aid understanding of the stacks’ 3D structure, and previous studies without 3D motion have reported similar findings (Allen et al., 2020). Therefore, regardless of whether objects are presented in 2D or 3D, or in static or moving formats, humans’ judgments of object stability appear consistent.

      (b) Errors in perceived height. While there might be discrepancies between perceived and simulated heights, such errors are systematic across all conditions. Therefore, they may affect the width of the Gaussian distribution but do not fundamentally alter its existence.

      (c) The viewpoint. In one experiment, we inverted gravity’s direction to point upward, diverging from common daily experience. Despite this change in viewpoint, the Gaussian distribution was still observed. That is, the viewpoint does not appear to be a key factor in determining whether gravity’s direction is represented as a Gaussian distribution in our mental world.

      In summary, both our study and previous studies (Allen et al., 2020; Battaglia et al., 2013) agree that humans’ subjective assessments of objects’ stability deviate from actual stability due to noise in mental simulation. Unlike previous studies, however, we suggest that this noise is intrinsic, rather than stemming from external forces or imperfect observations.

      (3) Second, the distribution of weight among the objects and the friction coefficients acting between the surfaces are also unknown parameters. In other words, there are several parameters that depend on the viewing conditions and material composition of the blocks that are unknown and need to be estimated. The authors assume that these parameters are derived accurately and only that assumption allows them to attribute the observed biases to an error in the estimate of the gravitational field. Of course, if the direction of gravity is the only parameter allowed to vary freely then it is no surprise that it explains the results. Instead, a simulation with a tilted angle of gravity may give rise to a display that is interpreted as rendering a vertical gravitational field while other parameters are misperceived. Moreover, there is an additional factor that is intentionally dismissed by the authors that is a possible cause of the fall of a stack of cubes: an external force. Stacks that are initially standing should not fall all of a sudden unless some unwanted force is applied to the construction. For instance, a sudden gust of wind would create a force field on a stack that is equivalent to that produced by a tilted gravitational field. Such an explanation would easily apply to the findings of the second experiment. In that experiment subjects are explicitly asked if a stack of blocks looks "stable". This is an ambiguous question because the stability of a structure is always judged by imagining what would happen to the structure if an external perturbation is applied. The right question should be: "do you think this structure would fall if unperturbed". However, if stability is judged in the face of possible external perturbations then a tall structure would certainly be judged as less stable than a short structure occupying the same ground area. This is what the authors find. What they consider as a bias (tall structures are perceived as less stable than short structures) is instead a wrong interpretation of the mental process that determines stability. If subjects are asked the question "Is it going to fall?" then tall stacks of sound structure would be judged as stable as short stacks, just more precarious.

      R3: Indeed, the external forces suggested by the reviewer certainly influence judgments of objects’ stability. The critical question, however, is whether humans’ judgments on objects’ stability accurately mirror the actual stability of objects in the absence of external forces. To address this question, we designed two new experiments.

      Experiment 1: we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). We explicitly informed the participants that the wall could block wind, ensuring that no potential wind from the direction of the wall could influence the configuration. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants (Age: 25-30, two females), which is similar to the experiment without the wall (Supplementary Figure 4B).

      Author response image 6.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right: the scene shown to participants. (b) Human behaviors. Three participants completed this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      Experiment 2: The second experiment adopted a different paradigm to test the hypothesis of stochastic mental simulation. In this task, humans inferred the landing point of a parabolic trajectory that was hidden by an occluder (Author response image 7a). The stochastic mental simulation predicts that humans’ responses follow a Gaussian distribution; if instead humans’ judgments were driven by external noise, the landing points would not be Gaussian. The experiment consisted of 100 trials, and in each trial participants predicted the landing point of the trajectory by clicking the left mouse button. We found that none of the three participants (1 female; ages: 24-30) could accurately predict the landing points, and their predictive errors conformed to Gaussian distributions with different variances (Author response image 7b). Therefore, this new experiment confirms the stochastic nature of intuitive physics.

      Author response image 7.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.
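      To see why a noisy gravity direction would produce roughly Gaussian landing errors, consider a simplified 2D case. This is an assumption; the task's actual parameters are shown in the figure and are not reproduced here.

      A ball is launched horizontally at speed v from height h, under gravity of magnitude g tilted by a small angle θ. The acceleration components are (g sin θ, −g cos θ), so the ball reaches the ground at t = √(2h / (g cos θ)) and lands at x = v·t + h·tan θ. For small θ, the landing error relative to θ = 0 is approximately h·θ, a linear function of θ. A Gaussian θ therefore yields approximately Gaussian errors. The values of h, v, and σ below are hypothetical.

```python
import numpy as np

def landing_x(theta, h=1.0, v=2.0, g=9.81):
    # Horizontal launch from height h; gravity tilted by theta (radians) toward +x.
    t = np.sqrt(2.0 * h / (g * np.cos(theta)))  # time to reach the ground
    return v * t + h * np.tan(theta)

def landing_error(theta, **kw):
    # Error relative to the landing point under strictly downward gravity.
    return landing_x(theta, **kw) - landing_x(0.0, **kw)

rng = np.random.default_rng(2)
thetas = rng.normal(0.0, np.deg2rad(5.0), 10_000)  # hypothetical sigma = 5 deg
errors = landing_error(thetas)
print(round(float(errors.mean()), 4), round(float(errors.std()), 4))
```
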

      (4) The RL model used as a proof of concept for how the brain may build a stochastic prior for the direction of gravity is based on very strong and unverified assumptions. The first assumption is that the brain already knows about the force of gravity, but it lacks knowledge of the direction of this force of gravity. The second assumption is that before learning the brain knows the effect of a gravitational field on a stack of blocks. How can the brain simulate the effect of a non-vertical gravitational field on a structure if it has never observed such an event?

      R4: We agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model of gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated by a simple simulation such as RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity.

      In the revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) The third assumption is that from the visual input, the brain is able to figure out the exact 3D coordinates of the blocks. This has been proven to be untrue in a large number of studies. Given these assumptions and the fact that the only parameters the RL model modifies through learning specify the direction of gravity, I am not surprised that the model produces the desired results.

      Author response image 8.

      Perceptual uncertainty in 3D stack structures. (a) Experimental design. Two stacks with similar block placements were presented sequentially to participants, who were instructed to judge whether the stacks were identical and to rate their confidence in this judgment. Each stack was presented on the screen for 2 seconds. (b) Behavioral performance. Three participants (2 males, age range: 24-30) were recruited for the experiment. Confidence that a pair of stacks was unchanged decreased rapidly even when each block was displaced by a very small amount, suggesting that humans can keenly perceive trivial changes in configurations. The x-axis denotes the difference in block placement between stacks, with the maximum value (0.4) corresponding to the length of a block’s short side. The y-axis denotes humans’ confidence in reporting no change. The red curve shows the average confidence level across 4 runs, and the yellow curves show the confidence level of each run.

      R5: Indeed, uncertainty is inevitable when perceiving the external world, because our perception is not a faithful replica of external reality. The more critical question is how accurately our perception represents the 3D coordinates of a stack’s blocks. To address this question, we designed a straightforward experiment (Author response image 8a) in which participants judged whether two sequentially presented stacks were identical. The position of each block was randomly shifted horizontally. All participants were able to detect even minor positional changes in the 3D structure of the stacks (Author response image 8b). This level of perceptual precision is adequate for detecting differences between the predictions of mental simulations and actual observations of the external world.
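      One way to turn "even minor positional changes" into a number is to read off the displacement at which confidence in "no change" falls to a criterion, such as 0.5, by interpolating between tested displacements. The sketch below is a hypothetical analysis. The confidence values in the example are made up for illustration and are not the participants' data.

```python
import numpy as np

def displacement_threshold(displacements, confidence, criterion=0.5):
    """Smallest displacement at which confidence in 'no change' drops to
    `criterion`, found by linear interpolation between tested points."""
    d = np.asarray(displacements, dtype=float)
    c = np.asarray(confidence, dtype=float)
    below = np.nonzero(c <= criterion)[0]
    if below.size == 0:
        return None  # criterion never reached within the tested range
    k = below[0]
    if k == 0:
        return float(d[0])
    # Interpolate between the last point above and the first point at/below criterion.
    d0, d1, c0, c1 = d[k - 1], d[k], c[k - 1], c[k]
    return float(d0 + (c0 - criterion) * (d1 - d0) / (c0 - c1))

# Hypothetical curve; 0.4 corresponds to a block's short side.
disp = [0.0, 0.05, 0.1, 0.2, 0.4]
conf = [0.95, 0.70, 0.40, 0.15, 0.05]
thr = displacement_threshold(disp, conf)
print(thr, thr / 0.4)  # threshold in block units and as a fraction of the short side
```
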

      (6) Finally, the argument that the MGS is more efficient than the NGS model is based on an incorrect analysis of the results of the simulation. It is true that 80% accuracy is reached faster by the MGS model than the 95% accuracy level is reached by the NGS model. But the question is: how fast does the NGS model reach 80% accuracy (before reaching the plateau)?

      R6: Yes. The NGS model reached 80% accuracy as rapidly as the MGS model. However, the NGS model required a significantly longer period to reach the plateau crucial for decision-making. This information is now included in the revised manuscript.

      Lines 348-350: “…, while the initial growth rates of both models were comparable, the MGS reached the plateau crucial for decision-making sooner than the NGS.”

      We greatly appreciate the thorough and insightful reviews provided by all three reviewers, which have considerably improved our manuscript, especially in the clarity with which the approach is presented and in the further validation of the robustness of our results and their implications.

      References:

      Allen KR, Smith KA, Tenenbaum JB. 2020. Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences 117:29302–29310.

      Battaglia PW, Hamrick JB, Tenenbaum JB. 2013. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences 110:18327–18332.

      Friston K, Moran RJ, Nagai Y, Taniguchi T, Gomi H, Tenenbaum J. 2021. World model learning and inference. Neural Networks 144:573–590.

      Kriegeskorte N, Douglas PK. 2019. Interpreting encoding and decoding models. Current Opinion in Neurobiology 55:167–179.

      MacKay DM. 1956. The epistemological problem for automata. In: Automata Studies (AM-34), Volume 34. Princeton University Press. pp. 235–252.

      Matsuo Y, LeCun Y, Sahani M, Precup D, Silver D, Sugiyama M, Uchibe E, Morimoto J. 2022. Deep learning, reinforcement learning, and world models. Neural Networks.

      Naselaris T, Kay KN, Nishimoto S, Gallant JL. 2011. Encoding and decoding in fMRI. NeuroImage 56:400–410.

      Zhou L, Smith K, Tenenbaum J, Gerstenberg T. 2022. Mental Jenga: A counterfactual simulation model of physical support.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study combines a comparative approach in different synapses with experiments that show how synaptic vesicle endocytosis in nerve terminals regulates short-term plasticity. The data presented support the conclusions and make a convincing case for fast endocytosis as necessary for rapid vesicle recruitment to active zones. Some aspects of the description of the data and analysis are however incomplete and would benefit from a more rigorous approach. With more discussion of methods and analysis, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions at two types of central synapses, the calyx of Held and hippocampal CA1 synapses. After acute pharmacological block of endocytosis, deeper synaptic depression or less facilitation was observed at both types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse but not at CA1 synapses. The data suggest an unexpected, fast role of site clearance in counteracting synaptic depression.

      Strengths:

      The study uses an acute block of the molecular targets with pharmacology together with precise electrophysiology. The experimental results are clear-cut and convincing. The study also examines the physiological roles of site clearance using action potential-evoked transmission at physiological Ca and physiological temperature in mature animals, conditions that had not been examined previously.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated. Although this is a hard question and difficult to address experimentally, reagents may affect synaptic vesicle mobilization to the release sites directly in addition to blocking endocytosis.

      To acutely block vesicle endocytosis, we utilized two different pharmacological tools, Dynasore and Pitstop-2, after testing their blocking spectra and potencies at calyx presynaptic terminals, and we collected data on their common effects on target functions. Since recovery from STD was faster at calyx synapses in the presence of either endocytic blocker in physiological 1.3 mM [Ca2+] (Figure 2B), but not in 2.0 mM [Ca2+] (Figure S4), these blockers might facilitate vesicle mobilization under physiological conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or the cortical actin scaffold, both in the calyx of Held synapse and in hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice, comprehensive and comparative study that reveals some interesting differences between a fast synapse (calyx of Held) tuned to transmit reliably at several hundred Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression at the calyx, but not at the hippocampal synapse. This striking difference between the two preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      The concepts of FRP and SRP are derived from voltage-clamp step-depolarization experiments at calyces of Held in pre-hearing rodents at RT, and these pools cannot be directly dissected in action potential-evoked EPSCs recorded at post-hearing calyces under physiological conditions. Nevertheless, we have addressed this issue as far as possible in new paragraphs in the Results section (p9-10) that refer to the related literature, particularly regarding the different effects of Latrunculin depending on the application protocol and experimental conditions, and we have added a new supplementary figure (now Figure S5). Regarding the role of F-actin in vesicle replenishment at cerebellar synapses, we added sentences to the Discussion section (p14, last paragraph).

      Reviewer #3 (Public Review):

      General comments:

      (1) While Dynasore and Pitstop-2 may impede release site clearance due to an arrest of membrane retrieval, neither Latrunculin-B nor ML-141 specifically acts on AZ scaffold proteins. Interference with actin polymerization may have a number of consequences many of which may be unrelated to release site clearance. Therefore, neither Latrunculin-B nor ML-141 can be considered suitable tools for specifically identifying the role of AZ scaffold proteins (i.e. ELKS family proteins, Piccolo, Bassoon, α-liprin, Unc13, RIM, RBP, etc) in release site clearance which was defined as one of the principal aims of this study.

      In this study, we focused our analysis on the downstream activity of the scaffold protein intersectin by comparing the common effects of inhibiting CDC42 and actin polymerization, using ML141 and Latrunculin B respectively, on vesicle endocytosis and synaptic depression/facilitation, without addressing the diverse individual effects of each drug. To avoid confusion, we removed “AZ” from “AZ scaffold protein”.

      (2) Initial EPSC amplitudes more than doubled in the presence of Dynasore at hippocampal SC->CA1 synapses (Figure S2). This unexpected result raises doubts about the specificity of Dynasore as a tool to selectively block SV endocytosis.

      Dynasore may have unknown or off-target effects. However, our main conclusion is also supported by the results obtained with Pitstop-2.

      (3) In this study, the application of Dynasore and Pitstop-2 strongly decreases 100 Hz steady-state release at calyx synapses while - quite unexpectedly - strongly accelerates recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The effect of latrunculin on STD can vary with the application protocol and external [Ca2+], as we show in the new supplementary Figure S5. Its effect on recovery from STD also varies with temperature, [Ca2+], and animal age, all of which affect the Ca2+-dependent fast component of recovery from depression. We have added paragraphs addressing this issue to the Results section (p9-10).

      (4) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We have added methodological explanations and reworded the text to make clear that the pharmacological data were derived from separate, non-sequential experiments.

      (5) The authors compare results obtained in calyx with those obtained in SC->CA1 synapses which they considered examples for 'fast' and 'slow' synapses, respectively. There is little information given to help readers understand why these two synapse types were chosen, what the attributes 'fast' and 'slow' refer to, and how that may matter for the questions studied here. I assume the authors refer to the maximum frequency these two synapse types are able to transmit rather than to EPSC kinetics?

      Yes, the terms “fast” and “slow” refer to the maximum frequency at which these synapses can transmit. We have replaced “fast and slow” with “fast-signaling and slow-plastic” and added an explanation in the text.

      (6) Strong presynaptic stimuli such as those illustrated in Figures 1B and C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents a fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Since the data shown in Figs. 1 and 3 are central to the argumentation, illustration of the corresponding conductance traces is mandatory. Merely mentioning that the first 450 ms after stimulation were skipped during analysis is insufficient.

      A conductance trace, together with the capacitance change induced by a square pulse, is shown in our previous paper (Yamashita et al., 2005, Science).
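      The estimate in comment (6) is simple arithmetic and can be checked with a minimal Python sketch. It uses only the reviewer's stated assumptions (80 aF per SV and a ~16 pF resting terminal capacitance); the helper name `vesicle_estimate` is hypothetical and introduced here for illustration.

```python
# Sketch of the reviewer's estimate in comment (6). The 80 aF per SV and the
# ~16 pF terminal capacitance are the reviewer's stated assumptions.
SV_CAPACITANCE_F = 80e-18        # capacitance of a single synaptic vesicle (80 aF)
TERMINAL_CAPACITANCE_F = 16e-12  # mean resting capacitance of a calyx terminal (~16 pF)


def vesicle_estimate(delta_cm_pf: float) -> tuple[int, float]:
    """Hypothetical helper: return (number of fused SVs, % increase in
    terminal membrane area) for a capacitance jump of delta_cm_pf picofarads."""
    delta_cm_f = delta_cm_pf * 1e-12
    n_vesicles = round(delta_cm_f / SV_CAPACITANCE_F)
    # Membrane area scales with capacitance, so the relative area increase
    # equals the relative capacitance increase.
    area_increase_pct = 100 * delta_cm_f / TERMINAL_CAPACITANCE_F
    return n_vesicles, area_increase_pct


for dcm in (2.0, 2.5):
    n, pct = vesicle_estimate(dcm)
    print(f"{dcm} pF -> ~{n:,} SVs, ~{pct:.1f}% membrane increase")
```

      Under these assumptions, a 2.5 pF jump corresponds to roughly 31,000 SVs and a ~16% area increase, so the upper ends of the ranges quoted in the comment are rounded approximations.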

      (7) It is essential for this study to preclude a contamination of the results with postsynaptic effects (AMPAR saturation and desensitization). AMPAR saturation limits the amplitudes of initial responses in EPSC trains and hastens the recovery from depression due to a 'ceiling effect'. AMPAR desensitization occludes paired-pulse facilitation and reduces steady-state responses during EPSC trains while accelerating the initial recovery from depression. The use of, for example, 1 mM kynurenic acid in the bath is a well-established strategy to attenuate postsynaptic effects at calyx synapses. All calyx EPSC recordings should have been performed under such conditions. Otherwise, recovery time courses and STP parameters are likely contaminated by postsynaptic effects. Since the effects of AMPAR saturation on EPSC_1 and desensitization on EPSC_ss may partially cancel each other, an unchanged relative STD in the presence of kynurenic acid is not necessarily a reliable indicator for the absence of postsynaptic effects. The use of kynurenic acid in the bath would have had the beneficial side effect of massively improving voltage-clamp conditions. For the typical values given in this MS (10 nA EPSC, 3 MOhm Rs) the expected voltage escape is ~30 mV corresponding to a change in driving force of 30 mV/80 mV=38%, i.e. initial EPSCs in trains are likely underestimated by 38%. Such large voltage escape usually results in unclamped INa(V) which was suppressed in this study by routinely including 2 mM QX-314 in the pipette solution. That approach does, however, not reduce the voltage escape.

      Glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003) although it does in pre-hearing calyces (Yamashita et al, 2009). In fact, as shown in Figure S3, our results are essentially the same with or without kynurenate.
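      The voltage-escape figure in comment (7) follows from Ohm's law. A minimal Python sketch, assuming (as the comment does) no series-resistance compensation and an 80 mV driving force; the function names are hypothetical:

```python
# Sketch of the series-resistance error estimate in comment (7).
# Inputs are the reviewer's "typical values"; no Rs compensation is assumed.


def voltage_escape_mv(epsc_na: float, rs_mohm: float) -> float:
    """Uncompensated voltage error in mV: I (nA) x Rs (MOhm), since nA * MOhm = mV."""
    return epsc_na * rs_mohm


def driving_force_loss(escape_mv: float, driving_force_mv: float = 80.0) -> float:
    """Fractional reduction in driving force caused by the voltage escape."""
    return escape_mv / driving_force_mv


escape = voltage_escape_mv(10.0, 3.0)  # 10 nA EPSC, 3 MOhm series resistance
loss = driving_force_loss(escape)      # relative to an assumed 80 mV driving force
print(f"voltage escape ~{escape:.0f} mV, driving-force loss ~{loss:.1%}")
```

      With these inputs the sketch gives a 30 mV escape and a 37.5% reduction in driving force, which the comment rounds to 38%.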

      (8) In the Results section (pages 7 and 8), the authors analyze the time course into STD during 100 Hz trains in the absence and presence of drugs. In the presence of drugs, an additional fast component is observed which is absent from control recordings. Based on this observation, the authors conclude that '... the mechanisms operate predominantly at the beginning of synaptic depression'. However, the consequences of blocking or slowing site clearing are expected to be strongly release-dependent. Assuming a probability of <20% that a fusion event occurs at a given release site, >80% of the sites cannot be affected at the arrival of the second AP even by a total arrest of site clearance simply because no fusion has yet occurred. That number decreases during a train according to (1-0.2)^n, where n is the number of the AP, such that after 10 APs, ~90% of the sites have been used and may potentially be unavailable for new rounds of release after slowing site clearance. Perhaps, the faster time course into STD in the presence of the drugs isn't related to site clearance?

      Enhanced depression at the beginning of stimulation indicates block of a rapid SV replenishment mechanism that includes both endocytosis-dependent site clearance and scaffold-dependent vesicle translocation to release sites.
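      The release-site argument in comment (8) can be made concrete with a minimal Python sketch. It assumes, as the comment does, that each site fuses independently with probability p = 0.2 per action potential and that a used site stays unavailable (complete arrest of site clearance); `fraction_sites_used` is a hypothetical helper.

```python
# Sketch of the reviewer's release-site argument in comment (8), assuming each
# site fuses independently with probability p per AP and is never cleared.


def fraction_sites_used(p: float, n_aps: int) -> float:
    """Fraction of release sites that have had at least one fusion event
    after n_aps action potentials: 1 - (1 - p) ** n_aps."""
    return 1.0 - (1.0 - p) ** n_aps


for n in (1, 2, 5, 10):
    print(f"after AP {n:2d}: {fraction_sites_used(0.2, n):.1%} of sites used")
```

      With p = 0.2 this gives 20% of sites used after the first AP and about 89% after 10 APs, matching the ~90% figure in the comment.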

      (9) In the Discussion (page 10), the authors present a calculation that is supposed to explain the reduced size of the second calyx EPSC in a 100 Hz train in the presence of Dynasore or Pitstop-2. Does this calculation assume that all endocytosed SVs are immediately available for release within 10 ms? Please elaborate.

      We do not assume that endocytosed vesicles are reused within 10 ms, because glutamate refilling takes much longer (7 s at PT; Hori & Takahashi, 2012). Instead, already-filled reserve vesicles can rapidly replenish release sites if the sites have been cleared and the scaffold functions properly. The results shown in Figure S6 also indicate that blocking vesicular transmitter refilling has no immediate effect on synaptic responses.

      (10) It is not clear, why the bafilomycin/folimycin data is presented in Fig. S5. The data is also not mentioned in the Discussion. Either explain the purpose of these experiments or remove the data.

      These v-ATPase blockers, which block vesicular transmitter refilling, have been reported to enhance EPSC depression at hippocampal synapses at RT and 2 mM [Ca2+], presumably because of a lack of filled vesicles undergoing rapid recycling (e.g., kiss-and-run). We thought it important to determine whether these findings are physiologically relevant, since such a mechanism might also regulate synaptic strength during repetitive transmission. However, our results did not support their physiological relevance. Since these results are outside our main questions, the negative results are shown in Supplementary Figure S6 and described in the last paragraph of the Results section (p11), but are not discussed further in the Discussion section.

      (11) The scheme in Figure 7 is not very helpful.

      We updated the scheme to summarize our conclusion that endocytosis-dependent site clearance and a scaffold-dependent mechanism of vesicle replenishment cooperate independently to strengthen synaptic efficacy during repetitive transmission at fast-signaling calyx synapses. In contrast, only endocytic site clearance is required to support facilitation at slow-plastic hippocampal SC-CA1 synapses.

      Recommendations for the authors:

      First, my deep apologies for the long delay in reviewing your paper. All reviewers are now in agreement that the paper has valuable new information, but some methods are not described well and some results appear to be incompatible with previous results in the literature. The discussion of previous literature is also incomplete and not well-balanced. With the discussion of methods and literature strengthened, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms. We ask that you address the comments and revise your paper before we can fully recommend the paper as being an important contribution with compelling evidence and a strong data set that supports the conclusions.

      We have described the methods more explicitly. The apparent incompatibilities with previous results are now explained and discussed with new supplementary data.

      Major:

      (1) In this study, the application of Dynasore and Pitstop-2 strongly decreased 100 Hz steady-state release at calyx synapses while - quite unexpectedly - it strongly accelerated recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The lack of change in recovery from depression in dynamin-1 knockout mice reported by Mahapatra et al. (2016) is consistent with our results in 2 mM [Ca2+] (Figure S4), whereas the accelerated recovery with Dynasore (Figure 2B2) is observed in 1.3 mM [Ca2+], suggesting that this acceleration is masked in 2 mM [Ca2+] but revealed in physiological [Ca2+] (p7, top paragraph). In both cases, however, recovery from STD is not prolonged, unlike in Hosoi et al. (2009).

      The latrunculin issues are discussed in the Results section (p9-10), with a newly added Supplementary Figure S5.

      (2) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We have clarified these points in the Methods and Results sections.

      (3) Please cite and discuss briefly previous papers that have shown fast endocytosis in the calyx of Held with membrane capacitance measurements like Renden and von Gersdorff, J Neurophysiology, 98:3349, 2007 and Taschenberger et al., Neuron, 2002. These papers first showed exocytosis and endocytosis kinetics in more mature (hearing) mice calyx of Held and at higher physiological temperatures.

      One of these papers, which is relevant to the present study, is now cited on p4.

      (4) The findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We have added a discussion of the latrunculin issue to the Results section, citing previous literature (p9-10). Since there is no direct evidence (from vesicle imaging) for the existence of FRP and SRP, these definitions, derived from voltage-clamp step-depolarization studies, are difficult to incorporate into the dissection of synaptic depression under physiological conditions.

      Reviewer #1 (Recommendations For The Authors):

      I have no major comments, but the following issues may be addressed.

      (1) The term "fast and slow" synapses may be relative and a bit confusing. I do not think hippocampal synapses are slow synapses.

      We have replaced “fast and slow” with “fast-signaling and slow-plastic” to reflect their functions and added an explanation in the text.

      (2) Off-target effects of the pharmacological agents may be discussed. In this respect, bafilomycin experiments can be used to argue against the slow effects of vesicle cycling, such as endocytosis and vesicle mobilization. However, effects on rapid vesicle mobilization cannot be excluded entirely. Because I cannot exclude off-target effects either (this could be addressed by single-vesicle imaging at the nanoscale, which is hard to do, or by quantitative EM?), I feel this is a matter for discussion.

      Dynasore may have unknown or off-target effects. However, our main conclusion is also supported by the results obtained with Pitstop-2.

      (3) Fig2 A2, B2 and Fig 4 A2 and B2. It is easier to plot the recovery only normalized to the initial value. Subtracting steady-state is somewhat confusing because the recovery looks faster after deeper depression, but this may be just apparent.

      We now give values for both types of plots in Table 2; they show no essential difference in the recovery parameters.

      Reviewer #2 (Recommendations For The Authors):

      Line 51: Rajappa et al. (2016) investigated clearance deficits in synaptophysin KO mice (not synaptobrevin).

      Corrected.

      Line 54: intersectin is introduced as AZ scaffold protein, although in most of the literature, it is referred to as an endocytic scaffold protein (also in the cited one, e.g. Sakaba et al. 2013). At least, this should be discussed.

      Since blockers of the downstream activity of intersectin have no effect on vesicle endocytosis (Figure 3 and Sakaba et al, 2013), we refer to it as a (presynaptic) scaffold protein rather than an endocytic scaffold protein.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Page 1, Title: I don't think the presented data address the role of the presynaptic scaffold in SV replenishment. In addition, 'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be implied here.

      In this study, our focus was on the downstream activity of the scaffold protein intersectin. Since blocking the activity of its downstream effectors, CDC42 and actin, does not obstruct endocytosis (Fig 3, and Sakaba et al., 2013), we adopted the term “presynaptic scaffold protein” instead of “endocytic scaffold protein”.

      We have corrected it in the text.

      Page 2, Abstract: Clarify 'physiologically optimized condition' here and elsewhere in the manuscript.

      Abstract: in physiologically optimized condition → in physiological temperature and Ca2+.

      Page 3, line 62: I don't think 'the site-clearance hypothesis is widely accepted'. There are very few models that implement such a mechanism. Examples would be Pan & Zucker (2009) Neuron and Lin, Taschenberger & Neher 2022 (PNAS) which could be cited.

      62: the site-clearance hypothesis is “widely accepted”→ “well supported”

      Page 3, line 77: Please clarify 'fast synapses'.

      77: fast synapses→fast-signaling synapses, added clarification in the text.

      Page 4, line 100: Please clarify 'in the maximal rate'.

      100: in the maximal rate→reached during 1-Hz stimulation.

      Page 6, line 136: Please clarify 'to reduce the gap'.

      136: To reduce the gap between these different results→To explore the reason for these different results

      Page 7, line 157: I don't consider ML141 and Latrunculin-B 'scaffold protein inhibitors'.

      157: scaffold protein inhibitors had no effect on→ reworded as “none of these inhibitors affected fast or slow endocytosis”.  

      Page 7, line 162: P-value missing.

      162: p < 0.001 added.

      Page 8, line 184: "Since both endocytic blockers and scaffold inhibitors enhanced synaptic depression with a similar time course" consider rephrasing. Sounds like you refer to the time course by which these drugs exert their effect after being applied.

      184: Since both endocytic blockers and scaffold inhibitors enhance synaptic depression with a similar time course→Since the enhancement of synaptic depression by endocytic blockers or scaffold inhibitor occurred mostly at the early phase of synaptic depression.

      Same on page 11, line 250: "At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker" Please consider rephrasing.

      At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker →the early phase of synaptic depression like endocytic blockers

      Page 13, line 318: Please clearly state which experiments were performed at 1.3 mM and which at 2 mM external Ca if two different concentrations were used during recordings.

      320: Added text “Unless otherwise noted, EPSCs were recorded in 1.3 mM [Ca2+] aCSF at 37 °C” in the methods.

      Page 15: line 346: Reference in the wrong format.

      346; (25) → (Yamashita et al, 2005)

      Page 15: line 351: Do you mean to say every 10 s and every 20 s? Please clarify.

      No, averaged at 10 ms and 20 ms, respectively, as written.

      Page 16, line 369: 1 mM kyn was present in only very few experiments shown in the supplemental figures. Please clarify.

      368: In some experiments, 1 mM kyn was included to test whether it made any difference to the enhanced STD following endocytic block. However, as shown in Figure S3, our results are essentially the same with or without kynurenate, suggesting that glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003), unlike in pre-hearing calyces (Yamashita et al, 2009).

      Page 16, line 387: You cannot simply use multiple t-tests to compare a single control to multiple test conditions which seems to be the scenario here. Please correct or clarify.

      Experimental protocols are clarified in Methods as “Experiments were designed as population study using different cells from separate brain slices under control and drug treatment, rather than on a same cell before and after the drug exposure.”

      Table S1: 'Endo decay rate'. It's either the 'Endo rate' or the 'Decay rate of delta Cm'. Please correct.

      Corrected as Endocytosis rate (Endo rate).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The authors' primary research question revolves around the inquiry of "how far in advance semantic information might become available from parafoveal preview." In contrast to prior studies, the current research seeks to achieve a breakthrough in terms of timing by employing innovative technology. They mention in the manuscript that "most of these studies have been limited to measuring parafoveal preview from fixations to an immediately adjacent word... We tackle these core issues using a new technique that combines the use of frequency tagging and the measurement of magnetoencephalography (MEG)-based signals." However, the argumentation for how this new technology constitutes a breakthrough is not sufficiently substantiated. Specifically, there are two aspects that require further clarification. Firstly, the authors should clarify the importance of investigating the timing of semantic integration in their research question. They need to justify why previous studies focusing on the preview effect during fixations to an immediately adjacent word cannot address their specific inquiry about "how far in advance semantic information might become available from parafoveal preview," which requires examining parafoveal processing (POF). Secondly, in terms of the research methodology, the authors should provide a more comprehensive explanation of the advantages offered by MEG technology in the observation of the timing of semantic integration compared to the techniques employed in prior research. Indeed, the authors have overlooked some rather significant studies in this area. For instance, the research conducted by Antúnez, Milligan, Hernández-Cabrera, Barber, & Schotter in 2022 addresses the same research question mentioned in the current study and employs a similar experimental design. Importantly, they utilize a natural reading paradigm with synchronized ERP and eye-tracking recordings. 
Collectively, these studies, along with the series of prior research studies employing ERP techniques and RSVP paradigms discussed by the authors in their manuscript, provide ample evidence that semantic information becomes available and integrated from words before fixation occurs. Therefore, the authors should provide a more comprehensive citation of relevant research and delve deeper into explaining the potential contributions of their chosen technology to this field.

      We express our gratitude to the reviewer for providing insightful comments. Firstly, we clarify the advantages of the RIFT technique. The revised paragraph is on Page 4 with tracked changes and is copied as follows:

      “…… The RIFT technique provides a notable advantage by generating a signal — the tagging response signal — specifically yoked to just the tagged word. This ensures a clear separation in processing the tagged word from the ongoing processing of other words, addressing a challenge faced by eye tracking and ERP/FRP approaches. Moreover, RIFT enables us to monitor the entire dynamics of attentional engagement with the tagged word, which may begin a few words before the tagged word is fixated.”

      We have also rephrased our research questions in the Introduction on Page 5 (see tracked changes):

      “This paradigm allows us to address three questions. First, we aimed to measure when in the course of reading people begin to direct attention to parafoveal words. Second, we sought to ascertain when semantic information obtained through parafoveal preview is integrated into the sentence context. Modulations of pre-target RIFT responses by the contextual congruity of target words would serve as evidence that parafoveal semantic information has not only been extracted and integrated into the sentence context but that it is affecting how readers allocate attention across the text. Third, we explored whether these parafoveal semantic attention effects have any relationship to reading speed.”

      Secondly, we would like to explain why it is important to investigate the timing of semantic integration and how doing so complements existing findings on parafoveal processing during reading. Our manuscript has been revised accordingly, with specific modifications highlighted on Page 2. The revised passage reads as follows:

      “…… eye tracking-based evidence for the extraction of parafoveal semantic information …… was eventually extended into English …… For example, Schotter and Jia (2016) showed preview benefits on early gaze measures for plausible compared to implausible words, even for plausible words that were unrelated to the target. These results demonstrate that semantic information can indeed be extracted from parafoveal words. However, due to the limitations of the boundary paradigm, which only assesses effects after target words have been fixated, it is challenging to precisely determine when and how parafoveal semantic processing takes place. Furthermore, it is generally hard to distinguish between the effects of cross-saccade integration (e.g., mismatch between the preview and the word fixated) and the effects of how differing words fit into the context itself (Veldre and Andrews, 2016a, 2016b).”

      Thirdly, we now give greater prominence to the contributions of Antúnez et al. (2022), who provided important evidence for parafoveal semantic processing during natural reading. The relevant modifications are highlighted on Page 3, and the revised passage is as follows: “Although many of these effects have been measured in the context of unnatural reading paradigms (e.g., the “RSVP flanker paradigm”), similar effects obtain during natural reading. Using the stimuli and procedures from Schotter and Jia (2016), Antúnez et al. (2022) showed that N400 responses, measured relative to the fixation before the target words (i.e., before the boundary change while the manipulated words were in parafoveal preview), were sensitive to the contextual plausibility of these previewed words. These studies suggest that semantic information is available from words before they are fixated, even if that information does not always have an impact on eye fixation patterns.”

      References:

      Schotter ER, Jia A. 2016. Semantic and plausibility preview benefit effects in English: Evidence from eye movements. J Exp Psychol Learn Mem Cogn 42:1839–1866. doi:10.1037/xlm0000281

      Veldre A, Andrews S. 2016a. Is Semantic Preview Benefit Due to Relatedness or Plausibility? J Exp Psychol Hum Percept Perform 42:939–952. doi:10.1037/xhp0000200

      Veldre A, Andrews S. 2016b. Semantic preview benefit in English: Individual differences in the extraction and use of parafoveal semantic information. J Exp Psychol Learn Mem Cogn 42:837–854. doi:10.1037/xlm0000212

      Antúnez M, Milligan S, Hernández-Cabrera JA, Barber HA, Schotter ER. 2022. Semantic parafoveal processing in natural reading: Insight from fixation-related potentials & eye movements. Psychophysiology 59:e13986. doi:10.1111/PSYP.13986

      (2) Further, the authors emphasize semantic integration in their observed results but overlook the intricate relationship between access, priming, and integration. This assertion appears overly confident. Despite using low-constraint sentences and low-predicted targets (lines 439-441), differences between congruent and incongruent conditions may be influenced by word-level factors. For instance, in the first coherent sentence, such as "Last night, my lazy brother came to the party one minute before it was over" (line 1049), replacing the keyword "brother" with an incongruent word could create an incoherent sentence, possibly due to semantic violation, relation mismatch with "lazy," or prediction error related to animate objects. A similar consideration applies to the second example sentence, "Lily says this blue jacket will be a big fashion trend this fall" (line 1050), where the effect might result from a discrepancy between "blue" and an incongruent word. However, the authors do not provide incongruent sentences to substantiate their claims. I recommend that the authors discuss alternative explanations and potentially control for confounding factors before asserting that their results unequivocally reflect semantic integration. My intention is not to dispute the semantic integration interpretation but to stress the necessity for stronger evidence to support this assertion.

      We agree with the reviewer that stimulus control is very critical for this kind of work and apologize for the lack of clarity in the original manuscript.

      (1) We fully agree that word-level factors can be an important confound, which is why we carefully controlled them in the experimental design. As detailed in the Appendix of the original manuscript, each pair of target words was embedded in two sentences, so that congruent and incongruent sentence pairs could be created by swapping the two words. We have now specified this design explicitly for all sentences in the revised manuscript (Page 38). For example, consider the pair “brother/jacket”:

      “Last night, my lazy brother/jacket came to the party one minute before it was over.

      Lily says this blue jacket/brother will be a big fashion trend this fall.”

      In this design, the pair of target words is presented in both congruent and incongruent sentences. Participant A reads “lazy brother” and “blue jacket”, while Participant B reads “lazy jacket” and “blue brother”. This approach ensures that the same target words appear in both congruent and incongruent conditions across participants, serving as an effective control for word-level factors.

      (2) We acknowledge that considering word-level information is crucial when making claims about contextual integration in the current study. However, we do not think there are many cases in the stimulus set where a single feature such as animacy is enough to create the mismatch. Instead, the stimuli were written so that it is not possible to strongly predict any word, or even a specific semantic feature; appreciating the mismatch therefore requires the comprehender to integrate the word into the context (and especially with the immediately preceding word). Nevertheless, this more local modifier/noun plausibility may behave differently from a more global contextual plausibility. This is a limitation of the stimulus set, and we now discuss it in the revised manuscript, as indicated by the tracked changes on Page 16 and copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      Reviewer #2 (Public Review):

      This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating on the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.

      The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered EEG and MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.

      We express our gratitude to the reviewer for recognizing the significance of our research question in the domain of natural reading.

      Major points:

      (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities as to why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. I'm not sure whether the authors really needed to present their stimuli in a full-sentence context to obtain these effects. This should be explicitly discussed and also mentioned in the introduction (or even the abstract).

      We have addressed this limitation of the study explicitly in the revised manuscript. The modifications can be found in the tracked changes on Page 16 and are copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in the visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. Was it necessary to use MEG? Would EEG have produced the same results? In terms of sensitivity, EEG is better than MEG as it is more sensitive to radial and deeper sources. This should be mentioned in the discussion and/or methods section.

      Source estimation was exclusively provided for the tagging response rather than the congruency effect because we posit that this conditional contrast would emanate from the same brain regions exhibiting the tagging responses in general. As depicted in the following figure, source localization for the congruency effect was identified in the left association cortex (Brodmann area 18), the same area as the source localization for the tagging response (the negative cluster observed here is due to the incongruent minus congruent contrast). While we agree with the Reviewer that the RIFT result might indicate a top-down effect on visual attention, it is important to note that, due to the low-pass filter property of synapses, observing a tagging response at a high frequency beyond the visual cortex is challenging.

      Author response image 1.

      We have discussed the necessity of using MEG in the revised manuscript (tracked changes on Page 20); the passage is copied below:

      “While the current study was conducted using MEG, these procedures might also work with EEG. If so, this would make our approach accessible to more laboratories as EEG is less expensive. However, there are currently no studies directly comparing the RIFT response in EEG versus MEG. Therefore, it would be of great interest to investigate if the current findings can be replicated using EEG.”

      (3) The earliest semantic preview effects occurred around 100 ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Even studies that tried to determine the earliest semantic effects arrived at around 200 ms (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382728/, https://psycnet.apa.org/record/2013-17451-002). The present results need to be discussed in a bit more detail in the context of the visual word recognition literature.

      We have incorporated this valuable suggestion into the discussion section to enhance the clarity of our key result regarding the timing of parafoveal semantic integration. The revised manuscript with tracked changes can be found on Page 14, and the relevant passage is provided below:

      “Our results also provide information about the time course of semantic integration …… by as early as within 100 ms after fixating on the pre-target word. The timing of this parafoveal semantic effect appears remarkably early, considering that typical semantic access for a single word occurs no earlier than around 200 ms, as demonstrated in the visual word recognition literature (Carreiras et al., 2014). For instance, in a Go/NoGo paradigm, the earliest distinguishable brain activity related to category-related semantic information of a word occurs at 160 ms (Amsel et al., 2013; Hauk et al., 2012). Therefore, the RIFT results presented here suggest that natural reading involves parallel processing that spans multiple words. The level of (covert) attention allocated to the target word, as indexed by the significant difference in RIFT responses compared to the baseline interval, was observed even three words in advance (see Figure 2C). This initial increase in RIFT coincided with the target entering the perceptual span (McConkie and Rayner, 1975; Rayner, 1975; Underwood and McConkie, 1985), likely aligning with the initial extraction of lower-level perceptual information about the target. The emerging sensitivity of the RIFT signal to target plausibility, detected around 100 ms after the fixation on the pre-target word, suggests that readers at that time had accumulated sufficient semantic information about the target words and integrated that information with the evolving sentence context. Therefore, it is plausible that the initial semantic processing of the target word commenced even before the pre-target fixation and was distributed across multiple words. This parallel processing of multiple words facilitates rapid and fluent reading.”

      References:

      Carreiras M, Armstrong BC, Perea M, Frost R. 2014. The what, when, where, and how of visual word recognition. Trends Cogn Sci 18:90–98. doi:10.1016/j.tics.2013.11.005

      Amsel BD, Urbach TP, Kutas M. 2013. Alive and grasping: Stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. doi:10.1016/J.NEUROIMAGE.2013.03.058

      Hauk O, Coutout C, Holden A, Chen Y. 2012. The time-course of single-word reading: Evidence from fast behavioral and brain responses. Neuroimage 60:1462. doi:10.1016/J.NEUROIMAGE.2012.01.061

      McConkie GW, Rayner K. 1975. The span of the effective stimulus during a fixation in reading. Percept Psychophys 17:578–586. doi:10.3758/BF03203972

      Rayner K. 1975. The perceptual span and peripheral cues in reading. Cogn Psychol 7:65–81.

      Underwood NR, McConkie GW. 1985. Perceptual Span for Letter Distinctions during Reading. Read Res Q 20:153. doi:10.2307/747752

      (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question of whether the observed effect is really "critical" for sentence comprehension. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance). The last sentence of the discussion is currently not justified by the results.

      We acknowledge that a group-level correlation between the RIFT effect and reading speed cannot establish causality and is therefore not ideal for addressing this question. We have added this as one of the limitations of the current study in the revised manuscript on Page 16, as indicated by the tracked changes; the relevant passage is provided below:

      “Two noteworthy limitations exist in the current study. …… Secondly, the correlation analysis between the pre-target RIFT effect and individual reading speed (Figure 5) does not establish a causal relationship between parafoveal semantic integration and reading performance. Given that the comprehension questions in the current study were designed primarily to maintain readers’ attention and the behavioural performance reached a ceiling level, employing more intricate comprehension questions in future studies would be ideal to accurately measure reading comprehension and reveal the impact of semantic parafoveal processing on it.”

      We have also reformulated the last sentence of the Discussion:

      “These results support the idea that words are processed in parallel and suggest that early and deep parafoveal processing may be important for fluent reading.”

      (5) L. 577f.: ICA components were selected by visual inspection. I would strongly recommend including EOG in future recordings when the control of eye movements is critical.

      We thank the reviewer for this valuable suggestion. EOG was not recorded in the current study because of restrictions on MEG data collection imposed by the University of Birmingham during the COVID-19 pandemic. In future studies, we will follow the reviewer's suggestion and include EOG recordings. This will allow optimal ICA-based rejection of eye-movement artifacts, as recommended by Dimigen in his methodological paper:

      Dimigen O. 2020. Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage 207:116117.

      (6) The authors mention "saccade planning" a few times. I would suggest looking at the SWIFT model of eye movement control, which is less mechanistic than the dominant EZ-Reader model (https://psycnet.apa.org/record/2005-13637-003). It may be useful for the framing of the study and interpretation of the results (e.g. second paragraph of discussion).

      In the revised manuscript, we have provided a more comprehensive explanation of eye movements and saccade planning, aligned with the SWIFT model. Please refer to the tracked changes on Page 15; the updated passage is provided below:

      “The results of the present study are aligned with the SWIFT model of eye movement control in natural reading (Engbert et al., 2005), wherein the activation field linked to a given word is hypothesized to be both temporally and spatially distributed. Indeed, we found that the initial increase in covert attention to the target word occurred as early as three words before, as measured by RIFT responses (Figure 2C). These covert processes enable the detection of semantic incongruity (Figure 3B and Figure 3C). However, it may occur at the non-labile stage of saccade programming, preventing its manifestation in fixation measures of the currently fixated pre-target word (Figure 1B). Therefore, the RIFT technique’s capacity to yoke patterns to a specific word offers a unique opportunity to track the activation field of word processing during natural reading.”

      References:

      Engbert R, Nuthmann A, Richter EM, Kliegl R. 2005. SWIFT: A dynamical model of saccade generation during reading. Psychol Rev 112:777–813. doi:10.1037/0033-295X.112.4.777

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript is well-written and presents a structured analysis of the data, it requires further clarification and substantiation regarding the originality of the research questions, the advantages of the proposed methodology, and the interpretation of the results related to semantic integration. Additional references and a more thorough discussion of related research are needed to strengthen the manuscript's contribution to the field.

      We appreciate the reviewer's kind words about this manuscript and the insightful comments and suggestions provided. In the revised manuscript, we have now placed additional emphasis on the importance of investigating semantic integration within the realm of parafoveal processing in natural reading. We have clarified the advantages of employing MEG and RIFT and expanded upon our results in the context of Antúnez et al.'s 2022 paper, as suggested by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      (1) L. 59: The "N400" has been linked to much more than "semantic access". I think it is widely accepted that "access" happens (or at least begins) earlier, and that the N400 reflects high-level integration processes etc.

      Earlier debates about whether the N400 is more closely linked to access or integration have been resolved in favour of an access account, but with a growing appreciation of the blurred boundaries between constructs such as access, priming, and integration, as Reviewer 1 also pointed out in comment #2.

      (2) L. 177: I wasn't sure about the selection of sensors. Were the same sensors used for all participants (whether they had a tagging response or not)?

      We appreciate the reviewer for highlighting the confusion regarding the sensor selection procedure in the study. In response, we have added further clarifications about this procedure in the Method section of the revised manuscript. The relevant changes can be found on Page 25 with tracked changes, and the modified passage is reproduced below:

      "Please note that the tagging response sensors may vary in number across participants (7.9 ± 4.5 sensors per participant, M ± SD). Additionally, they may have a different but overlapping spatial layout, primarily over the visual cortex. For the topography of all tagging response sensors, please refer to Figure 2A."

      (3) Ll. 247ff.: I don't understand the idea of a "spill-over effect". The future cannot spill into the past. Or does this refer to possible artefacts or technical problems?

      In the revised manuscript, we have rephrased this passage with tracked changes on Page 11, and the updated version is provided below:

      “We conducted a similar analysis of the coherence measured when participants fixated the target word and found no significant modulations related to the contextual congruity of that target word. …… Thus, the parafoveal semantic integration effect identified during the pre-target intervals cannot be attributed to signal contamination from fixations on the target word induced by the temporal smoothing of filters.”
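
      For readers less familiar with the measure, the sketch below illustrates one common way a tagging response can be quantified: magnitude-squared coherence, computed across trials, between an MEG sensor and the flicker reference signal at the tagging frequency. It is a minimal NumPy illustration, not the analysis pipeline used in the study; the function name `tagging_coherence`, the 60 Hz tagging frequency, the 1000 Hz sampling rate, the Hann taper, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def tagging_coherence(meg, ref, fs, f_tag):
    """Magnitude-squared coherence between an MEG channel and the flicker
    reference at f_tag, computed across trials (hypothetical helper).

    meg, ref: arrays of shape (n_trials, n_samples), time-locked to the
    event of interest (e.g. fixation onset)."""
    n = meg.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - f_tag)))  # FFT bin closest to f_tag
    taper = np.hanning(n)  # Hann taper to limit spectral leakage
    x = np.fft.rfft(meg * taper, axis=1)[:, k]
    y = np.fft.rfft(ref * taper, axis=1)[:, k]
    sxy = np.sum(x * np.conj(y))  # cross-spectrum summed over trials
    sxx = np.sum(np.abs(x) ** 2)  # sensor power summed over trials
    syy = np.sum(np.abs(y) ** 2)  # reference power summed over trials
    return float(np.abs(sxy) ** 2 / (sxx * syy))


# Simulated example (assumed values): the reference phase differs across
# trials, as it would for a continuously flickering word.
rng = np.random.default_rng(0)
fs, f_tag, n_trials, n = 1000.0, 60.0, 50, 500
t = np.arange(n) / fs
phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
ref = np.sin(2 * np.pi * f_tag * t + phases)
locked = 0.5 * ref + rng.standard_normal((n_trials, n))  # phase-locked response
noise = rng.standard_normal((n_trials, n))  # no tagging response

locked_coh = tagging_coherence(locked, ref, fs, f_tag)
noise_coh = tagging_coherence(noise, ref, fs, f_tag)
```

      Because coherence depends on phase consistency with the reference rather than with trial onset, a response that is phase-locked to the flicker yields values near 1 even though the reference phase varies from trial to trial, whereas unrelated noise stays near zero (on the order of 1/n_trials).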

      (4) I struggled to follow the "internal attention" explanation for the paradoxical RIFT effect (p. 11/12).

      We appreciate the reviewer for pointing out the confusion, and we have rephrased the passage in the revised manuscript with tracked changes on Page 13. The revised version is provided below:

      "Previous work has demonstrated that tagging responses decrease as attention shifts from an external task (e.g., counting visual targets) to an internal task (e.g., counting heartbeats) (Kritzman et al., 2022). Similarly, in a reading scenario, visually perceiving the flickering word constitutes an external task, while the internal task involves the semantic integration of previewed information into the context. If more attentional resources are internally directed when faced with the challenge of integrating a contextually incongruent word, fewer attentional resources would remain for processing the flickering word. This may be the kind of shift reflected in the reduction in RIFT responses."

      References:

      Kritzman L, Eidelman-Rothman M, Keil A, Freche D, Sheppes G, Levit-Binnun N. 2022. Steady-state visual evoked potentials differentiate between internally and externally directed attention. Neuroimage 254:119133.

      (5) L. 572: Why was detrending necessary on top of a 0.5 Hz high-pass filter? Was detrending applied to the continuous raw data, or to epochs? Was it just the linear trend or other polynomial terms?

      We agree with the Reviewer that, given the prior application of a 0.5 Hz high-pass filter to the data, the detrending does not alter the data. Nonetheless, we included this procedure in the manuscript for the sake of completeness. In the revised manuscript, we have provided additional clarification on this point, as indicated by the tracked changes on Page 23. The modified passage is presented below:

      "Subsequently, detrending was applied individually to each channel of the continuous raw data to factor out the linear trend."

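      As a concrete illustration of the procedure described in this reply, the following minimal sketch removes a least-squares linear trend separately from each channel of continuous data (channels × samples). It is written in NumPy for illustration only; the function name `detrend_linear` and the simulated channels are hypothetical, and the toolbox actually used in the study is not specified here.

```python
import numpy as np


def detrend_linear(data):
    """Remove a least-squares linear trend from each channel separately
    (hypothetical helper).

    data: array of shape (n_channels, n_samples) holding continuous data.
    Returns a new array of the same shape."""
    n = data.shape[1]
    t = np.arange(n, dtype=float)
    design = np.column_stack([t, np.ones(n)])  # slope and intercept columns
    coef, *_ = np.linalg.lstsq(design, data.T, rcond=None)  # one fit per channel
    return data - (design @ coef).T  # subtract each channel's fitted line


# Two simulated channels: a pure ramp, and a sine riding on the same ramp.
t = np.arange(1000, dtype=float)
ramp = 0.01 * t + 3.0
sine = np.sin(2 * np.pi * t / 100.0)  # 10 full cycles
data = np.vstack([ramp, sine + ramp])
clean = detrend_linear(data)
```

      The first channel, a pure ramp, is reduced to zero, and the second largely keeps its oscillation while its offset and drift are removed. On data that have already been high-pass filtered at 0.5 Hz, the fitted line is close to flat, which is why this step leaves such data essentially unchanged.
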
      (6) Source analysis, p. 25f.: How was the beamformer regularized?

      This information was already included in the original manuscript on Page 26. The original text is provided below for reference:

      “No regularisation was performed to the CSD matrices (lambda = 0).”
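
      To make the meaning of lambda = 0 concrete, the sketch below computes generic unit-gain beamformer weights for a single source, w = (C + λI)^-1 l / (l^H (C + λI)^-1 l), where C is the cross-spectral density matrix and l is the lead field; setting λ = 0 uses the unregularised matrix, as in the quoted Methods. This is an illustrative NumPy sketch, not the authors' source-analysis code: the function name `beamformer_weights`, the fixed source orientation, adding λ directly to the diagonal, and the simulated matrices are all assumptions.

```python
import numpy as np


def beamformer_weights(csd, leadfield, lam=0.0):
    """Unit-gain beamformer weights for one source with a fixed orientation
    (hypothetical helper).

    csd: (n_sensors, n_sensors) Hermitian cross-spectral density matrix.
    leadfield: (n_sensors,) forward-model vector of the source.
    lam: value added to the diagonal of csd (lam = 0.0 means no regularisation)."""
    c = csd + lam * np.eye(csd.shape[0])
    ci_l = np.linalg.solve(c, leadfield)  # C^-1 l without forming the inverse
    return ci_l / (np.conj(leadfield) @ ci_l)  # normalise so that w^H l = 1


# Simulated, full-rank complex CSD and a random lead field (assumed values).
rng = np.random.default_rng(1)
n_sensors = 6
b = rng.standard_normal((n_sensors, 20)) + 1j * rng.standard_normal((n_sensors, 20))
csd = b @ b.conj().T / 20  # Hermitian, positive definite
leadfield = rng.standard_normal(n_sensors)

w0 = beamformer_weights(csd, leadfield)  # lam = 0: no regularisation
w_reg = beamformer_weights(csd, leadfield, lam=0.1 * np.trace(csd).real / n_sensors)
```

      Both filters pass the modelled source with unit gain; the regularisation term only changes how the filter trades off suppression of other activity against noise. With λ = 0 the matrix must be invertible, so this choice presupposes a well-conditioned, full-rank CSD estimate.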

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Why does stimulation at 0.15 Hz show a third harmonic signal (Figure 5A) but 0.25 Hz does not show a second harmonic signal?

      Second and third harmonic signals were sometimes observed with 0.15 Hz, 0.25 Hz, and other stimulation frequencies. The second harmonic is easier to understand, as vasomotion may react to both directions of the oscillating stimulus. The reason for the emergence of the third harmonic is unknown. These harmonic signals were not always observed, and their magnitude was variable. Because the frequency-locked signal was robust, we decided to describe only this signal in this manuscript. These observations are mentioned in the revised manuscript (Results, page 9, paragraph 2).
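
      The reasoning for the second harmonic can be illustrated numerically. If the vessel responds to both directions of the oscillating stimulus, its response behaves roughly like a rectified copy of the stimulus, and the spectrum of a rectified sinusoid peaks at twice the stimulation frequency. The sketch below assumes an idealised full-wave rectification (`np.abs`), a 10 Hz sampling rate, and 200 s of data; it is not a model of vasomotion. A full-wave rectified sinusoid contains only even harmonics, so this mechanism does not account for the third harmonic.

```python
import numpy as np

fs = 10.0  # sampling rate in Hz (assumed for illustration)
n = 2000  # 200 s of data, giving 0.005 Hz frequency resolution
f_stim = 0.15  # stimulation frequency in Hz
t = np.arange(n) / fs
stim = np.sin(2 * np.pi * f_stim * t)


def peak_frequency(x):
    """Frequency of the largest spectral peak, ignoring the DC bin."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    return freqs[np.argmax(spec[1:]) + 1]


follows_each_cycle = stim  # responds once per stimulus cycle
both_directions = np.abs(stim)  # responds to both half-cycles (idealised)

f_linear = peak_frequency(follows_each_cycle)  # peak at the stimulation frequency
f_rectified = peak_frequency(both_directions)  # peak at twice that frequency
```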

      References for the windows are missing. Closed craniotomy: (Morii, Ngai, and Winn 1986). Thinned skull: (Drew et al. 2010).

      These references were incorporated into the revised manuscript.

      An explanation of, or at least a discussion on, why a flavoprotein or other intrinsic signal from the parenchyma might follow vasomotion with high fidelity would be most helpful.

      We devote a large part of the Results to describing how any fluorescence signal from the brain parenchyma follows vasomotion, because the blood vessels largely lack fluorescence within the filter band that we observe. We refer to this as “shadow imaging”. What was rather puzzling was that the flavoprotein and other intrinsic signals were phase-shifted in time. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. This is described in the manuscript as follows:

      (Results, page 13, paragraph 2)

      “Production and degradation of flavin and other metabolites may be induced by the fluctuation in the blood vessel diameter with a fixed delay time. The phase shift in the autofluorescence could be due to the additive effect of “shadow” imaging of the vessel and to the concentration fluctuation of the autofluorescent metabolite”

      Glucose and oxygen are likely to be delivered more abundantly during the vasodilation phase of vasomotion than during the vasoconstriction phase. These molecules fuel cellular metabolism, so endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease after the delay required for the chemical reactions to occur. Therefore, the concentration fluctuations of these metabolites could lag behind the changes in blood flow. This discussion has been added to the revised manuscript (Discussion, page 19, paragraph 2).
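
      The link between a fixed delay and a phase shift can be made explicit: a delay of τ seconds corresponds to a phase lag of 360° × f × τ at frequency f. The minimal sketch below estimates this lag from the FFT of a simulated vessel-diameter signal and a delayed copy standing in for a metabolite signal. The 1 s delay, the 10 Hz sampling rate, and the pure sinusoids are hypothetical values chosen for illustration, not measurements from the study.

```python
import numpy as np

fs = 10.0  # sampling rate in Hz (assumed)
n = 2000  # 200 s, an integer number of vasomotion cycles
f = 0.15  # vasomotion frequency in Hz
tau = 1.0  # hypothetical fixed metabolic delay in seconds
t = np.arange(n) / fs

vessel = np.sin(2 * np.pi * f * t)  # vessel-diameter fluctuation
metabolite = np.sin(2 * np.pi * f * (t - tau))  # same waveform, delayed by tau

k = int(round(f * n / fs))  # FFT bin of the vasomotion frequency
v = np.fft.rfft(vessel)[k]
m = np.fft.rfft(metabolite)[k]
lag_deg = -np.degrees(np.angle(m * np.conj(v)))  # positive = metabolite lags
predicted_deg = 360.0 * f * tau  # phase lag implied by a fixed delay
```

      Under a fixed-delay account, the same delay produces a larger phase shift at higher stimulation frequencies; in the measured autofluorescence, this delayed component would add to the anti-phase shadow-imaging component described above.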

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections to the text and figures:

      (1) Figures 1 and 2- The single line slice basal and dilated traces are larger in Figure 2 (intact skull) than in Figure 1 (thinned skull)- have these been mixed up, as the authors state in the text that larger dilations are detected in the thinned skull preparation?

      The example vessel shown for the thinned skull (Figure 1) happened to be larger than that shown for the intact skull (Figure 2). We did not state that larger dilations are observed in the thinned-skull preparation. What we described was that the vessel profiles were shallower in the intact-skull preparation, because the intact skull blurs the fluorescence image.

      (2) Figure 3- I think the lower panel of the amplitude spectra from 3 individual animals included in D would benefit from being in its own panel within this Figure (i.e. E). The peak ratio is also used in this figure, but the equation to calculate this is not displayed until Figure 4.

      We thank the reviewer for this recommendation to make the figure more comprehensible. We have divided panel D into panels D and E and relabeled the subsequent panels accordingly. The manuscript text was also updated.

      As the reviewer notes, the 0.25 Hz peak ratio is used in Figure 3E (original figure). However, the equation used to calculate this value is given at the appropriate location in the main text of the manuscript (Results, page 10, paragraph 2) as well as in the figure legend.

      (3) Figure 5- In the visual stimulation traces displayed in C you have included a 10-degree scale bar, which looks similar in amplitude to the trace but the text states these are 17-degree amplitude traces.

      We thank the reviewer for noticing this labeling error in the figure. We have corrected it in the revised figure.

      (4) Figure 6- For the Texas red fluorescence traces and image scales displayed in F, you have shown the responding traces on the right and non-responding on the left, but the figure legend states the amplitude is strong on the left and weak on the right.

      We thank the reviewer for noticing the error in the figure legend text. We have corrected the error in the revised manuscript.

      (5) Figure 6- It would be helpful for the reader if the r value was displayed on the graph in G.

      We thank the reviewer for the suggestion. We have indicated the r value in Figure 6G as the reviewer recommended.

      Reviewer #3 (Recommendations For The Authors):

      Major

      It is unclear to me if the authors are studying vasomotion per se. Vasomotion is an intrinsic, natural rhythm of blood vessel diameter oscillation that is entrained by endogenous rhythmic neural activity. Importantly, if you take neural activity away, the blood vessel (with flow and pressure) should still be capable of oscillating due to an intrinsic mechanism within the vessel wall. In contrast, if one increases neural activity by way of sensory stimulation and blood flow increases, this is the basis of functional hyperemia. If one stimulates the brain over and over again at a particular frequency, it is expected that blood flow will increase whenever neural activity increases to the stimulus, up to a particular frequency until the blood vessel cannot physically track the stimulus fast enough. Functional hyperemia does not depend on an intrinsic oscillator mechanism. It occurs when the brain becomes active above endogenous resting activity due to sensory or motor activity.

      We thank the reviewer for stressing the importance of the distinction between “vasomotion” and functional “hyperemia”.

      We recognize that the terminology used in our paper was not explicitly defined. Traditionally, “vasomotion” refers to the spontaneous dilation and constriction of blood vessels at low frequencies, around 0.1 Hz, in the absence of any apparent external stimulus, whereas sensory-induced changes in blood flow are usually called “hyperemia”. In our paper, however, we used the term vasomotion literally, to denote both forms of “vascular” “motion”. Accordingly, traditional vasomotion was called “spontaneous vasomotion”, and the hyperemia, comprising both vasoconstriction and vasodilation, induced by slowly oscillating visual stimuli was called “visually induced vasomotion”. This distinction is now explicitly introduced in the revised manuscript (Introduction, page 3, paragraphs 2-3; page 4, paragraphs 1-2).

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not persist for long at any specific frequency. With visual stimuli oscillating slowly at temporal frequencies close to that of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion”, was observed. Importantly, this visually induced vasomotion is not observed in novice animals. Therefore, visually induced vasomotion is not a simple sensory response of the vasculature to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion spreads throughout the whole brain, where the plasticity underlying vasomotion entrainment occurs, or to what extent visually induced vasomotion relies on the mechanisms of intrinsic spontaneous vasomotion. Future directions for understanding the mechanisms of visually induced vasomotion and entrainment are discussed in more detail in the revised manuscript (Discussions, page 19, paragraph 1).

      To me, one would need to silence the naturally occurring vasomotion to study it. As soon as one activates the brain with an external stimulus, functional hyperemia is being studied. One idea that would be interesting to look at is whether a single or perhaps a double stimulus, in an untrained vs trained mouse, shows vasodilation that occurs across the cortex and in the cerebellum. In other words, is there something special about repeating the signal over and over again that results in brain-wide synchronization, or does a single or double oscillation of the same frequency (0.25Hz) also transiently synchronize the brain? My guess is that a short stimulus would give you the same thing (especially in a trained mouse) and that there is nothing special about oscillating the signal over and over again (except for the learning component).

      We thank the reviewer for suggesting new experiments to determine whether visually induced vasomotion shares the mechanisms that generate spontaneous vasomotion.

      We would like to emphasize again that visually induced vasomotion is not observed in Novice animals. Therefore, it is not a simple sensory response of the vasculature to the visual stimuli. Entrainment through repeated presentation of the visual stimuli is required for this global synchronization to occur.

      We would also like to emphasize that, even in Expert animals, visually induced vasomotion that is frequency-locked to the presented stimulus does not always appear immediately. As shown in the lower panel of Figure 3D (Figure 3E in the revised figure), the vasomotion did not always frequency-lock at stimulus onset, nor was it always stable throughout the 15 min of visual stimulation. These characteristics are emphasized in the revised manuscript (Results, page 10, paragraph 1).
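      One way to visualize the delayed and intermittent frequency locking described above is a sliding-window spectral index. The sketch below is hypothetical and is not the peak ratio defined in the manuscript: for each window it computes the fraction of Hann-tapered, non-DC power lying within a narrow band around the stimulus frequency, so values near 1 indicate a trace locked to the stimulus and values near 0 indicate no locking. The window length (40 s), step (10 s), band half-width (0.03 Hz), the helper name `locked_power_fraction`, and the synthetic trace are assumptions chosen for illustration.

```python
import numpy as np


def locked_power_fraction(x, fs, f_stim, win_s=40.0, step_s=10.0, half_band=0.03):
    """Hypothetical frequency-locking index: for each sliding window, the
    fraction of non-DC spectral power lying within +/- half_band Hz of f_stim."""
    win, step = int(win_s * fs), int(step_s * fs)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    in_band = np.abs(freqs - f_stim) <= half_band
    taper = np.hanning(win)  # reduces spectral leakage between bins
    fractions = []
    for start in range(0, len(x) - win + 1, step):
        seg = x[start:start + win]
        power = np.abs(np.fft.rfft((seg - seg.mean()) * taper)) ** 2
        power[0] = 0.0  # ignore any residual DC
        fractions.append(power[in_band].sum() / power.sum())
    return np.array(fractions)


rng = np.random.default_rng(1)
fs = 10.0
t = np.arange(3000) / fs  # 300 s per segment
# Synthetic trace: 300 s without locking, then 300 s locked to 0.25 Hz.
x = np.concatenate([rng.normal(size=3000),
                    np.sin(2 * np.pi * 0.25 * t) + 0.2 * rng.normal(size=3000)])
index = locked_power_fraction(x, fs, f_stim=0.25)
print(index[0], index[-1])  # low in the first window, high in the last
```

      Because a 40 s window spans only 10 cycles at 0.25 Hz, the index trades temporal resolution against frequency resolution; a longer window would separate 0.25 Hz more cleanly from spontaneous activity near 0.1 Hz but would blur the moment at which locking begins.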

      Therefore, we expect that a single or double cycle of the visual stimulus would not always be sufficient to transiently frequency-lock the visually induced vasomotion.

      An alternative idea is to test frequencies lower than vasomotion. Vasomotion typically oscillates around a wide range of very low frequencies averaging around 0.1Hz, yet here the authors entrain blood vessel oscillations towards the top end of vasomotion, at 0.25Hz. What would happen if the authors tried synchronizing brain activity with 0.025Hz? Would the natural vasomotion frequency still be there, or would it be gone, dominated by the 0.025Hz entrainment?

      We expect that visually induced vasomotion would not be induced by 0.025 Hz visual stimuli, because this frequency is too slow to induce smooth pursuit of the stimuli by eye movement. We show that, even when smooth pursuit occurs, visually induced vasomotion may or may not occur (Figure 6F); however, visually induced vasomotion largely does not occur without eye movement. Therefore, the experiment proposed by the reviewer is likely not feasible.

      Finally, perhaps the authors can see if there is a long-lasting change in natural vasomotion occurring after the animal has been trained to 0.25Hz. For example, is there greater power in the endogenous fluctuation at either 0.25Hz (or perhaps 0.1Hz) with no visual stimulation given but after the animal has been trained? These ideas would be interesting to test and could help clarify whether this is plasticity in functional hyperemia or plasticity in vasomotion.

      It should also be noted that the frequency-locked vasomotion dissipates quickly once the visual stimulation is halted (Figure 3D, upper panel, middle). Nevertheless, we agree with the reviewer that it would be interesting to examine whether spontaneous vasomotion is less fragmented in Trained or Expert mice than in Novice mice, to determine whether the entrainment effect carries over to the properties of spontaneous vasomotion.

      This issue I have raised is not a fundamental flaw in the paper, it pertains more to the wording, phrasing, and pitch of the paper i.e. is this really entrained and plastic vasomotion? I am skeptical. Nevertheless, I think the authors should try some of these suggestions to better characterize this effect.

      We agree that the phrasing used in the original manuscript was rather confusing, as “vasomotion” normally refers to spontaneous vascular movement. However, functional “hyperemia” may not adequately describe the phenomenon that we observe either. What we observe is slowly oscillating vasodilation and vasoconstriction induced by visual stimuli with a temporal frequency similar to that of spontaneously occurring “vasomotion”. This phenomenon is not a direct hyperemic response to the visual stimuli, as it requires entrainment and spreads globally throughout the whole brain. We have revised our manuscript to define the terminology that we use.

      An important question is if neural activity is entraining the CBF responses. The authors should do one experiment in a pan-neural GCaMP line to test if neural activity in the visual cortex (and other areas captured in the widefield microscope) shows a progressive and gradual synchronization (or not) to the vasomotion responses with training. It is possible to do this through a thinned skull window. It is important to know if/how synchronized population neural activity scales with training. Perhaps they will not correlate and there is something more subtle going on.

      In our paper, we mainly studied visually induced vasomotion (or visual stimulus-triggered vasomotion). Visual stimulation must therefore first activate neurons, and the initial drive for vasomotion is likely triggered through neurovascular coupling. However, visually induced vasomotion is not observed in Novice animals, so it is not a simple sensory response of the vasculature to neuronal activity in the primary visual cortex.

      An important point is that the neuronal visual response in the primary visual cortex could potentially decrease with repeated presentation of the visual stimulation, because the adaptive eye movement should reduce retinal slip. With repeated training sessions, a more static image is likely to be projected onto the retina. Neurovascular coupling could be enhanced through increased responsiveness of the vasculature, and vascular-to-vascular coupling could also be potentiated. This argument is now incorporated in the revised manuscript (Discussions, page 19, paragraph 1).

      We agree with the reviewer that simultaneous neuronal calcium imaging would be ideal for identifying the extent of the neuronal contribution to vasomotion triggering, whole-brain synchronization, and vasomotion entrainment. However, because the signals of fluorescent Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect of vasomotion, more sophisticated imaging techniques would be required. We recognize this “shadow” effect, and we are currently developing methods to remove the “shadow” effect and the effect of intracellular pH fluctuations from the fluorescence traces.

      The authors nicely show that plasticity in vasomotion coincides with the mouse learning the HOKR task and that as eye movement tracks the stimulus, CBF gets entrained. However, there could also be a stress effect going on in the early trials, and as the mouse gets used to the procedure and stress comes down, the vasomotion entrainment can be seen. It could be the case that the vasomotion process is there on the first trial, but masked by stress-induced effects on neural and/or vascular activity. I did not see anything in the methods about how the mouse was habituated to head restraint. Was the first visual stim trial the first time the mouse was head restrained? If so, there could be a strong stress effect. The authors should address this either by clarifying that habituation to head restraint was done, or by doing a control experiment where each animal receives at least 1 week of progressive and gradual head restraint before doing the same HOKR experiment using multiple trials.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the initial visually induced vasomotion response between habituated and non-habituated mice. In addition, whether an experimentally induced increase in stress interferes with vasomotion could also be studied. In the TexasRed experiments, we observed that the stress of tail-vein injection appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, TexasRed was injected before session 1. Vasomotion entrainment likely progressed during training sessions 2 and 3. Before session 4, TexasRed was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that the stress induced by tail-vein injection did not prevent the generation of visually induced vasomotion once entrainment had occurred. This argument is included in the revised manuscript (Discussions, page 20, paragraph 2).

      Minor

      The first sentence of the introduction requires citations. It is also a somewhat irrelevant comparison to make.

      The necessary citations have been added to the revised manuscript, as the reviewer suggested. We think that describing how energy is distributed in the brain could provide one of the most important breakthroughs in understanding how the brain achieves efficient information processing. Therefore, we would like to keep this introduction.

      The third and fourth sentence of the introduction equates vasodilation/vasoconstriction with vasomotion and it is not this simple. Vasomotion is a specific physiological process involving rhythmic changes to artery diameter. Also, the frequency of these slow oscillations needs to be stated. The authors only say they are slower than 10Hz.

      A definition of spontaneous vasomotion, including its typical temporal frequency (around 0.1 Hz), is now given in the revised manuscript, as the reviewer suggested.

      More than half of the introduction is describing the paper itself, rather than setting the stage for the findings. The authors need a more thorough account of what is known and what is not known in this area. Some of this information is in the discussion, which should be moved up to the intro.

      We have revised the introduction to include the definition of spontaneous vasomotion and visually induced vasomotion or functional hyperemia, as the reviewer suggested.

      In the first paragraph of the results section, the authors should state in what way the mice are awake. Are they freely mobile? Are they head-restrained? Are they resting or moving or doing both at different times? This is clarified later but it should come up front as someone reads through the paper.

      As the reviewer suggested, we clarified in the first paragraph of the Results section that the experiments were done in awake, head-restrained mice.

      The authors say "As shown later, blood vessels on the surface...". There is no need to say "as shown later".

      This phrase has been deleted, as the reviewer suggested.

      The use of "full width at 10% maximum" of the Texas red intensity for the diameter measure is a little odd, as it may actually overestimate the diameter, but I see what the authors were trying to do. A full-width half max is standard here and that is likely more appropriate. Also, the line profiles of intensity are not raw data. The authors say the trace is strongly filtered/smoothed. If so, this creates a somewhat artificial platform to make the diameter measurement. The authors should show raw data from a single experiment and make the measurement from that. The raw line profile should look almost square, where a full-width half-max would work well.

      Contrary to the reviewer's expectation, the raw line profile was not almost square. Even if there were almost no blur in the XY dimension of the optical imaging system, a square line profile would not be expected, because the thickness of the vessel in the Z dimension increases towards its center, and because this is not a confocal or two-photon microscope image, so no ideal optical section is created. Therefore, the full width at half maximum would underestimate the actual vessel diameter. It may be possible to derive an ideal cutoff value if the 3D point spread function of the imaging system were known. The 10% cutoff is arbitrary, but we consider 10% to be the minimum intensity that can be distinguished from background intensity fluctuations. We did not attempt to derive the “true” diameter of the vessel; the full width at 10% maximum is simply an index of the actual diameter. Because most of the manuscript deals only with changes in vessel diameter relative to the basal diameter, we considered a careful derivation of the absolute diameter unnecessary. This argument is detailed in the Materials and Methods section of the revised manuscript (page 31, paragraph 2).
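      The argument about the cutoff can be made concrete with an idealized profile. In this hypothetical sketch (not the manuscript's measurement code), the vessel is a uniformly labeled cylinder imaged without any optical blur, so the detected intensity at lateral position x is proportional to the chord length 2*sqrt(R^2 - x^2) through the vessel along Z; the helper `full_width_at_fraction`, the radius, and the sampling grid are assumptions. Even in this blur-free case the profile is rounded rather than square, so the width at 50% of maximum falls well short of the true diameter 2R, whereas the width at 10% of maximum comes much closer.

```python
import numpy as np


def full_width_at_fraction(x, profile, fraction):
    """Width of a single-peaked line profile at `fraction` of its
    background-subtracted maximum, linearly interpolating both crossings."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(profile, dtype=float)
    y = y - y.min()  # crude background subtraction (assumption)
    level = fraction * y.max()
    above = np.nonzero(y >= level)[0]  # assumes one contiguous peak
    i, j = above[0], above[-1]
    if i == 0 or j == len(y) - 1:
        raise ValueError("profile must drop below the cutoff on both sides")
    left = x[i - 1] + (level - y[i - 1]) * (x[i] - x[i - 1]) / (y[i] - y[i - 1])
    right = x[j] + (y[j] - level) * (x[j + 1] - x[j]) / (y[j] - y[j + 1])
    return right - left


# Idealized unblurred vessel of radius R: the dye path length along Z is the
# chord of a circle, so the lateral profile is rounded rather than square.
R = 10.0                                   # assumed radius (arbitrary units)
x = np.linspace(-20.0, 20.0, 4001)
profile = 2.0 * np.sqrt(np.clip(R**2 - x**2, 0.0, None))
print(full_width_at_fraction(x, profile, 0.5))  # ~17.32 (sqrt(3) * R)
print(full_width_at_fraction(x, profile, 0.1))  # ~19.90 (2R * sqrt(0.99))
```

      Under these assumptions the FWHM recovers only about 87% of the diameter and the 10% cutoff about 99.5%. With real optics the widths would also depend on the point spread function, which is why the full width at 10% maximum is treated only as an index for relative diameter changes.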

      The raw line profile before filtering is shown overlaid in Figure 1C, as the reviewer suggested.

      In Figures 1 and 2, state/label what brain region this is.

      The blood vessels shown in Figures 1 and 2 were located on the cortex between bregma and lambda. This is now stated in the revised manuscript, as the reviewer suggested.

      Can the authors also show what a vein or venule looks like using their quantification method in Figures 1 and 2? This would be a helpful comparison to a static vein.

      The methods shown in Figures 1 and 2 would not allow us to distinguish between vein and venule in our study. Methods that allow quantification of the relative blood vessel diameter fluctuation due to spontaneous or visually induced vasomotion activities are shown in Figures 1 and 2. Later in the manuscript, the whole intensity fluctuation of TexasRed or autofluorescence in the brain parenchyma is studied, and in this case, no distinction between vein and venules could be made.

      Statements such as this are not necessary: "Later in the manuscript, we will be dealing with vasomotion dynamics observed with the optical fiber photometry methods, in which the blood vessel type under the detection of the fiber could not be identified". Simply talk about this data when you get to it.

      We have deleted this statement in this part of the manuscript, as the reviewer suggested.

      Same as this, please consider deleting: "Spontaneous vasomotion dynamic differences between different classes of blood vessels would be of interest to study using a more sophisticated in vivo two-photon microscope which we do not own." Just describe the data you have from the methods you have. There is no need to lament.

      We deleted this sentence, as the reviewer suggested.

      Figure 3 D the light blue boxes showing the time period of visual stimulation physically overlay with the frequency-time spectrograms. They should not overlay with this graph because it makes them more light blue, distorting the figure which also uses light blue in the heat map.

      Figure 3D was modified, as the reviewer suggested.

      The authors say: "The reason why the vasomotion detected in our system through the intact skull in awake in vivo mice was less periodic was unknown." Yes, but you are imaging an awake mouse. Many spontaneous behaviours such as whisking, grooming, twitching, and struggling will manifest as increased artery diameter. These will be functional hyperemia occurring events on top of rhythmic vasomotion. This can be briefly discussed.

      As the reviewer comments, the vasomotion detected in awake mice was likely to be less periodic because the spontaneous animal behavior induces functional hyperemia and interrupts spontaneous vasomotion. This interpretation was included in the revised manuscript (Results, page 8, paragraph 1).

      The authors say "extremely tuned" on page 8. They should not use words like "extremely". Perhaps say "more strongly tuned" or equivalent.

      We have changed “extremely” to “more strongly”, as the reviewer suggested.

      The authors say "First, the Texas Red fluorescence images were Gaussian filtered in the spatial XY dimension to take out the random noise presumably created within the imaging system." It is inadvisable to alter the raw data in this way unless there is a sound reason to do so. If there is random noise this should not affect the Fast Fourier Transform analysis. If there is regular noise caused by instrumentation artefact, which is picked up by the analysis then perhaps this could be filtered out. A static Texas red sample in a vial can be used to determine if there is artefactual noise.

      We used the Gaussian filter mainly for better presentation of the imaged data. The TexasRed fluorescence was low in intensity, and the acquired images were Gaussian filtered in the spatial XY dimension to reduce pixel-level noise at the expense of reduced spatial resolution. Because this filter acts only within each frame, it should not affect the temporal frequency of the observed vasomotion. This is now indicated more clearly in the revised manuscript (Results, page 10, paragraph 2).
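      The argument that a purely spatial (XY) Gaussian filter cannot alter the temporal frequency of vasomotion can be checked on synthetic data. In this hypothetical sketch (not the manuscript's processing pipeline), each frame of a (time, y, x) stack is blurred with a separable Gaussian kernel while the time axis is left untouched; the 0.25 Hz modulation, noise level, 10 Hz frame rate, kernel width, and helper names are all assumptions. After filtering, each pixel's time course is a weighted average of neighboring pixels' time courses, so it contains only frequencies already present in them, while uncorrelated pixel noise is reduced.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def spatial_gaussian(stack, sigma):
    """Blur each frame of a (time, y, x) stack along y and x only; samples
    from different time points are never mixed."""
    k = gaussian_kernel(sigma)
    blur = lambda v: np.convolve(v, k, mode="same")
    out = np.apply_along_axis(blur, 1, stack)
    return np.apply_along_axis(blur, 2, out)


def peak_frequency(trace, fs):
    """Frequency (Hz) of the largest non-DC peak in the power spectrum."""
    power = np.abs(np.fft.rfft(trace - trace.mean())) ** 2
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    return freqs[np.argmax(power)]


rng = np.random.default_rng(0)
fs, f0, n_t, n_px = 10.0, 0.25, 200, 24          # assumed frame rate, frequency, size
t = np.arange(n_t) / fs
signal = 1.0 + 0.1 * np.sin(2 * np.pi * f0 * t)  # weak, spatially uniform modulation
stack = signal[:, None, None] + 0.5 * rng.normal(size=(n_t, n_px, n_px))
smoothed = spatial_gaussian(stack, sigma=2.0)
c = n_px // 2                                    # pixel far enough from the edges
print(peak_frequency(smoothed[:, c, c], fs))     # expected 0.25
```

      SciPy's `scipy.ndimage.gaussian_filter` with a per-axis `sigma=(0, s, s)` performs the same frame-wise blur; the NumPy-only version is used here only to keep the sketch dependency-free.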

      There are endogenous fluorescent molecules in cell metabolism that change dynamically to neural activity: NADH, NADPH, and FAD. These are almost certainly a fraction of the auto-fluorescent signal the authors are measuring and it would be expected to see small fluctuations in these metabolites with neural activity. Perhaps this can be discussed, and the authors can likely argue that metabolic signals are much smaller than the change caused by vasodilation.

      We found that the autofluorescence signal was phase-shifted in time relative to the vasomotion visualized with TexasRed. This suggests that the autofluorescence signal has an anti-phase “shadow imaging” component and another component that is phase-shifted in time. Glucose and oxygen are likely to be delivered more abundantly during the vasodilation phase of vasomotion than during the vasoconstriction phase. These molecules drive cell metabolism, and endogenous fluorescent molecules such as NADH, NADPH, and FAD may therefore increase or decrease with a certain delay, reflecting the time required for the chemical reactions to occur. Consequently, the concentration fluctuations of these metabolites could lag behind the changes in blood flow. These metabolites are also expected to fluctuate with the neuronal activity that triggers visually induced vasomotion or functional hyperemia. This discussion has been added to the revised manuscript (Discussions, page 19, paragraph 2).

      The authors say "however, we found that, if Texas Red had to be injected before every training session, the mouse did not learn very well." This is interesting. Why do the authors suppose this was the case? Stress from the injection? Or perhaps some deleterious effect on blood vessel function caused by the dye itself? Either way, I think this honest statement should remain. Others need to know about it.

      We think that the stress of the injection interferes with HOKR learning. However, as shown, TexasRed injection after the mouse had learned did not interfere with the eye movement or with visually induced vasomotion. We do not know whether the injection stress directly interferes with blood vessel function and affects the plastic entrainment of vasomotion. These arguments are now described in the revised manuscript (Discussions, page 20, paragraph 2). As the reviewer suggested, the statement has been retained as is.

      YCnano50 is a calcium sensor and not really appropriate for the use employed by the authors. They are exciting YFP at 505nm but unless the authors are using a laser line, there is some bandwidth of excitation light that is likely exciting the CFP too which still absorbs light up to ~490nm. Here, calcium signalling may affect the YFP signal. This can be discussed.

      A multiband-pass filter (Chroma 69008x; relevant band centered at 503 nm with a 19.5 nm FWHM) was used for direct excitation of YFP, and negligible light is passed below 490 nm. CFP excitation above 490 nm is usually not reported in the literature and is assumed to be negligible. We therefore assume that, in our optical system, fluorescence from direct YFP excitation dominates over the minor contribution from CFP excitation. We explicitly describe this in the revised manuscript (Materials and Methods, page 28, paragraph 2).
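      As a rough check of the statement above, assume the 19.5 nm FWHM band is centered symmetrically on 503 nm (an idealization; real interference-filter edges have a finite slope). The half-transmission edges then fall at

```latex
\lambda_{\pm} = \lambda_c \pm \tfrac{1}{2}\,\Delta\lambda_{\mathrm{FWHM}}
             = 503\ \mathrm{nm} \pm \tfrac{19.5}{2}\ \mathrm{nm}
\quad\Rightarrow\quad
\lambda_{-} = 493.25\ \mathrm{nm}, \qquad \lambda_{+} = 512.75\ \mathrm{nm}
```

      so the lower half-transmission edge lies about 3 nm above the ~490 nm limit of CFP absorption mentioned by the reviewer. This idealized calculation does not quantify the residual transmission below that edge, which would have to be read from the filter's measured transmission spectrum.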

      The discussion is interesting but does not actually discuss much of the data or measurements in the paper. Most of the discussion reads more like a topical review, rather than a critical analysis of the effects/measurements and why the authors' interpretations are likely correct. This can be improved.

      As the reviewer suggests, we have improved the discussion by starting with the summary of the results (Discussion, page 19, paragraph 1). We also included the possibility of stress affecting visually induced vasomotion (Discussion, page 20, paragraph 2).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. We have records of the fast-z correction applied by ScanImage on the microscope during acquisition, so we have supplied the online fast-z motion correction .csv files for two example sessions as supplementary files on our GitHub page:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      These files correspond to Figure S3b (2367_200214_E210_1) and to Figures 5 and 6 (3056_200924_E235_1). These are now also referenced in the main text. See lines ~595, pg 18 and lines ~762, pg 24.

      We have also made minor revisions to the main text of the manuscript with clear descriptions of methods that we have found important for the minimization of movement artifacts, such as fully tightening all mounting devices, implanting the cranial window with proper, evenly applied pressure across its entire extent, and mounting the mouse so that it is not too close or far from the surface of the running wheel. See Line ~309, pg 10.

      -Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We have opted to not change the title of the paper, because we feel that adding the qualifier, “in two preparations,” would add unnecessary complexity. In addition, while the dorsal mount preparation allows for imaging of bilateral dorsal cortex, the side mount preparation does indeed allow for imaging of both dorsal and lateral cortex across the right hemisphere (a bit of contralateral dorsal cortex is also imageable), and the design can be easily “flipped” across a mirror-plane to allow for imaging of left dorsal and lateral cortex. Taken together, we do show preparations that allow for pan-cortical 2-photon imaging.

      We do agree that imprecise reference to the two preparations can sometimes lead to confusion. Therefore, we made several small revisions to the manuscript, including at ~line 545, to make it clearer that we used two imaging preparations to generate our combined 2-photon mesoscope dataset, and that each of those two preparations had both benefits and limitations.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: The preparation by Esmaeili et al. 2021 has some similarities to, but also differences from, our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries for our side mount preparation, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We have compared these preparations more thoroughly in the revised manuscript. (See lines ~278.)

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations, with medians near 4500; see Fig. S2e), we would like to point out that factors such as the use of dual-plane or multiple imaging planes, different mouse lines, different durations of recording sessions (see our Fig S2c), different imaging speeds and resolutions (see our Fig S2d), different Suite2p run-time parameters, and the inclusion of areas with blood vessels and different neuron cell densities may all affect the total number of neurons analyzed per session. We now mention these various factors and have made clear that we were not, for the purposes of this paper, trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      We now refer briefly to these issues in the main text (see ~line 93, pg 3).

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript at ~line 235, pg. 7.

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it is possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the Thorlabs mesoscope objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use ultrasound gel instead (which we found to be, to some degree, optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult under these conditions because the camera would need the same optical access angle as the 2-photon objective, or would need to be moved downward toward the air table and rotated up at an angle of 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-SOiD analysis is based on a single mouse, and the validation data are derived from the training data set.

      Authors’ Response: In this methods paper, we have chosen to supply proof of principle examples, without a complete analysis of animal-to-animal variance.

      The B-SOiD analysis that we show in Figure 6 is based on a model trained on 80% of the data from four sessions taken from the same mouse, and then tested on all of a single session from that mouse. Initial attempts to train across sessions from different mice were unsuccessful, probably due to differences in behavioral repertoires across mice. However, we have performed extensive tests with B-SOiD and are confident that these sorts of results are reproducible across mice, although we are not prepared to publish these results at this time.

      We now clarify these points in the main text at ~line 865, pg 27.

      An additional comparison of the results of B-SOiD trained on different numbers of sessions to that of keypoint-MOSEQ (Weinreb et al, 2023, bioRxiv) trained on ~20 sessions can now be found as supplementary material on our GitHub site:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/Figure_SZZ_BSOID_MOSEQ_align.pdf

      The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: After re-examining the original analysis output files, we discovered that some of the Rastermap neuron density maps in Figure 6e were incorrectly aligned with their respective qualitative behaviors. This was caused by a discrepancy in file numbering between the images in 6e and the ensembles identified in 6c: each time Rastermap is run on the same data (at least with the older version available when these figures were created), the order of the ensembles on the y-axis changes, and thus the ensemble numbering changes even though the neuron identities within each group stay the same for a given set of parameters.

      This unfortunate panel alignment / graphical display error present in the original reviewed preprint has been fixed in the current, updated figure (i.e. twitch corresponds to Rastermap groups 2 and 3, whisk to group 6, walk to groups 5 and 4, and oscillate to groups 0 and 1), and in the main text at ~line 925, pg 29. We have also changed the figure legend, which also contained accurate but misaligned information, for Figure 6e to reflect this correction.

      One can now see that, because the data from both figures is from the same session in the same mouse, as you correctly point out, Fig 5d left (walk and whisk) corresponds roughly to Fig 6e group R7, “walk”, and that Fig 5d right (whisk) corresponds roughly to Fig 6e group R4, “twitch”.

      We have double-checked the identity of other CCF map displays of Rastermap neuron density and of mean correlations between neural activity and behavioral primitives in all other figures, and we found no other such alignment or mis-labeling errors.
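      Because ensemble numbers can change between Rastermap runs even when memberships do not, one defensive step is to map each run's labels back onto a fixed reference run by membership overlap before pairing CCF density maps with behaviors. The following NumPy sketch does this greedily from a contingency table of shared neurons. It is an illustration under stated assumptions, not part of Rastermap's API: the function name is hypothetical, each neuron is assumed to carry one integer ensemble label per run, and the greedy pass is exact only when memberships match closely (a Hungarian assignment such as scipy.optimize.linear_sum_assignment is the general alternative).

```python
import numpy as np


def match_ensemble_labels(reference_labels, new_labels):
    """Map each ensemble label in `new_labels` to the reference label whose
    membership it overlaps most, using each reference label at most once.
    Hypothetical helper; assumes one integer label per neuron per run."""
    reference_labels = np.asarray(reference_labels)
    new_labels = np.asarray(new_labels)
    ref_ids = np.unique(reference_labels)
    new_ids = np.unique(new_labels)
    # overlap[i, j] = number of neurons in new ensemble i AND reference ensemble j
    overlap = np.array(
        [[np.sum((new_labels == n) & (reference_labels == r)) for r in ref_ids]
         for n in new_ids],
        dtype=float,
    )
    mapping = {}
    for _ in range(min(len(new_ids), len(ref_ids))):
        # Take the largest remaining overlap, then retire its row and column.
        i, j = np.unravel_index(np.argmax(overlap), overlap.shape)
        mapping[int(new_ids[i])] = int(ref_ids[j])
        overlap[i, :] = -1  # this new ensemble is now assigned
        overlap[:, j] = -1  # this reference ensemble is now taken
    return mapping


# Usage: identical memberships, permuted numbering between two runs.
run_a = [0, 0, 1, 1, 2, 2]
run_b = [2, 2, 0, 0, 1, 1]
mapping = match_ensemble_labels(run_a, run_b)
print(mapping)  # {0: 1, 1: 2, 2: 0}
```

      Relabeling every run through such a mapping keeps ensemble numbers, and therefore figure panels, tied to neuron identity rather than to run order.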

      We have also added a caveat in the main text at ~lines 925-940, pg. 30, pointing out the preliminary nature of these findings, which are shown here as an example of the viability of the methods. Analysis of the variability of Rastermap alignments across sessions is beyond the scope of the current paper, although it is an issue that we hope to address in upcoming analysis papers.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We added a brief explanation of Suite2p motion correction at ~line 136, pg 4. We have also added additional details concerning CCF / MMM alignment and other analysis issues. In general we cite other papers where possible to avoid repeating details of analysis methods that are already published.
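      To illustrate the principle behind rigid motion correction, the following is a minimal NumPy sketch of registration by phase correlation: the normalized cross-power spectrum of a frame and a reference image keeps only their phase difference, and its inverse FFT peaks at the translation between them. This is not Suite2p's implementation. Suite2p adds further steps (for example edge tapering, a maximum-shift limit, and subpixel estimation), and its non-rigid correction repeats the estimate in image blocks. The function names are hypothetical, and integer shifts with circular (wrap-around) boundaries are assumed for simplicity.

```python
import numpy as np


def phase_correlation_shift(reference, frame):
    """Estimate the integer (dy, dx) shift such that
    frame ~= np.roll(reference, (dy, dx), axis=(0, 1))."""
    f_ref = np.fft.fft2(reference)
    f_frame = np.fft.fft2(frame)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross_power = f_frame * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    # The inverse FFT of a pure phase ramp is a sharp peak at the shift.
    corr = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint of an axis correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))


def rigid_register(frame, reference):
    """Undo the estimated rigid shift (circularly, for simplicity)."""
    dy, dx = phase_correlation_shift(reference, frame)
    return np.roll(frame, (-dy, -dx), axis=(0, 1)), (dy, dx)


# Usage: a synthetic frame displaced by (5, -3) pixels.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
frame = np.roll(reference, (5, -3), axis=(0, 1))
registered, shift = rigid_register(frame, reference)
print(shift)  # (5, -3)
```

      Whitening the spectrum in this way makes the correlation peak sharp and largely insensitive to overall brightness changes, which is one reason phase correlation is a common choice for frame-by-frame registration.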

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOID behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including those for multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We have edited the motivations behind the study to clarify the general problems that are being addressed. However, as the 2-photon imaging component of these experiments was performed on a Thorlabs mesoscope, the imaging details necessarily deal specifically with this system.

      We briefly compare the methods and results from our Thorlabs system to those of Diesel2p, another comparable system, based on what we have been able to glean from the literature on its strengths and weaknesses. See ~lines 206-213, pg 6.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community as a whole. The quality of the data is high, but it is not clear whether the data are available to the community; some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with in-depth analysis papers that are currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      We now reference this figure on ~lines 190-192, pg 6 of the main text, near the beginning of the Results section.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in line 463.

      Authors’ Response: For the claim stated in lines 449-453:

      “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks”

      We can provide the following references: (i) for rodents: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For the claim stated on line 463:

      “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”

      We can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107).

      We have included these two new references in the new, revised version of our paper. Thank you for identifying these citation omissions.

      No statistics are provided for the results shown in Figure 6e; it would be useful to know which of these neural densities, across all areas, show clear statistical significance across all the behaviors.

      Authors’ Response: It would be useful if we could provide a statistic similar to what we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0. In this case, the analogous comparison would be between the population density of each Rastermap group within each CCF area and a null value equal to the total number of neurons in that group divided by the total number of CCF areas (i.e. a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons, or ~3.3% density, per CCF area). Our current figure legend states the maxima of the scale bar look-up values (reds) for each group, which range from ~8% to 32%.

      However, because the data in panel 6e are from a single session and are being provided as an example of our methods and not for the purpose of claiming a specific result at this point, we choose not to report statistics. It is worth pointing out, perhaps, that Rastermap group densities for a given CCF area close to 3.3% are likely not different from chance, and those closer to ~40%, which is our highest density (for area M2 in Rastermap group 7, which corresponds to the qualitative behavior “walk”), are most likely not due to chance. Without analysis of multiple sessions from the same mouse we believe that making a clear statement of significance for this likelihood would be premature.
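      The uniform null described above can be made concrete with a few lines of arithmetic. The plain-Python sketch below computes the expected per-area count and density for an evenly distributed group and, as one possible per-session descriptive check, an exact binomial upper-tail probability for an observed count under that null. It is an illustration only: the function names and the observed count of 200 are hypothetical, it assumes each neuron falls into each area independently and with equal probability (which ignores differences in recorded cell density across CCF areas; weighting by each area's share of all recorded neurons would be a more realistic null), and it does not address variability across sessions or mice.

```python
import math


def uniform_null(n_group, n_areas):
    """Expected neurons per area, and density (fraction of the group) per area,
    if a Rastermap group were spread evenly over n_areas CCF areas."""
    return n_group / n_areas, 1.0 / n_areas


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Example from the text: 500 neurons spread over ~30 CCF areas.
count, density = uniform_null(500, 30)
print(round(count, 1), round(100 * density, 1))  # 16.7 3.3

# Hypothetical observed count: 200 of the 500 neurons (40% density) in one area.
p_value = binomial_upper_tail(200, 500, 1 / 30)
```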

      We now clarify this decision and related considerations in the main text at ~line 920, pg 29.

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc). These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the current methods paper. We are following up with an in depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors, but this analysis goes well beyond the scope of the present manuscript.

      Here, we prefer to present examples of the types of results that can be expected to be obtained using our methods, and how these results compare with those obtained by others in the field.

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCAMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data that one can expect to obtain using our methods. We will provide a more complete analysis of data obtained using our methodology in the near future in another manuscript.

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data it would be interesting to perhaps describe how these two different methods are interrelated; they do collect both datasets. Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare 1-photon widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli (i.e. “passive sensory stimulation”), while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In lines 71-71, the authors described some disadvantages of one-photon widefield imaging including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We have corrected these statements and incorporated these and other relevant references. There are advantages and disadvantages to each chosen technique, such as ease of use, field of view, accuracy, and speed. We will reference the papers you mention without an extensive literature review, but we would like to emphasize the following points:

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy (we image at 5 micrometer resolution for our large FOV configuration, but the xy point-spread function for the Thorlabs mesoscope is 0.61 x 0.61 micrometers in xy with 970 nm excitation) and undefined z-resolution (4.25 micrometers for the Thorlabs mesoscope). A coarser resolution increases the likelihood that activity-related fluorescence from neighboring cells may contaminate the fluorescence observed from imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than that of standard 2-photon galvo-galvo or even galvo-resonant scanning, as the Thorlabs mesoscope uses). This being said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      We have further clarified our discussion of these issues in the main text at ~lines 76-80, pg 2.

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach is significantly longer than the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression optical quality of the window would offer valuable insight.

      Authors’ Response: As you suggest, we now include brief supplementary material demonstrating the changes in the window preparation that we observed over the prolonged time periods of our study, for both the dorsal and side mount preparations. The following link to this material is now referenced at ~line 287, pg 9, and at the end of Fig S1:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      We have also included brief additional details in the main text that we found were useful for facilitating long term use of these preparations. These are located at ~lines 287-290, pg 9.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sharing raw data and code:

      I strongly encourage sharing some of the raw data from your experiments and all the code used for data analysis (e.g. in a github repository). This would help the reader evaluate data quality, and reproduce your results.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      Our existing GitHub repository, already referenced in the paper, is located here:

      https://github.com/vickerse1/mesoscope_spontaneous

      We have added an additional reference in the main text to the existence of these publicly available resources, including the appropriate links, located at ~lines 190-200, pg 6.

      (2) Use of proprietary software:

      The reliance on proprietary tools like LabView and Matlab could be a limitation for some researchers, given the associated costs and accessibility issues. If possible, consider incorporating or suggesting alternatives that are open-source, to make your methodology more accessible to a broader range of researchers, including those with limited resources.

      Authors’ Response: We are reluctant to recommend open source software that we have not thoroughly tested ourselves. However, we will mention, when appropriate, possible options for the reader to consider.

      Although LabView is proprietary and can be difficult to code, it is particularly useful when used in combination with National Instruments hardware. ScanImage in use with the Thorlabs mesoscope uses National Instruments hardware, and it is convenient to maintain hardware standards across the integrated rig/experimental system. Labview is also useful because it comes with a huge library of device drivers that makes addition of new hardware from basically any source very convenient.

      That being said, there are open source alternatives that could conceivably be used to replace parts of our system. One example is AutoPilot (author: Jonny Saunders), for control of behavioral data acquisition: https://open-neuroscience.com/post/autopilot/.

      We are not aware of an alternative to Matlab for control of ScanImage, which is the supported control software for the Thorlabs 2-photon mesoscope.

      Most of our processing and analysis code (see GitHub page: https://github.com/vickerse1/mesoscope_spontaneous) is in Python, but some of the code that we currently use remains in Matlab form. Certainly, this could be re-written as Python code. However, we feel that this is outside the scope of the current paper. We have provided comments throughout the code in an attempt to aid users in translating it to other languages, if they so desire.

      (3) Quantifying the effect of tilted head:

      To address the potential impact of tilting the mouse's head on your findings, a quantitative analysis of any systematic differences in the behavior (e.g. Bsoid motifs) could be illuminating.

      Authors’ Response: We have performed DeepLabCut analysis of all sessions from both preparations, across several iterations with different parameters, to extract pose estimates, and we have also performed B-SOiD analysis of these sessions. We did not find any obvious qualitative differences attributable to the head tilt of the side mount preparation in the number of behavioral motifs identified, the dwell times of these motifs, or similar measures. We also did not find any obvious differences in the relative frequencies of high level qualitative behaviors, such as the ones referred to in Fig. 6, between the two preparations.

      Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      In principle, one could nearly reproduce our optical configuration for the side mount preparation without rotating the mouse, by instead rotating the objective 20 degrees to the right of vertical. However, we found that the remaining 2-3 degrees of missing rotation, along with several other factors, made this undesirable. (Our preparation is rotated 22.5 degrees to the left, which exceeds the full 20 degrees of rotation available on the Thorlabs mesoscope objective.) First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause the PMT to trip. Third, imaging the right pupil and face of the mouse is difficult or impossible under these conditions. The camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      (4) Clarification in the discussion section:

      The paragraph titled "Advantages and disadvantages of our approach" seems to diverge into discussing future directions, rather than focusing on the intended topic. I suggest revisiting this section to ensure that it accurately reflects the strengths and limitations of your approach.

      Authors’ Response: We agree with the reviewer that this section included several potential next steps or solutions for each advantage and disadvantage, which the reviewer refers to as “future directions” and are thus arguably beyond the scope of this section. Therefore we have retitled this section as, “Advantages and disadvantages of our approach (with potential solutions):”.

      Although we believe this to be a logical organization, and we already include a section focused purely on future directions in the Discussion section, we have refocused each paragraph of the advantages/disadvantages subsection to concentrate on the advantages and disadvantages per se. In addition, we have made minor changes to the “future directions” section to make it more succinct and practical. These changes can be found at lines ~1016-1077, pg 33-34.

      Reviewer #2 (Recommendations For The Authors):

      Below are some more detailed points that will hopefully help to further improve the quality and scope of the manuscript.

      • While it is certainly favorable for many questions to measure large-scale activity from many brain regions, the introduction appears to suggest that this is a prerequisite to understanding multimodal decision-making. This is based on the argument that combining multiple recordings with movement indicators will 'necessarily obscure the true spatial correlation structures'. However, I don't understand why this is the case or what is meant by 'true spatial correlation structures'. Aren't there many earlier studies that provided important insights from individual cortical areas? It would be helpful to improve the writing to make this argument clearer.

      Authors’ Response: The reviewer makes an excellent point and we have re-worded the manuscript appropriately, to reflect the following clarifications. These changes can be found at ~lines 58-71, pg. 2.

      We believe you are referring to the following passage from the introduction:

      “Furthermore, the arousal dependence of membrane potential across cortical areas has been shown to be diverse and predictable by a temporally filtered readout of pupil diameter and walking speed (Shimaoka et al, 2018). This makes simultaneous recording of multiple cortical areas essential for comparison of the dependence of their neural activity on arousal/movement, because combining multiple recording sessions with pupil dilations and walking bouts of different durations will necessarily obscure the true spatial correlation structures.”

      Here, we do not mean to imply that earlier studies of individual cortical areas are of no value. Rather, this argument is one example, among others, of a broader idea. Some sequences or distributed encoding schemes simultaneously span many cortical areas that are either too far apart to be imaged simultaneously with conventional 2-photon imaging or too sparse to be resolved with 1-photon widefield imaging. For such activity, our new methods offer advantages over conventional imaging methods that will allow truly novel scientific analyses and insights.

      The general idea of the present example is based on the findings of Shimaoka et al, 2018. It is not possible to directly combine or compare the correlations between behavior and neural activity across regions that were imaged in separate sessions. This is because the correlations between behavior and neural activity in each region appear to depend on the exact time since the behavior began (Shimaoka et al, 2018), in a manner that differs across regions. For example, suppose one recorded from visual cortex in one session with mostly brief walk bouts, and then from somatosensory cortex in a second session with mostly long walk bouts. Any inferred difference in the encoding of walk speed between the two areas would then risk being contaminated by the “temporal filtering” effect shown in Shimaoka et al, 2018. This confound does not arise in our recordings. Because all areas are recorded simultaneously, the distribution of behavior durations underlying the recorded neural activity is exactly the same across areas.
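      The confound described above can be illustrated with a toy simulation. As a simplifying assumption, an area's activity is modeled as a first-order exponential filter of a binary walking signal; this is only a caricature of the temporally filtered readout in Shimaoka et al, 2018. The same "area", sampled during a session of brief walk bouts and during a session of long walk bouts, yields different activity-walking correlations even though its encoding is unchanged. The time constant, sampling interval, bout lengths, and function names are all hypothetical.

```python
import numpy as np

def exp_filter(x, tau, dt):
    """First-order low-pass filter: y[t] = y[t-1] + (dt / tau) * (x[t] - y[t-1])."""
    y = np.zeros_like(x, dtype=float)
    alpha = dt / tau
    for t in range(1, x.size):
        y[t] = y[t - 1] + alpha * (x[t] - y[t - 1])
    return y

def walk_signal(bout_len, gap_len, n_bouts):
    """Binary walking trace made of identical bouts separated by rest periods (in samples)."""
    one_cycle = np.concatenate((np.ones(bout_len), np.zeros(gap_len)))
    return np.tile(one_cycle, n_bouts)

dt, tau = 0.1, 2.0  # assumed: 10 Hz sampling and a 2 s filter time constant
brief = walk_signal(bout_len=10, gap_len=50, n_bouts=20)   # 1 s walk bouts
long_ = walk_signal(bout_len=100, gap_len=50, n_bouts=20)  # 10 s walk bouts

# Identical "encoding" (same filter) in both sessions; only the bout durations differ.
r_brief = np.corrcoef(brief, exp_filter(brief, tau, dt))[0, 1]
r_long = np.corrcoef(long_, exp_filter(long_, tau, dt))[0, 1]
print(f"brief bouts r = {r_brief:.2f}, long bouts r = {r_long:.2f}")
```

      If two areas with different filter time constants were each recorded in only one of these sessions, the resulting difference in correlations would mix area differences with session differences. Simultaneous recording removes the second factor.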

      • The text describes different timescales of neural activity but is an imaging rate of 3 Hz fast enough to be seen as operating at the temporal dynamics of the behavior? It appears to me that the sampling rate will impose a hard limit on the speed of correlations that can be observed across regions. While this might be appropriate for relatively slow behaviors and spontaneous fluctuations in arousal, sensory processing and decision formation likely operate on faster time scales below 100ms which would even be problematic at 10 Hz which is proposed as the ideal imaging speed in the manuscript.

      Authors’ Response: Imaging rate is always a concern, and its limitations have been discussed in other manuscripts. We now remind the reader of these limitations, which must always be kept in mind when interpreting fluorescence-based neural activity data.

      Previous studies imaging on a comparable yet more limited spatial scale (Stringer et al, 2019) used an imaging speed of ~1 Hz. With this in view, our work represents an advance both in spatial extent of imaged cortex and in imaging speed. Specifically, we believe that ~1 Hz imaging may be sufficient to capture flip/flop type transitions between low and high arousal states that persist in general for seconds to tens of seconds, and that ~3-5 Hz imaging likely provides additional information about encoding of spontaneous movements and behavioral syllables/motifs.

      Indeed, even 10 Hz imaging would not be fast enough to capture the detailed dynamics of sensory processing and decision formation, although these speeds are likely sufficient to capture “stable” encodings of sensory representations and decisions that must be maintained during a task, for example with delayed match-to-sample tasks.
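      To make the sampling-rate argument concrete, the short sketch below computes, for a given volume rate, the frame interval and the Nyquist frequency (half the sampling rate). It then checks whether a process with a given characteristic timescale could be resolved. The 1, 3, and 10 Hz rates and the 100 ms timescale come from this exchange. Treating that timescale as one period of an oscillation is a simplifying assumption. The sketch also ignores calcium indicator kinetics, which impose an additional low-pass filter.

```python
def sampling_summary(rate_hz, timescale_s):
    """Frame interval (s), Nyquist frequency (Hz), and whether a process with
    characteristic period `timescale_s` lies at or below the Nyquist frequency."""
    frame_interval_s = 1.0 / rate_hz
    nyquist_hz = rate_hz / 2.0
    resolvable = (1.0 / timescale_s) <= nyquist_hz
    return frame_interval_s, nyquist_hz, resolvable

for rate in (1.0, 3.0, 10.0):
    interval, nyq, ok = sampling_summary(rate, timescale_s=0.1)
    print(f"{rate:>4} Hz: one frame every {interval * 1000:.0f} ms, "
          f"Nyquist {nyq} Hz, 100 ms process resolvable: {ok}")
```

      Under these assumptions, even 10 Hz imaging (Nyquist frequency 5 Hz) cannot resolve a 100 ms process. Slower, seconds-long state transitions fall well within the range of ~1-3 Hz imaging.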

      In addition, we are further developing our preparations to allow simultaneous widefield imaging and Neuropixels recordings, as well as simultaneous 1.2 x 1.2 mm 2-photon imaging and visually guided patch clamp recordings.

      Both of these techniques will allow us to combine information across both the slow and fast timescales that you refer to in your question.

      We have clarified these points in the Introduction (~lines 93-105, pg 3) and in the Discussion (~lines 979-983, pg 31, and ~lines 1039-1045, pg 33).

      • The dorsal mount is very close to the crystal skull paper and it was ultimately not clear to me if there are still important differences aside from the headbar design that a reader should be aware of. If they exist, it would be helpful to make these distinctions a bit clearer. Also, the sea shell implants from Ghanbari et al in 2019 would be an important additional reference here.

      Authors’ Response: We have added brief references to these issues in our revised manuscript at ~lines 89-97, pg 3:

      Our dorsal mount preparation is based on the “crystal skull paper” (Kim et al, 2016), which we reference. However, the addition of a novel 3-D printable titanium headpost, support arms, and light shields, together with modifications to the surgical protocols and CCF alignment, represents a significant advance that made this preparation usable for pan-cortical imaging with the Thorlabs mesoscope. In fact, we were in direct communication with Cris Niell, a UO professor and co-author on the original Kim et al, 2016 paper, during the initial development of our preparation. He and members of his lab consulted with us on an ongoing basis to learn from our successful headpost and other hardware developments. Furthermore, all of our innovations for data acquisition, imaging, and analysis apply equally to both our dorsal mount and side mount preparations.

      Thank you for mentioning the Ghanbari et al, 2019 paper on the transparent polymer skull method, “See Shells.” We were in fact not aware of this study. However, like the crystal skull preparation and our dorsal mount preparation, their preparation appears to be limited to bilateral dorsal cortex. Unlike our cranial window side mount preparation and the through-the-skull widefield preparation of Esmaeili et al, 2021, it does not appear to include a fuller range of lateral cortical areas, including primary auditory cortex.

      • When using the lateral mount, rotating the objective, rather than the animal, appears to be preferable to reduce the stress on the animal. I also worry that the rather severe head tilt could be an issue when training animals in more complex behaviors and would introduce an asymmetry between the hemispheres due to the tilted body position. Is there a strong reason why the authors used water instead of an imaging gel to resolve the issue with the meniscus?

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript (see ~line 235, pg. 7).

      Supplementary Movie 1 shows examples of the relatively similar behavior in the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. However, our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have briefly added these considerations at ~lines 235-250, pg 7.

      In general, it was important to ensure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in side mount mice so that they are not too high off the wheel. This can be accomplished by angling the left support arm slightly forward for side mount mice.

      In principle, one could nearly reproduce our optical configuration for the side mount preparation without rotating the mouse, by instead rotating the objective 20 degrees to the right of vertical. However, we found that the remaining 2-3 degrees of missing rotation, along with several other factors, made this undesirable. (Our preparation is rotated 22.5 degrees to the left, which exceeds the full 20 degrees of rotation available on the objective.) First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause the PMT to trip. Third, imaging the right pupil and face of the mouse is difficult or impossible under these conditions. The camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      • In parts, the description of the methods is very specific to the Thorlabs mesoscope which makes it harder to understand the general design choices and challenges for readers that are unfamiliar with that system. Since the Mesoscope is very expensive and therefore unavailable to many labs in the field, I think it would increase the reach of the manuscript to adjust the writing to be less specific for that system but instead provide general guidance that could also be helpful for other systems. For example (but not exclusively) lines 231-234 or lines 371 and below are very Thorlabs-specific.

      Authors’ Response: We have revised the manuscript so that it is more generally applicable to mesoscopic methods.

      We have made revisions as you suggest where possible, although we have limited experience with the other imaging systems that we believe you are referring to. However, please note that we already mentioned at least one other comparable system in the original eLife reviewed preprint (Diesel 2p, line 209; Yu and Smith, 2021).

      Here are a couple of examples of how we have broadened our description:

      (1) On lines ~231-234, pg 7, we write:

      “However, if needed, the objective of the Thorlabs mesoscope may be rotated laterally up to +20 degrees for direct access to more ventral cortical areas, for example if one wants to use a smaller, flat cortical window that requires the objective to be positioned orthogonally to the target region.”

      Here, we have modified this to indicate that, in general, one may rotate the objective lens if one's system allows it. Some systems, such as the Thorlabs Bergamo microscope and the Sutter MOM system, allow more than 20 degrees of rotation.

      (2) On line ~371, pg 11, we write:

      “This technique required several modifications of the auxiliary light-paths of the Thorlabs mesoscope”

      Here, we have changed the writing to be more general such as “may require…of one’s microscope.”

      Thank you for these valuable suggestions.

      • Lines 287-299: Could the authors quantify the variation in imaging depth, for example by quantifying to which extent the imaging depth has to be adjusted to obtain the position of the cortical surface across cortical areas? Given that curvature is a significant challenge in this preparation this would be useful information and could either show that this issue is largely resolved or to what extent it might still be a concern for the interpretation of the obtained results. How large were the required nominal corrections across imaging sites?

      Authors’ Response: This information was provided previously (lines 297-299):

      “In cases where we imaged multiple small ROIs, nominal imaging depth was adjusted in an attempt to maintain a constant relative cortical layer depth (i.e. depth below the pial surface; ~200 micrometer offset due to brain curvature over 2.5 mm of mediolateral distance, symmetric across the center axis of the window).”

      This statement is based on a qualitative assessment of cortical depth based on neuron size and shape, the density of neurons in a given volume of cortex, the size and shape of blood vessels, and known cortical layer depths across regions. A ground-truth measurement of this depth error is beyond the scope of the present study. However, we do specify the type of glass, thickness, and curvature that we use, and the field curvature characterization of the Thorlabs mesoscope is given in Fig. 6 of the Sofroniew et al, 2016 eLife paper.
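      As a rough geometric illustration of the depth adjustment quoted above, one can assume that the cortical surface under the window is locally spherical. Its radius is then chosen so that the surface drops by ~200 µm at 2.5 mm from the window's center axis. The spherical model, the reading of "over 2.5 mm" as the distance from the center axis, the 300 µm target depth, and both function names are illustrative assumptions rather than a measured characterization. The sketch also ignores the mesoscope's own field curvature and measures depth vertically rather than along the surface normal.

```python
import math

def sphere_radius_from_sag(half_width_um, sag_um):
    """Radius of a sphere whose surface drops by `sag_um` at `half_width_um` from its apex.

    Derived from R**2 - w**2 = (R - s)**2, i.e. R = (w**2 + s**2) / (2 * s).
    """
    return (half_width_um ** 2 + sag_um ** 2) / (2.0 * sag_um)

def nominal_depth_um(target_depth_um, ml_offset_um, radius_um):
    """Nominal (vertical) imaging depth needed to stay `target_depth_um` below the pia
    at a mediolateral offset from the window's center axis (symmetric in the offset)."""
    surface_drop = radius_um - math.sqrt(radius_um ** 2 - ml_offset_um ** 2)
    return target_depth_um + surface_drop

R = sphere_radius_from_sag(half_width_um=2500.0, sag_um=200.0)  # about 15.7 mm
for x in (0.0, 1000.0, 2000.0, 2500.0):
    print(f"{x / 1000:.1f} mm from center: image at {nominal_depth_um(300.0, x, R):.0f} um nominal")
```

      Under these assumptions, the required offset grows roughly quadratically with distance from the center axis. ROIs near the midline therefore need little correction, while ROIs near the lateral edge need most of the ~200 µm.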

      In addition, we have provided some documentation of online fast-z correction parameters on our GitHub page at:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      Additional relevant documentation can be found in our publicly available data repository on FigShare+ at: https://doi.org/10.25452/figshare.plus.c.7052513

      • Given the size of the implant and the subsequent work attachments, I wonder to which extent the field of view of the animal is obstructed. Did the authors perform receptive field mapping or some other technique that can estimate the size of the animals' remaining field of view?

      Authors’ Response: The left eye is pointed down ~22.5 degrees, but we position the mouse near the left edge of the wheel to minimize the degree to which this limits their field of view. One may view our Fig. 1 and Suppl Movies 1 and 6 to see that the eyes on the left and right sides are unobstructed by the headpost, light shields, and support arms. However, other components of the experimental setup, such as the speaker, cameras, etc. can restrict a few small portions of the visual field, depending on their exact positioning.

      Mice responded to left-side visual stimuli in preliminary recordings during our multimodal 2-AFC task. In addition, the unobstructed left and right camera views, along with pupillometry recordings, indicate that a significant portion of the mouse’s field of view, on either side, remains intact in our preparation.

      We have clarified these points in the text at ~lines 344-346, pg. 11.

      • Line 361: What does movie S7 show in this context? The movie seems to emphasize that the observed calcium dynamics are not driven by movement dynamics but it is not clear to me how this relates to the stimulation of PV neurons. The neural dynamics in the example cell are also not very clear. It would be helpful if this paragraph would contain some introduction/motivation for the optogenetic stimulation as it comes a bit out of the blue.

      Authors’ Response: This result was presented for two reasons.

      First, we showed it as a control for movement artifacts. Inhibiting neural activity increases the relative prominence of activity-independent fluorescence, which allows us to examine the amplitude of movement-related changes in that activity-independent signal (i.e. movement artifacts). We have included a reference to this point at ~lines 587-588, pg 18.

      Second, we showed it as a demonstration of how one may combine optogenetics with mesoscopic 2-photon imaging. References to this point were already present in the original version of the manuscript (the eLife “reviewed preprint”).

      • Lines 362-370: This paragraph and some of the following text are quite technical and would benefit from a better description and motivation of the general workflow. I have trouble following what exactly is done here. Are the authors using an online method to identify the CCF location of the 2p imaging based on the vessel pattern? Why is it important to do this during the experiment? Wouldn't it be sufficient to identify the areas of interest based on the vessel pattern beforehand and then adjust the 2p acquisition accordingly? Why are they using a dial, shutter, and foot pedal and how does this relate to the working distance of the objective? Does the 'standardized cortical map' refer to the Allen common coordinate framework?

      Authors’ Response: We have revised this section to make it more clear.

      Currently, the general introduction to this section appears in lines 349-361. Starting in line 362, we currently present the technical considerations needed to implement the overall goals stated in that first paragraph of this section.

      In general, we use a post-hoc analysis step to confirm the location of neurons recorded with 2-photon imaging. During imaging, we juxtapose the multimodal map image (with the CCF overlaid) and the 2-photon image “online”. We do this by opening the two images side by side on the ScanImage computer and matching the vasculature patterns “by eye”. We have made this clearer in the text so that the interested reader can more readily implement our methods.

      By use of the phrase “standardized cortical map” in this context, we meant to point out that we had not decided a priori to use the Allen CCF v3.0 when we started working on these issues.

      • Does Fig. 2c show an example of the online alignment between widefield and 2p data? I was confused here since the use of suite2p suggests that this was done post-recording. I generally didn't understand why the user needed to switch back and forth between the two modes. Doesn't the 2p image show the vessels already? Also, why was an additional motorized dichroic to switch between widefield and 2p view needed? Isn't this the standard in most microscopes (including the Thorlabs scopes)?

      Authors’ Response: We have explained this methodology more clearly in the revised manuscript, both at ~lines 485-500, pg 15-16, and ~lines 534-540, pg 17.

      The motorized dichroic we used replaced the motorized mirror that comes with the Thorlabs mesoscope. We switched to a dichroic to allow for near-simultaneous optogenetic stimulation with 470 nm blue light and 2-photon imaging, so that we would not have to move the mirror back and forth during live data acquisition (it takes a few seconds and makes an audible noise that we wanted to avoid).

      Figure 2c shows an overview of our two-step “offline” alignment process. The image at right in the bottom row, labeled “2”, is a map of recorded neurons from suite2p, determined post hoc (i.e. after imaging). Fig. 2d shows the CCF map overlaid on the neurons from a single suite2p session using our alignment techniques. This image is likewise created post hoc and not during imaging.

      In practice, “online” during imaging, we start from the image at left in the bottom row of Fig. 2c. This is the multimodal map image overlaid onto an image of the vasculature, also acquired on the widefield rig, with the 22.5 degree rotated CCF map aligned to it based on the location of sensory responses. We rotate this image 90 degrees to the left, flip it across a horizontal mirror plane so that its orientation matches that of the “online” 2-photon acquisition image, and zoom it to the same scale factor. We then navigate “by eye”, based on vasculature patterns, to the desired CCF areas. Finally, we confirm successful 2-photon targeting of the predetermined regions with our post-hoc analysis.
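      The reorientation step described above can be expressed in a few lines of NumPy: rotate the widefield-derived map 90 degrees to the left, mirror it across a horizontal axis, and match the 2-photon zoom. This is a hypothetical sketch. The function name, the nearest-neighbour rescaling, and the scale factors are assumptions. Whether "flipped over a horizontal mirror plane" corresponds to an up-down flip depends on the display conventions of the acquisition software. Note that a 90 degree counterclockwise rotation followed by an up-down flip is equivalent to a matrix transpose.

```python
import numpy as np

def reorient_for_2p(widefield_img, scale):
    """Rotate 90 degrees counterclockwise, mirror across the horizontal axis, then
    rescale by `scale` with nearest-neighbour sampling (keeps label values intact)."""
    img = np.rot90(widefield_img, k=1)  # 90 degrees counterclockwise ("to the left")
    img = np.flipud(img)                # mirror across a horizontal axis (top <-> bottom)
    n_rows = int(round(img.shape[0] * scale))
    n_cols = int(round(img.shape[1] * scale))
    # For each output pixel, take the nearest source pixel.
    rows = np.minimum((np.arange(n_rows) / scale).astype(int), img.shape[0] - 1)
    cols = np.minimum((np.arange(n_cols) / scale).astype(int), img.shape[1] - 1)
    return img[np.ix_(rows, cols)]

demo = np.arange(6).reshape(2, 3)        # stand-in for a widefield map image
print(reorient_for_2p(demo, scale=2.0).shape)
```

      Nearest-neighbour sampling is used here so that an overlaid CCF label map keeps its integer area IDs. A smooth interpolator would suit a grayscale vasculature image better.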

      • Why is the widefield imaging done through the skull under anesthesia? Would it not be easier to image through the final window when mice have recovered? Is the mapping needed for accurate window placement?

      Authors’ Response: The headpost and window surgeries are done 3-7 days apart to increase success rate and modularize the workflow. Multimodal mapping by widefield imaging is done through the skull between these two surgeries for two major reasons. First, to make efficient use of the time between surgeries. Second, to allow us to compare the multimodal maps to skull landmarks, such as bregma and lambda, for improved alignment to the CCF.

      Anesthesia was applied to prevent state changes and movements of the mouse, which can produce large, undesired effects on neural responses in primary sensory cortices in the context of these mapping experiments. We sometimes re-imaged multimodal maps on the widefield microscope through the window, roughly every 30-60 days, or whenever significant changes in the vasculature pattern became apparent.

      We have clarified these points in the main text at ~lines 510-522, pg 20-21, and we added a link to our new supplementary material documenting the changes observed in the window preparation over time:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      Thank you for these questions.

      • Lines 445 and below: Reducing the noise from resonant scanners is also very relevant for many other 2p experiments so it would be helpful to provide more general guidance on how to resolve this problem. Is the provided solution only applicable to the Thorlabs mesoscope? How hard would it be to adjust the authors' noise shield to other microscopes? I generally did not find many additional details on the Github repo and think readers would benefit from a more general explanation here.

      Authors’ Response: We have revised our GitHub repository to include more details, including diagrams and a text description of the sound baffle, respectively:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_for_noise_reduction_on_resonant_scanner_devices.pdf

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      However, we cannot presently disclose the contents of our confidential provisional patent application. Complete design information will likely be available in early 2025, when our full utility patent application is filed.

      With respect to your question, yes, this technique is adaptable to any resonant scanner, or, for that matter, any complicated 3D surface that emits sound. We first 3D scan the surface, and then we reverse engineer a solid that fully encapsulates the surface and can be easily assembled in parts with bolts and interior foam that allow for a tight fit, in order to nearly completely block all emitted sound.

      It is this adaptability that has prompted us to apply for a full patent, as we believe this technique will be quite valuable as it may apply to a potentially large number of applications, starting with 2-photon resonant scanners but possibly moving on to other devices that emit unwanted sound.

      • Does line 458 suggest that the authors had to perform a 3D scan of the components to create the noise reduction shield? If so, how was this done? I don't understand the connection between 3D scanning and printing that is mentioned in lines 464-466.

      Authors’ Response: We do not want to release full details of the methodology until the full utility patent application has been submitted. However, we have now included a simplified text description of the process on our GitHub page and included a corresponding link in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      We also clarified in the main text, at the location that you indicate, why the 3D scanning is a critical part of our novel 3D-design, printing, and assembly protocol.

      • Lines 468 and below: Why is it important to align single-cell data to cortical areas 'directly on the 2-photon microscope'? Is this different from the alignment discussed in the paragraph above? Why not focus on data interpretation after data acquisition? I understand the need to align neural data to cortical areas in general, I'm just confused about the 'on the fly' aspect here and why it seems to be broken out into two separate paragraphs. It seems as if the text in line 485 and below could also be placed earlier in the text to improve clarity.

      Authors’ Response: By “such mapping is not routinely possible directly on the 2-photon mesoscope”, we mean that multimodal mapping cannot be performed directly on the mesoscope; it must be done on the widefield imaging rig (a separate microscope). The CCF is then mapped onto the widefield multimodal map, which is overlaid on an image of the vasculature (and sometimes also the skull) acquired on the same widefield rig. The vasculature serves as a sort of Rosetta Stone. It is used to co-align the 2-photon image to the multimodal map and thus, by chaining the two alignments, to the CCF. As a result, each individual neuron in the 2-photon image can be assigned a unique CCF area name and numerical identifier for subsequent analysis.
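      The chained alignment described above amounts to composing two coordinate transforms. The first maps 2-photon pixel coordinates to widefield coordinates (estimated from the vasculature). The second maps widefield coordinates to the CCF-aligned map (estimated from sensory responses and skull landmarks). The minimal sketch below assumes that both transforms are 2-D affine maps in homogeneous coordinates and that the CCF map is available as an integer label image. The matrices, the tiny label image, and the function names are made-up placeholders. This is not the notebook code in the authors' repository.

```python
import numpy as np

def affine(scale=1.0, theta_deg=0.0, tx=0.0, ty=0.0):
    """3x3 homogeneous matrix: uniform scale and rotation by theta, then translation."""
    t = np.deg2rad(theta_deg)
    return np.array([[scale * np.cos(t), -scale * np.sin(t), tx],
                     [scale * np.sin(t),  scale * np.cos(t), ty],
                     [0.0, 0.0, 1.0]])

def apply_affine(T, xy):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of (x, y) points."""
    pts = np.column_stack((xy, np.ones(len(xy))))
    return (pts @ T.T)[:, :2]

def assign_ccf_labels(neuron_xy, T_2p_to_wf, T_wf_to_ccf, ccf_label_img):
    """Map neuron centroids into CCF-map pixels and read off area IDs (0 = outside map)."""
    T_2p_to_ccf = T_wf_to_ccf @ T_2p_to_wf  # first 2p -> widefield, then widefield -> CCF
    xy = np.rint(apply_affine(T_2p_to_ccf, neuron_xy)).astype(int)
    labels = np.zeros(len(xy), dtype=ccf_label_img.dtype)
    inside = ((xy[:, 0] >= 0) & (xy[:, 0] < ccf_label_img.shape[1]) &
              (xy[:, 1] >= 0) & (xy[:, 1] < ccf_label_img.shape[0]))
    labels[inside] = ccf_label_img[xy[inside, 1], xy[inside, 0]]  # row = y, column = x
    return labels

# Placeholder example: a 4x4 CCF label map containing two "areas" (IDs 1 and 2).
ccf = np.array([[1, 1, 2, 2]] * 4)
T_2p_to_wf = affine(scale=0.5)   # e.g. 2p pixels half the size of widefield pixels
T_wf_to_ccf = affine(tx=1.0)     # e.g. a one-pixel shift between widefield and CCF map
neurons = np.array([[0.0, 0.0], [4.0, 2.0], [20.0, 20.0]])
print(assign_ccf_labels(neurons, T_2p_to_wf, T_wf_to_ccf, ccf))
```

      Composing the matrices once and applying the product gives the same result as applying the two alignments in sequence. That equivalence is what lets the vasculature-based alignment carry each neuron through to a CCF area.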

      We have clarified this in the text, thank you.

      • The Python code for aligning the widefield and 2-photon vessel images would also be of great value for regular 2p users. It would strongly improve the impact of the paper if the repository were better documented and the code were equally applicable to alignment of imaging data from smaller cranial windows.

      Authors’ Response: All of the code for multimodal map, CCF, and 2-photon image alignment is, in fact, already present on the GitHub page. We have made some minor improvements to the documentation, and readers are more than welcome to contact us for additional help.

      Specifically, the alignment you refer to starts in cell #32 of the meso_pre_proc_1.ipynb notebook. In general, the notebooks are meant to be run sequentially: start with cell #1 of meso_pre_proc_1, proceed cell by cell, then move on to meso_pre_proc_2, and so on. The purpose of each cell is described in a comment at the top of the cell.

      We now include a cleaned, abridged version of the meso_pre_proc_1.ipynb notebook that contains only the steps needed for alignment, and have included a direct link to this notebook in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/python_code/mesoscope_preprocess_MMM_creation.ipynb

      Rotated CCF maps are in the CCF map rotation folder, in subfolders corresponding to the angle of rotation.
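      Rotated CCF maps like those stored per angle could, in principle, be regenerated from an unrotated label image. The sketch below assumes the map is an integer label image. It uses nearest-neighbour inverse mapping so that rotation never creates new, interpolated area IDs. The function name, rotation about the image center, and the fill value are assumptions. With row indices increasing downward, positive angles in this sketch appear clockwise on screen. That convention should be checked against the stored maps before relying on it.

```python
import numpy as np

def rotate_label_map(labels, angle_deg, fill=0):
    """Rotate an integer label image about its center using nearest-neighbour sampling.

    Each output pixel is mapped back (inverse rotation) into the source image and takes
    the label of the nearest source pixel, so no new label values are created.
    Positive angles appear clockwise when row indices increase downward.
    """
    h, w = labels.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse rotation of output pixel coordinates back into source coordinates.
    xs = np.cos(t) * (xx - cx) + np.sin(t) * (yy - cy) + cx
    ys = -np.sin(t) * (xx - cx) + np.cos(t) * (yy - cy) + cy
    xs_i, ys_i = np.rint(xs).astype(int), np.rint(ys).astype(int)
    valid = (xs_i >= 0) & (xs_i < w) & (ys_i >= 0) & (ys_i < h)
    out = np.full_like(labels, fill)
    out[valid] = labels[ys_i[valid], xs_i[valid]]
    return out

# Toy 20x20 "CCF" map with four areas (IDs 1-4), rotated by the 22.5 degrees used here.
demo_map = np.repeat(np.array([[1, 2], [3, 4]]), 10, axis=0).repeat(10, axis=1)
rotated = rotate_label_map(demo_map, 22.5)
print(np.unique(rotated))
```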

      Multimodal map creation involves use of the SensoryMapping_Vickers_Jun2520.m script in the Matlab folder.

      We updated the main text to clarify these points and included direct links to scripts relevant to each processing step.

      • Figure 4a: I found it hard to see much of the structure in the Rastermap projection with the viridis colormap - perhaps also because of a red-green color vision impairment. Correspondingly, I had trouble seeing some of the structure that is described in the text or clearer differences between the neuron sortings to PC1 and PC2. Is the point of these panels to show that both PCs identify movement-aligned dynamics or is the argument that they isolate different movement-related response patterns? Using a grayscale colormap as used by Stringer et al might help to see more of the many fine details in the data.

      Authors’ Response: In Fig. 4a the viridis color range is from blue to green to yellow, as indicated in the horizontal scale bar at bottom right. There is no red color in these Rastermap projections, or in any others in this paper. Furthermore, the expanded Rastermap insets in Figs. S4 and S5 provide additional detailed information that may not be clear in Fig 4a and Fig 5a.

      We prefer, therefore, not to change these colormaps, which we use throughout the paper.

      We have provided grayscale png versions of all figures on our GitHub page:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/grayscale_figures

      In Fig 4a, the point of showing both the PC1 and PC2 panels is to demonstrate that they appear to correspond to different aspects of movement. PC1 relates more to transient walking (both ON and OFF), and PC2 to whisking and sustained ON walk/whisk. The panels also show that the two PCs differ in their ability to identify neurons with positive and negative correlations to arousal: PC1 finds both, but PC2 seems to find only the ON neurons.
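      The idea of sorting neurons by a principal component, and reading positively versus negatively correlated neurons off the sign of their weights, can be sketched with NumPy's SVD on synthetic data. This is a hypothetical illustration. It is not Rastermap, and the synthetic arousal signal, neuron gains, and noise level are made up. Because the overall sign of a principal component is arbitrary, the sketch orients the PC so that its time course correlates positively with the arousal proxy.

```python
import numpy as np

def pc_sort(activity, arousal, pc=0):
    """Sort neurons (rows of `activity`, shape neurons x time) by their weight on one PC."""
    z = activity - activity.mean(axis=1, keepdims=True)  # center each neuron over time
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    weights, timecourse = u[:, pc], vt[pc]
    # PC signs are arbitrary; orient so the PC time course correlates positively with arousal.
    if np.corrcoef(timecourse, arousal)[0, 1] < 0:
        weights = -weights
    order = np.argsort(weights)[::-1]  # most positive (ON) weights first, OFF neurons last
    return order, weights

rng = np.random.default_rng(0)
t = np.arange(600)
arousal = np.sin(2 * np.pi * t / 200.0)               # slow synthetic arousal proxy
gains = np.array([2.0, 1.0, -1.0, -2.0, 0.0])         # ON, ON, OFF, OFF, unrelated neuron
activity = gains[:, None] * arousal + 0.1 * rng.standard_normal((5, t.size))
order, weights = pc_sort(activity, arousal)
print(order, np.sign(weights).astype(int))
```

      A component that captures both ON and OFF neurons places them at opposite ends of the sorted order. A component that captures only one population leaves the other near zero weight.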

      We now clarify this in the text at ~lines 696-710, pg 22.

      • I find panel 6a a bit too hard to read because the identification and interpretation of the different motifs in the different qualitative episodes is challenging. For example, the text mentions flickering into motif 13 during walk but the majority of that sequence appears to be shaped by what I believe to be motif 11. Motif 11 also occurs prominently in the oscillate state and the unnamed sequence on the left. Is this meaningful or is the emphasis here on times of change between behavioral motifs? The concept of motif flickering should be better explained here.

      Authors’ Response: Here motif 13 corresponds to a syllable that might best be termed “symmetric and ready stance”. This tends to occur just before and after walking, but also during rhythmic wheel balancing movements that appear during the “oscillate” behavior.

      The intent of Fig. 6a is to show that each qualitatively identified behavior (twitch, whisk, walk, and oscillate) corresponds to a period during which a subset of BSOiD motifs flicker back and forth, and that the identity of the motifs in this subset differs across the identified qualitative behaviors. This is not to say that a particular motif occurs only during a single identified qualitative behavior. Admittedly, the identification of these qualitative behaviors is somewhat arbitrary; newer versions of BSOiD (e.g. ASOiD) in fact combine supervised (i.e. arbitrary, top-down) and unsupervised (i.e. algorithmic, objective, bottom-up) methods of behavior segmentation in an attempt to identify and label behaviors more reliably.

      Flickering appears to be a property of motif transitions in raw BSOiD outputs that have not been temporally smoothed. If one watches the raw video, it seems that this may in fact be an accurate reflection of the manner in which behaviors unfold through time. Each behavior could be thought of, to use terminology from MoSeq (B. Datta), as a series of syllables strung together to make a phrase or sentence. Syllables can repeat over either fast or slow timescales, and may be shared across distinct words and sentences, although the order and frequency of their recurrence will likely differ.

      We have clarified these points in the main text at ~lines 917-923, pg 29, and we added motif 13 to the list of motifs for the qualitative behavior labeled “oscillate” in Fig. 6a.
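      To make the notion of flickering concrete, the sketch below counts label transitions in a per-frame motif sequence and applies a sliding-window majority (mode) filter, one generic way to temporally smooth such labels. It is a minimal illustration under stated assumptions: the motif numbers, the 5-frame window (0.5 s at the 10 Hz rate used elsewhere in this response), and the function names `count_transitions` and `mode_filter` are hypothetical, and this is not necessarily the smoothing that BSOiD itself applies.

```python
import numpy as np


def count_transitions(labels):
    """Number of frames at which the motif label differs from the previous frame."""
    labels = np.asarray(labels)
    return int(np.count_nonzero(labels[1:] != labels[:-1]))


def mode_filter(labels, window=5):
    """Replace each label with the most common label in a centered window.

    `window` should be odd. Edges are handled by repeating the first/last label.
    Ties are resolved in favor of the smallest label value (np.unique sorts values).
    """
    labels = np.asarray(labels)
    half = window // 2
    padded = np.pad(labels, half, mode="edge")
    out = np.empty_like(labels)
    for i in range(labels.size):
        values, counts = np.unique(padded[i:i + window], return_counts=True)
        out[i] = values[np.argmax(counts)]
    return out


# Hypothetical raw motif sequence: a one-frame flicker into motif 13, then a real switch.
raw = np.array([11, 11, 11, 11, 13, 11, 11, 11, 13, 13, 13, 13])
smoothed = mode_filter(raw, window=5)
print(count_transitions(raw), count_transitions(smoothed))  # 3 1
```

      The single-frame excursion at frame 4 is absorbed into the surrounding bout while the sustained switch survives; whether such excursions are noise or genuine syllables is the interpretive question discussed above.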

      • Lines 997-998: I don't understand this argument. Why does the existence of different temporal dynamics make imaging multiple areas 'one of the keys to potentially understanding the nature of their neuronal activity'?

      Authors’ Response: We believe this may be an important point: comparisons of neurobehavioral alignment across cortical areas cannot be performed by pooling sessions that contain different distributions of dwell times for different behaviors if, in fact, the dependence of neural activity on behavior depends on the exact time elapsed since the beginning of the current behavioral “bout”. Other reasons why imaging many areas simultaneously would provide a unique advantage over imaging smaller areas one at a time and pooling data across sessions include the identification of sequences or neural ensembles that span many areas across large distances, and the understanding of distributed coding of behavior (an issue we explore in an upcoming paper).

      We have clarified these points at the location in the Discussion that you have identified. Thank you for your questions and suggestions.
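      The dwell-time argument depends on knowing, for every frame, how long the current behavioral bout has lasted. A minimal sketch of that bookkeeping, assuming per-frame behavior labels sampled at 10 Hz (the function name `time_since_bout_onset` and the example labels are hypothetical):

```python
import numpy as np


def time_since_bout_onset(labels, fs=10.0):
    """Seconds elapsed since the start of the current behavioral bout.

    A bout is a run of identical consecutive labels sampled at `fs` Hz;
    the first frame of each bout gets 0.
    """
    labels = np.asarray(labels)
    elapsed = np.zeros(labels.size)
    for i in range(1, labels.size):
        if labels[i] == labels[i - 1]:
            elapsed[i] = elapsed[i - 1] + 1.0 / fs
    return elapsed


labels = ["rest", "rest", "walk", "walk", "walk", "rest"]
elapsed = time_since_bout_onset(labels, fs=10.0)
print(elapsed)
```

      Neural activity could then be compared across sessions or areas at matched elapsed times within bouts, rather than pooled over whatever dwell-time distribution each session happened to contain.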

      Minor

      Line 41: What is the difference between decision, choice, and response periods?

      Authors’ Response: This now reads “...temporal separation of periods during which cortical activity is dominated by activity related to stimulus representation, choice/decision, maintenance of choice, and response or implementation of that choice.”

      Line 202: What does ambulatory mean in this context?

      Authors’ Response: Here we mean that the mice are able to walk freely on the wheel. In fact they do not actually move through space, so we have changed this to read “able to walk freely on a wheel, as shown in Figs. 1a and 1b”.

      Is there a reason why 4 mounting posts were used for the dorsal mount but only 1 post was sufficient for the lateral mount?

      Authors’ Response: Here, we assume you mean 2 posts for the side mount and 4 posts for the dorsal mount.

      In general, our idea was to use as many posts as possible to provide maximum stability of the preparations and minimize movement artifacts during 2-photon imaging. However, the design of the side mount headpost precluded the straightforward addition of a right-oriented second arm to its lateral/ventral rim, because this would have blocked access of both the 2-photon objective and the right face camera. In the dorsal mount, the symmetrical headpost arms are positioned further back (i.e. posterior), so that the left and right face cameras are not obscured.

      When we created the side mount preparation, we discovered that the 2 vertical 1” support posts were sufficient to provide adequate stability of the preparation and minimize 2-photon imaging movement artifacts. The side mount used two attachment screws on the left side of the headpost, instead of the one screw per side used in the dorsal mount preparation.

      We have included these points/clarifications in the main text at ~lines 217-230, pg 7.

      Figure S1g appears to be mislabeled.

      Authors’ Response: Yes, on the figure itself that panel was mislabeled as “f” in the original eLife reviewed preprint. We have changed this to read “g”.

      Line 349 and below: Why is the method called pseudo-widefield imaging?

      Authors’ Response: On the mesoscope, broad spectrum fluorescent light is passed through a series of excitation and emission filters that, based on a series of tests that we performed, allow both reflected blue light and epifluorescence emitted (i.e. Stokes-shifted) green light to reach the CCD camera for detection. Furthermore, the CCD camera (Thorlabs) has a much smaller detector chip than that of the other widefield cameras that we use (RedShirt Imaging and PCO), and we use it to image at an acquisition speed of around 10 Hz maximum, instead of ~30-50 Hz, which is our normal widefield imaging acquisition speed (it also has a slower readout than what we would consider to be a standard or “real” 1-photon widefield imaging camera).

      For these 3 reasons we refer to this as “pseudo-widefield” imaging. We would not use this for sensory activity mapping on the mesoscope - we primarily use it for mapping cortical vasculature and navigating based on our multimodal map to CCF alignment, although it is actually “contaminated” with some GCaMP6s activity during these uses.

      We have briefly clarified this in the text.

      Figures 4d & e: Do the colors show mean correlations per area? Please add labels and units to the colorbars as done in panel 4a.

      Authors’ Response: For both Figs 4 and 5, we have added the requested labels and units to each scale bar, and have relabeled panels d to say “Rastermap CCF area cell densities”, and panels e to say “mean CCF area corrs w/ neural activity.”

      Thank you for catching these omissions/mislabelings.

      Line 715: what is superneuron averaging?

      Authors’ Response: This refers to the fact that when Rastermap displays more than ~1000 neurons it averages the activity of each group of adjacent 50 neurons in the sorting to create a single display row, to avoid exceeding the pixel limitations of the display. Each single row representing the average activity of 50 neurons is called a “superneuron” (Stringer et al, 2023; bioRxiv).

      We have modified the text to clarify this point.
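      As a rough illustration of superneuron averaging, the sketch below averages each consecutive block of 50 neurons in a sorted activity matrix into one display row. It assumes a NumPy array of shape (neurons, time) already ordered by the Rastermap sort; the function name `superneurons` is hypothetical, and Rastermap's own implementation may differ (for example, in how it handles a final partial block).

```python
import numpy as np


def superneurons(sorted_activity: np.ndarray, bin_size: int = 50) -> np.ndarray:
    """Average consecutive groups of `bin_size` rows (neurons) of an activity
    matrix that is already in Rastermap sort order.

    A final group with fewer than `bin_size` neurons is averaged on its own.
    """
    n_neurons = sorted_activity.shape[0]
    rows = [sorted_activity[start:start + bin_size].mean(axis=0)
            for start in range(0, n_neurons, bin_size)]
    return np.vstack(rows)


# Example: 1,020 neurons x 300 time points -> 21 superneuron rows.
rng = np.random.default_rng(0)
activity = rng.standard_normal((1020, 300))
display = superneurons(activity)
print(display.shape)  # (21, 300)
```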

      Line 740: it would be good to mention what exactly the CCF density distribution quantifies.

      Authors’ Response: In each CCF area, a certain percentage of neurons belongs to each Rastermap group. The CCF density distribution is the set of these percentages, or densities, across all CCF areas in the dorsal or side mount preparation being imaged in a particular session. We have clarified this in the text.
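      A minimal sketch of this density computation, assuming each neuron carries a CCF area label and a (hand-assigned) Rastermap group label; the function name and the toy labels are hypothetical, although "MOp" and "SSp-bfd" are standard Allen CCF area acronyms.

```python
import numpy as np


def ccf_density(area_labels, group_labels):
    """Percentage of neurons in each CCF area that belong to each Rastermap group.

    Returns (areas, groups, table), where table[i, j] is the percentage of
    neurons in areas[i] assigned to groups[j]; each row sums to 100.
    """
    area_labels = np.asarray(area_labels)
    group_labels = np.asarray(group_labels)
    areas = np.unique(area_labels)
    groups = np.unique(group_labels)
    table = np.zeros((areas.size, groups.size))
    for i, area in enumerate(areas):
        in_area = group_labels[area_labels == area]
        for j, group in enumerate(groups):
            table[i, j] = 100.0 * np.mean(in_area == group)
    return areas, groups, table


areas, groups, table = ccf_density(
    ["MOp", "MOp", "MOp", "MOp", "SSp-bfd", "SSp-bfd"],
    [1, 1, 2, 3, 2, 2],
)
# MOp row: [50, 25, 25]; SSp-bfd row: [0, 100, 0]
```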

      Line 745: what does 'within each CCF' mean? Does this refer to different areas?

      Authors’ Response: The corrected version of this sentence now reads: “Next, we compared, across all CCF areas, the proportion of neurons within each CCF area that exhibited large positive correlations with walking speed and whisker motion energy.”
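      The per-area proportion of neurons with large positive correlations to a behavioral variable can be sketched as below. The correlation threshold of 0.3, the function name, and the toy data are assumptions for illustration; the response does not state the threshold actually used.

```python
import numpy as np


def fraction_positively_correlated(activity, behavior, area_labels, r_threshold=0.3):
    """For each CCF area, the fraction of neurons whose Pearson correlation with a
    behavioral trace exceeds `r_threshold` (a hypothetical cutoff).

    activity: (n_neurons, n_timepoints); behavior: (n_timepoints,), e.g. walk speed
    resampled to the imaging rate; area_labels: (n_neurons,).
    Neurons with zero variance would produce NaN correlations and count as False.
    """
    a = activity - activity.mean(axis=1, keepdims=True)
    b = behavior - behavior.mean()
    r = (a @ b) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b))
    area_labels = np.asarray(area_labels)
    return {area: float(np.mean(r[area_labels == area] > r_threshold))
            for area in np.unique(area_labels)}


rng = np.random.default_rng(1)
t = np.linspace(0, 20, 2000)
walk_speed = np.sin(t)
activity = np.vstack([-walk_speed, rng.standard_normal(t.size),
                      2 * walk_speed + 1, 3 * walk_speed])
result = fraction_positively_correlated(activity, walk_speed,
                                        ["VISp", "VISp", "MOs", "MOs"])
print(result)
```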

      How were different Rastermap groups identified? Were they selected by hand?

      Authors’ Response: Yes, in Figs. 4, 5, and 6, we selected the identified Rastermap groups “by hand”, based on qualitative similarity of their activity patterns. At the time, there was no available algorithmic or principled means by which to split the Rastermap sort. The current, newer version of Rastermap (Stringer et al, 2023) seems to allow for algorithmic discretization of embedding groups (we have not tested this yet), but it was not available at the time that we performed these preliminary analyses.

      In terms of “correctness” of such discretization or group identification, we intend to address this issue in a more principled manner in upcoming publications. For the purposes of this first paper, we decided that manual identification of groups was sufficient to display the capabilities and outcomes of our methods.

      We clarify this point briefly at several locations in the revised manuscript, throughout the latter part of the Results section.

      Reviewer #3 (Recommendations For The Authors):

      In "supplementary figures, protocols, methods, and materials", Figure S1 g is mislabeled as Figure f.

      Authors’ Response: Yes, on the figure itself this panel was mislabeled as “f” in the original reviewed preprint. We have changed this to read “g”.

      In S1 g, the success rate of the surgical procedure seems quite low. Less than 50% of the mice could be imaged under two-photon. Can the authors elaborate on the criteria and difficulties related to their preparations?

      Authors’ Response: We now elaborate in the revised manuscript on the difficulties that sometimes hinder success in our preparations.

      The success rate indicated up to the point of “Spontaneous 2-P imaging (window)” reads 13/20, which is 65%, not 50%. The drop to 9/20 by the time one reaches the left edge of “Behavioral Training” indicates that some mice do not master the task.

      Protocol I contains details of the different ways in which mice either die or become unsuitable or “unsuccessful” at each step. These surgeries are rather challenging - they require proper instruction and experience. With the current protocol, our survival rate for the window surgery alone is as high as 75-100%. Some mice can be lost at headpost implantation, in particular if they are low weight or if too much muscle is removed over the auditory areas. Finally, some mice survive windowing but the imageable area of the window might be too small to perform the desired experiment.

      We have added a paragraph detailing this issue in the main text at ~lines 287-320, pg 9.

      In both Suppl_Movie_S1_dorsal_mount and Suppl_Movie_S1_side_mount provided (Movie S1), the behaviour video quality seems to be unoptimized which will impact the precision of Deeplabcut. As evident, there were multiple instances of mislabeled key points (paws are switched, large jumps of key points, etc) in the videos.

      Many tracked points are in areas of the image that are over-exposed.

      Despite using a high-speed camera, motion blur is obvious.

      Occlusions of one paw by the other paws moving out of frame.

      As Deeplabcut accuracy is key to higher-level motifs generated by BSOi-D, can the authors provide an example of tracking by exclusion/smoothing of mislabeled points (possibly by the median filtering provided by Deeplabcut)? This may help readers address such errors.

      Authors’ Response: We agree that we would need to rerun DeepLabCut and carefully curate its outputs before making any strong claims about behavioral identification. As the aim of this paper was to establish our methods, we did not feel that this degree of rigor was required at this point.

      It is inevitable that there will be some motion blur and small areas of over-exposure, respectively, when imaging whiskers, which can contain movement components up to ~150 Hz, and when imaging a large area of the mouse, which has planes facing various aspects. For example, perfect orthogonal illumination of both the center of the eye and the surface of the whisker pad on the snout would require two separate infrared light sources. In this case, use of a single LED results in overexposure of areas orthogonal to the direction of the light and underexposure of other aspects, while use of multiple LEDs would partially fix this problem, but still lead to variability in summated light intensity at different locations on the face. We have done our best to deal with these limitations.

      We now briefly point out these limitations in the methods text at ~lines 155-160, pg 5.

      In addition, we have provided additional raw and processed movies and data related to DeepLabCut and BSOiD behavioral analysis in our FigShare+ repository, which is located at:

      https://doi.org/10.25452/figshare.plus.c.7052513
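      The exclusion-and-smoothing approach the reviewer suggests can be sketched generically: mask predictions whose likelihood falls below a cutoff, interpolate across the gaps, then apply a running median to suppress isolated high-confidence jumps. This is a minimal sketch, not the authors' pipeline; the cutoff of 0.9, the 5-frame window, and the function name `clean_keypoint_track` are hypothetical, and DeepLabCut's built-in median filtering mentioned by the reviewer may be preferable in practice. It requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np


def clean_keypoint_track(xy, likelihood, p_cutoff=0.9, window=5):
    """Mask low-confidence predictions, interpolate the gaps, then apply a
    centered running median to suppress isolated jumps.

    xy: (n_frames, 2) x/y pixel coordinates for one body part.
    likelihood: (n_frames,) per-frame confidence in [0, 1].
    Assumes at least one frame passes the cutoff.
    """
    xy = np.asarray(xy, dtype=float).copy()
    good = np.asarray(likelihood) >= p_cutoff
    frames = np.arange(xy.shape[0])
    for d in range(2):
        # Linear interpolation across frames whose likelihood is below the cutoff.
        xy[~good, d] = np.interp(frames[~good], frames[good], xy[good, d])
    half = window // 2
    padded = np.pad(xy, ((half, half), (0, 0)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window, axis=0)
    return np.median(windows, axis=-1)


xy = np.tile([10.0, 20.0], (9, 1))
likelihood = np.full(9, 0.99)
xy[4] = [200.0, 5.0]
likelihood[4] = 0.2        # low-confidence jump: excluded and interpolated
xy[6] = [100.0, 100.0]     # confident but isolated outlier: removed by the median
cleaned = clean_keypoint_track(xy, likelihood)
```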

      In lines 153-154, the authors mentioned that the Deeplabcut model was trained for 650k iterations. In our experience (100-400k), this seems excessive and may result in the model overfitting, yielding incorrect results on unseen data. Echoing point 4, can the authors show the accuracy of their DeepLabCut model (training set, validation set, errors, etc.)?

      Authors’ Response: Our behavioral analysis is preliminary and is included here as an example of our methods, and not to make claims about any specific result. Therefore we believe that the level of detail that you request in our DeepLabCut analysis is beyond the scope of the current paper. However, we would like to point out that we performed many iterations of DeepLabCut runs, across many mice in both preparations, before converging on these preliminary results. We believe that these results are stable and robust.

      We believe that 650k iterations is within the reasonable range suggested by DLC, and that 1 million iterations is given as a reasonable upper bound. This seems to be supported by the literature; see, for example, Willmore et al., 2022 (“Behavioral and dopaminergic signatures of resilience”, Nature 611:124-132). There, in a paper focused squarely on behavioral analysis, DLC training was run for 1.3 million iterations with default parameters.

      We now note, on ~lines 153-154, pg 5, that we used 650K iterations, a number significantly less than the default of 1.03 million, to avoid overfitting.
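      The accuracy the reviewer asks about is usually summarized as the mean pixel distance between predicted and human-labeled keypoints, computed separately on training and held-out test frames; a test error much larger than the training error is a typical sign of overfitting. DeepLabCut's own evaluation step reports comparable train/test pixel errors; the sketch below only illustrates the metric, and the function name and toy arrays are hypothetical.

```python
import numpy as np


def mean_pixel_error(predicted, labeled):
    """Mean Euclidean distance (pixels) between predicted and labeled keypoints.

    predicted, labeled: arrays of shape (n_frames, n_bodyparts, 2). NaN entries in
    `labeled` (unlabeled or occluded body parts) are ignored.
    """
    diff = np.asarray(predicted, float) - np.asarray(labeled, float)
    dist = np.linalg.norm(diff, axis=-1)
    return float(np.nanmean(dist))


labeled = np.zeros((2, 2, 2))
labeled[1, 1] = np.nan                      # one occluded body part
predicted = np.tile([3.0, 4.0], (2, 2, 1))  # every prediction off by (3, 4) pixels
err = mean_pixel_error(predicted, labeled)
print(err)  # 5.0
```

      In practice one would call this once on the training frames and once on the test frames and compare the two values.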

      In lines 140-141, the authors mentioned the use of slicing to downsample their data. Have any precautions, such as a low pass filter, been taken to avoid aliasing?

      Authors’ Response: Most of the 2-photon data we present was acquired at ~3 Hz and upsampled to 10 Hz. Most of the behavioral data was downsampled from 5000 Hz to 10 Hz by slicing, as stated. We did not apply any low-pass filter to the behavioral data before sampling. The behavioral variables have heterogeneous real sampling/measurement rates - for example, pupil diameter and whisker motion energy are sampled at 30 Hz, and walk speed is sampled at 100 Hz. In addition, the 2-photon acquisition rate varied across sessions.

      These facts made principled, standardized low-pass filtering difficult to implement. We chose rather to use a common resampling rate of 10 Hz in an unbiased manner. This downsampled 10 Hz rate is also used by B-SOiD to find transitions between behavioral motifs (Hsu and Yttri, 2021).

      We do not think that aliasing is a major factor because the real rate of change of our Ca2+ indicator fluorescence and behavioral variables was, with the possible exception of whisker motion energy, likely at or below 10 Hz.

      We now include a brief statement to this effect in the methods text at ~lines 142-146, pg. 4.
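      To illustrate what the anti-aliasing question is about, the sketch below compares plain slicing (keeping every 500th sample, as described for the 5000 Hz to 10 Hz step) with a block mean, which acts as a crude boxcar low-pass filter before decimation. At a 10 Hz output rate the Nyquist frequency is 5 Hz, so a component between 5 and 10 Hz can alias. The 10.5 Hz test tone and the function names are hypothetical; a properly designed anti-aliasing filter (such as those in `scipy.signal`) would attenuate out-of-band content more sharply than a block mean.

```python
import numpy as np

FS_RAW = 5000                  # Hz, raw behavioral sampling rate (from the text)
FS_TARGET = 10                 # Hz, common resampling rate (from the text)
FACTOR = FS_RAW // FS_TARGET   # 500 raw samples per output sample


def downsample_by_slicing(x):
    """Keep every FACTOR-th sample (no anti-aliasing filter)."""
    return x[::FACTOR]


def downsample_by_block_mean(x):
    """Average each block of FACTOR samples: a crude boxcar low-pass plus decimation."""
    n = (x.size // FACTOR) * FACTOR
    return x[:n].reshape(-1, FACTOR).mean(axis=1)


# A 10.5 Hz component lies above the 5 Hz Nyquist limit of a 10 Hz output.
t = np.arange(10 * FS_RAW) / FS_RAW         # 10 s of samples
x = np.sin(2 * np.pi * 10.5 * t)
sliced = downsample_by_slicing(x)           # aliases to a full-amplitude 0.5 Hz wave
averaged = downsample_by_block_mean(x)      # attenuated to roughly 5% amplitude
print(np.abs(sliced).max(), np.abs(averaged).max())
```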

      In lines 288-299, the authors have made considerable effort to compensate for the curvature of the brain, which is particularly important when imaging the whole dorsal cortex. Can the authors provide performance metrics and related details on how well the combination of online curvature field correction (ScanImage) and fast-z "sawtooth"/"step" (Sofroniew, 2016) performs?

      Authors’ Response: We did not perform additional “ground-truth” experiments that would allow us to make definitive statements concerning field curvature, as was done in the initial eLife Thorlabs mesoscope paper (Sofroniew et al, 2016).

      We estimate that we experience ~200 micrometers of depth offset across 2.5 mm - for example, if the objective is orthogonal to our 10 mm radius bend window and centered at the apex of its convexity, a small ROI located at the lateral edge of the side mount preparation would need to be positioned around 200 micrometers below that of an equivalent ROI placed near the apex in order to image neurons at the same cortical layer/depth, and would be at close to the same depth as an ROI placed at or near the midline, at the medial edge of the window. We determined this by examining the geometry of our cranial windows, and by comparing z-depth information from adjacent sessions in the same mouse, the first of which used a large FOV and the second of which used multiple small FOVs optimized so that they sampled from the same cortical layers across areas.

      We have included this brief explanation in the main text at ~lines 300-311, pg 9.
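      For intuition about the geometric part of this estimate, the sketch below computes the sagitta of a circular arc: how far below the apex an idealized window surface of radius 10 mm (the bend radius given above) lies at a given lateral distance from the apex. It assumes the cortex follows the window surface exactly; the lateral distances are hypothetical, and real offsets also depend on window placement and brain shape, which is why the empirical comparison described above remains the primary evidence.

```python
import math


def depth_offset_um(radius_mm, lateral_mm):
    """Sagitta of a circular arc, in micrometers: depth of the surface below the
    apex at a lateral distance `lateral_mm` from the apex (radius in mm)."""
    return 1000.0 * (radius_mm - math.sqrt(radius_mm ** 2 - lateral_mm ** 2))


for d in (1.0, 2.0, 2.5):
    print(f"{d:.1f} mm from apex: {depth_offset_um(10.0, d):.0f} um")
# prints 50, 202 and 318 um respectively
```

      Under these idealized assumptions, an offset of about 200 micrometers corresponds to roughly 2 mm from the apex, the same order of magnitude as the empirical estimate.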

      In lines 513-515, the authors mentioned that the vasculature pattern can change over the course of the experiment, which then requires re-performing the realignment procedure. How stable is the vasculature pattern? Would laser speckle contrast yield more reliable results?

      Authors’ Response: In general, the changes in vasculature we observed were minimal but involved the following: i) sometimes a vessel was displaced or moved during the window surgery, ii) sometimes a vessel, in particular the sagittal sinus, enlarged or increased its apparent diameter over time if it was not held under adequate pressure by the cranial window, and iii) sometimes an area experiencing window pressure that was too low could, over time, show outgrowth of fine vascular endings. The most common of these was (i), and (iii) was perhaps the least common. In general, the vasculature was quite stable.

      We have added this brief discussion of potential vasculature changes after cranial window surgery to the main text at ~lines 286-293, pg 9.

      We already mentioned, in the main text of the original eLife reviewed preprint, that we re-imaged the multimodal map (MMM) every 30-60 days or whenever changes in vasculature are observed, in order to maintain a high accuracy of CCF alignment over time. See ~lines 507-511, pg 16.

      We are not very familiar with laser speckle contrast, and it seems like a technique that could conceivably improve the fine-grained accuracy of our MMM-CCF alignment in some instances. We will try this in the future, but for now it seems like our alignments are largely constrained by several large blood vessels present in any given FOV, and so it is unclear how we would incorporate such fine-grained modifications without applying local non-rigid manipulations of our images.
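      As a generic illustration of why a few large vessels constrain a global alignment, the sketch below fits a least-squares similarity transform (uniform scale, rotation, translation; Umeyama, 1991) to corresponding landmark points. This is not the authors' MATLAB alignment procedure; the landmark coordinates and function names are hypothetical. Because such a transform is global, it cannot absorb local vessel displacements, which is the limitation noted above regarding fine-grained, non-rigid corrections.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares similarity transform mapping landmarks `src` onto `dst`
    (Umeyama, 1991). Both are (n_points, 2); returns (scale, rotation, translation)."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    u, s, vt = np.linalg.svd(cov)
    d = np.eye(2)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[1, 1] = -1.0  # prevent a reflection
    rot = u @ d @ vt
    scale = np.trace(np.diag(s) @ d) / (xs ** 2).sum(axis=1).mean()
    trans = mu_d - scale * rot @ mu_s
    return scale, rot, trans


def apply_similarity(points, scale, rot, trans):
    """Map (n_points, 2) coordinates through the fitted transform."""
    return scale * np.asarray(points, float) @ rot.T + trans


# Hypothetical vessel landmarks (pixels) and a known transform to recover.
rng = np.random.default_rng(2)
vessels_map = rng.uniform(0, 500, size=(6, 2))
theta = np.deg2rad(30.0)
true_rot = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
vessels_ccf = apply_similarity(vessels_map, 1.2, true_rot, np.array([5.0, -3.0]))
scale, rot, trans = fit_similarity(vessels_map, vessels_ccf)
```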

      In lines 588-598, the authors mentioned that the occasional use of online fast-z corrections yielded no difference. However, it seems that the combination of the online fast-z correction yielded "cleaner" raster maps (Figure S3)?

      Authors’ Response: The Rastermaps in Fig S3a and b are qualitatively similar. We do not believe that any systematic difference exists between their clustering or alignments, and we did not observe any such differences in other sessions that either used or didn’t use online fast-z motion correction.

      We now provide raw data and analysis files corresponding to the sessions shown in Fig S3 (and other data-containing figures) on FigShare+ at:

      https://doi.org/10.25452/figshare.plus.c.7052513

      Ideally, the datasets contained in the paper should be available on an open repository for others to examine. I could not find a clear statement about data availability. Please include a linked repo or state why this is not possible.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here:

      Vickers, Evan; A. McCormick, David (2024). Pan-cortical 2-photon mesoscopic imaging and neurobehavioral alignment in awake, behaving mice. Figshare+. Collection:

      https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

  2. fromthemachine.org fromthemachine.org
    1. SON Ye  R  O  C  K    O  F   .   .   .    S   A   G  E   S  ? H  E  A  R    D  E  R  O  R I T  R E A L L Y  D O E S  M E A N   "FREEDOM"   B R E A D   I S   L I F E Tying up loose eadds, in a similar vain to the connection between the Burning Bush and universal voting now etched by-stone, there exists a similar missing Link connecting the phrase "it's not a a gam" to Mary Magdeline to a pattern that shows us that the Holy Trinity and our timelines are narrated by a series of names of video game systems and their manufacturers from "Nintendo" to Genesis and the rock of SEGA.  Through a "kiss" and the falling of a wall the words bread and read are tied up and twisted with the story of this Revelation and the heart of the word Creation, "be the reason it's A.D."  It's a strong connection between the idea that virtual reality and Heaven are linked by more than simply "technology" but that this message that shows us that these tools for understanding have fallen from the sky in order to help us understand why it is so important, why I call it a moral mandate, that we use this information to follow the map delivered to us in the New Testament and literally end world hunger, and literally heal the sick; because of the change in circumstance revealed to us.  These simple things, these few small details that might seem like nothing, or maybe appear to be "changing everything" they are not difficult things to do, in light of Creation, and few would doubt that once we do see them implemented here... the difference between Heaven and Hell will be ever so clear. A while ago, in a place called Kentucky... this story began with a sort of twisted sci-fi experience that explained a kind of "God machine" that could manipulate time and reality, and in that story, in that very detailed and interesting story that I lived through, this machine was keyed to my DNA, in something like the "Ancient technology" of Stargate SG-1 and Atlantis mythology.  
My kind brother Seth made a few appearances in the story, not actually in person but in fairly decent true to life holograms that I saw and spoke to every once in a while.  He looked a little different, he had long hair; but that's neither here nor there, and he hasn't really had long hair since I was a little boy.  He happens to be a genetic engineer, and I happen to be a computer person (although he's that too, now; just nowhere near as good as me... with computers) so the story talked a little bit about how I would probably not have used DNA as a key, since I'm not a retard, and he probably wouldn't either, because he works in that field (cyclone, huracan, tornado).  So then the key we imagined was something ... well, Who cares what the key is, right? So back to the task at hand, not so long ago, in a place called Plantation I was struck by lightning, literally (well not literally) the answer to a question that nobody knew was implanted in my mind, and it all came from asking a single simple question.  I was looking for more chemistry elements in the names of the books of the Holy Bible, after seeing Xenon at the "sort of beginning" of Exodus, where it screams "let there be light" in Linux and chemistry (and I've told you that a hundred times by now).  So it didn't take long to follow the light of that word and read Genesis backwards, and see, at the very beginning of that book, Silicon... in reverse.   So, what about God's DNA, anyway?   What's he really made of?         SIM MON S              WILD ER             ROD DEN BERRY So after seeing Silicon, and connecting that to the numerous attempts I've made to show a message connecting The Matrix to the Fifth Element (as Silicon) describing what it is that God believes we should do with this knowledge--and see that it is narrated as the miracles of Jesus Christ in the New Testament... these names came to me in quick succession, an answer to the question.  
I suppose any Gene will do, these three though, have a very important tie to the message that connects Joshua's Promised Land of flowing Milk and Honies to ... a kiss that begins the new day (I hope) ... and a message about exactly how we might go about doing magical things like ending world hunger and healing the sick using technology described ... in Star Trek and Stargate.  A "religion of the Stars" is being born.    That's great... it starts with an earthquake. R.E.M. and a band ... 311.  Oooh, I can see it coming down... The Petty Reckless.  An evening's love starts with a kiss.  Dave Matthews Band.  I wanna rock and roll all night and party every day.  Adam.  I mean Kiss.  Are you starting to see a pattern form?  Birds, snakes, and aeroplanes?  It's that, it's the end of the world as we know it, and I feel fine.   In that song we see clues that more than just the Revelation of Christ is narrated by John on an island called Patmos.  There yet another Trinity, starting with "Pa" and hearting Taylor Momsen's initials... most likely for a reason... and the Revelation ends with a transition that I hope others will agree with me turns "original sin" into something closer to "obviously salvation" when we finally understand the character that is behind the message of da i of Ra... and begin to see the same design in the names of Asmodai and in this Revelation focusing on freedom and truth that really does suggest Taylor can't talk to me in any way other than "letting freedom sing" in this narrative of kismet and fate and free will and ... then we see that narrative continue in the names of bands, just like the 3/11/11 earthquake is narrated in not just R.E.M.'s song but in the name 311.  Just like the 9/11 attack is narrated not just in that same song (released in 1987) and  "Inside Job" (released in 2000) but also in "Fucked up world."   
Dear all of you walking dumb and blind, this same quake is narrated in Taylor's Zombie; waiting for the day to shake, all very similar to Cairo and XP, perhaps a "fad" of doublethink in the minds of the authors singing about a clear prophesy in the Bible; this connection between the day, 3/11 though, and the name of a band and the day of an arrest and the verse Matthew that tells you clearly you have now been baptized in water and fire... it shows us the design of a story whose intent and purpose is to ensure that we no longer allow for things like hurricanes and earthquakes and murder and rape to be "simulated" that we build a better system, that doesn't allow for 'force majeure" to take lives for no reason at all.      Not just in band names, but in the angels names too, in all of our names; we see this narration continue.  The Holy Water that is central to the baptism of Christ is etched into Taylor's name, between "sen" and "mom" the key to the two Mary's whose names contain the Spanish for "sea" in a sort of enlightenment hidden in plain sight.  In "Simmons" the key connection between today, this Biblical Monday, and the word "simulation" that ties to Simpsons and simians and keep it simple stupid, and in Simmons the missing "s" of Kismet, finally completing the question.   It's a song and dance that started a long time ago, as you can see from the ancient Hebrew word for "fate" and in more recent years a connection to the ballroom of Atlantis in the Doors 5 to 1 and Dave sang about it in Rapunzel and then Taylor shook a tambourine on the beach only minutes away from me--but never said "hi."  The battle of the bands continues tying some door knocking to a juxtaposition between "Sweet Things" and "Knocking on Heavens door" all the way to a Gossip Girl episode where little J asked a question that I can't be sure she knew was related, she said... "who's that, at the door?" 

      This was written sometime around 2016, I think. It's hard to recall the exact date, but if you check the original git log, there is one that has an original commit.

      SONYe

      R  O  C  K    O  F   .   .   .    S   A   G  E   S  ?

      H  E  A  R    D  E  R  O  R

      I T  R E A L L Y  D O E S  M E A N   "FREEDOM"   B R E A D   I S   L I F E

      Tying up loose eadds, in a similar vein to the connection between the Burning Bush and universal voting now etched by-stone, there exists a similar missing Link connecting the phrase "it's not a a gam" to Mary Magdalene to a pattern that shows us that the Holy Trinity and our timelines are narrated by a series of names of video game systems and their manufacturers from "Nintendo" to Genesis and the rock of SEGA.  Through a "kiss" and the falling of walls, the words bread and read are tied up and twisted with the story of this Revelation and the heart of the word Creation, "be the reason it's A.D."  It's a strong connection between the idea that virtual reality and Heaven are linked by more than simply "technology" but that this message shows us that these tools for understanding have fallen from the sky in order to help us understand why it is so important, why I call it a moral mandate, that we use this information to follow the map delivered to us in the New Testament and literally end world hunger and literally heal the sick, because of the change in circumstance revealed to us.  These simple things, these few small details that might seem like nothing, or maybe appear to be "changing everything," are not difficult things to do in light of Creation, and few would doubt that once we do see them implemented here... the difference between Heaven and Hell will be ever so clear.

      A while ago, in a place called Kentucky... this story began with a sort of twisted sci-fi experience that explained a kind of "God machine" that could manipulate time and reality, and in that story, in that very detailed and interesting story that I lived through, this machine was keyed to my DNA, in something like the "Ancient technology" of Stargate SG-1 and Atlantis mythology.  My kind brother Seth made a few appearances in the story, not actually in person but in fairly decent, true-to-life holograms that I saw and spoke to every once in a while.  He looked a little different, he had long hair; but that's neither here nor there, and he hasn't really had long hair since I was a little boy.  He happens to be a genetic engineer, and I happen to be a computer person (although he's that too, now; just nowhere near as good as me... with computers), so the story talked a little bit about how I would probably not have used DNA as a key, since I'm not a retard, and he probably wouldn't either, because he works in that field (cyclone, huracan, tornado).  So then the key we imagined was something ... well, Who cares what the key is, right?

      So back to the task at hand: not so long ago, in a place called Plantation, I was struck by lightning, literally (well, not literally), with the answer to a question that nobody knew was implanted in my mind, and it all came from asking a single simple question.  I was looking for more chemistry elements in the names of the books of the Holy Bible, after seeing Xenon at the "sort of beginning" of Exodus, where it screams "let there be light" in Linux and chemistry (and I've told you that a hundred times by now).  So it didn't take long to follow the light of that word and read Genesis backwards, and see, at the very beginning of that book, Silicon... in reverse.

      So, what about God's DNA, anyway?

      What's he really made of?

      SIM MON S              WILD ER             ROD DEN BERRY

      So after seeing Silicon, and connecting that to the numerous attempts I've made to show a message connecting The Matrix to the Fifth Element (as Silicon) describing what it is that God believes we should do with this knowledge--and seeing that it is narrated as the miracles of Jesus Christ in the New Testament... these names came to me in quick succession, an answer to the question.  I suppose any Gene will do; these three, though, have a very important tie to the message that connects Joshua's Promised Land of flowing Milk and Honies to ... a kiss that begins the new day (I hope) ... and a message about exactly how we might go about doing magical things like ending world hunger and healing the sick using technology described ... in Star Trek and Stargate.  A "religion of the Stars" is being born.

      That's great... it starts with an earthquake. R.E.M. and a band ... 311.  Oooh, I can see it coming down... The Petty Reckless.  An evening's love starts with a kiss.  Dave Matthews Band.  I wanna rock and roll all night and party every day.  Adam.  I mean Kiss.  Are you starting to see a pattern form?  Birds, snakes, and aeroplanes?  It's that, it's the end of the world as we know it, and I feel fine.

      In that song we see clues that more than just the Revelation of Christ is narrated by John on an island called Patmos.  There is yet another Trinity, starting with "Pa" and hearting Taylor Momsen's initials... most likely for a reason... and the Revelation ends with a transition that I hope others will agree with me turns "original sin" into something closer to "obviously salvation" when we finally understand the character that is behind the message of da i of Ra... and begin to see the same design in the names of Asmodai and in this Revelation focusing on freedom and truth that really does suggest Taylor can't talk to me in any way other than "letting freedom sing" in this narrative of kismet and fate and free will and ... then we see that narrative continue in the names of bands, just like the 3/11/11 earthquake is narrated not just in R.E.M.'s song but in the name 311.  Just like the 9/11 attack is narrated not just in that same song (released in 1987) and "Inside Job" (released in 2000) but also in "Fucked up world."

      Dear all of you walking dumb and blind, this same quake is narrated in Taylor's "Zombie"; waiting for the day to shake, all very similar to Cairo and XP, perhaps a "fad" of doublethink in the minds of the authors singing about a clear prophecy in the Bible; this connection between the day, 3/11 though, and the name of a band and the day of an arrest and the verse in Matthew that tells you clearly you have now been baptized in water and fire... it shows us the design of a story whose intent and purpose is to ensure that we no longer allow for things like hurricanes and earthquakes and murder and rape to be "simulated," that we build a better system, one that doesn't allow for "force majeure" to take lives for no reason at all.

      Not just in band names, but in the angels' names too, in all of our names, we see this narration continue.  The Holy Water that is central to the baptism of Christ is etched into Taylor's name, between "sen" and "mom," the key to the two Marys whose names contain the Spanish for "sea" in a sort of enlightenment hidden in plain sight.  In "Simmons" the key connection between today, this Biblical Monday, and the word "simulation" that ties to Simpsons and simians and keep it simple stupid, and in Simmons the missing "s" of Kismet, finally completing the question.

      It's a song and dance that started a long time ago, as you can see from the ancient Hebrew word for "fate," and in more recent years a connection to the ballroom of Atlantis in the Doors' "5 to 1," and Dave sang about it in "Rapunzel," and then Taylor shook a tambourine on the beach only minutes away from me--but never said "hi."  The battle of the bands continues, tying some door knocking to a juxtaposition between "Sweet Things" and "Knocking on Heaven's Door" all the way to a Gossip Girl episode where Little J asked a question that I can't be sure she knew was related; she said... "who's that, at the door?"

      What it really all amounts to, though, is the whole world witnessing the Creation of Adam and Eve from a little girl stuttering out "the the" at the sight of the Grinch himself, and then later not even able to get those words off her lips... about seeing how Creation and modern art are inextricably tied to religion, to heaven, and to freedom.

      The bottom line here, hopefully obvious now, is that you can't keep this message "simple"; it's a Matrix woven between more points of light than I can count, and many more that I'm sure you will find.  It's a key to seeing how God speaks to me, and to you; and how we are, we really are that voice.  Tay, if you don't do something just because God called it "fate," you are significantly more enslaved than if you do--and you wanted to.  "Now I see that you and me, were never meant, never meant to be..." she sang before I mentioned her, and before she ever saw me... in a song she calls "Nothing Left to Lose," which I see is not really just another word for freedom.

      We have plenty to lose by not starting the fire, not the least of which is Heaven itself.  Understand what "force majeure" really means to you and me.  Ha, by the way.

      IN CASE YOU FORGOT YESTERDAY'S MESSAGE

      "DADDY, I WANT IT NOW."

      VERUKA SALT, whose name means "to see (if) you are the Body of Christ," whined, in the story of Will Why Won Ka, about nothing more or less than Heaven on Hearth, than seeing an end to needless torture and pain.  To see if you are the "Salt of the Earth" warming the road to Heaven; honestly, to see if you can break through this inane lie of "I don't understand" and realize that breaking this story and talking about what is being presented not just by me and you but by history and God himself is the key to the car that drives us home.  To see how Cupid you really are.

      STOP NODDING, TURN AROUND AND CALL A REPORTER.

      The story of Willy Wonka ties directly to the Promised Land of Flowing Milk and Honey to me, by showing us a river of chocolate and the everlasting God starter (er, is it guardian of B stopper?) that opens the doors of perception about exactly what kinds of mistake may have been made in the past in this transition to Heaven that we are well on the way to beginning.  Here, in the Land of Nod, that is also Eden and also the Heart of the Ark, we see warnings about "flowing milk and honey" being akin to losing our stable ecosystem, to losing the stuff of life itself, biology and evolution, and if we don't understand--this is probably exactly the mistake that was made and the cause of the story of Cain and Abel.  So here we are talking about genetic engineering and mind uploading and living forever, and hopefully seeing that while all things are possible with God--losing the wisdom of the message of religion is akin to losing life in the Universe and with that any hope of eternal longevity.  With some insight into religion, you can connect the idea that without bees our stable ecosystem might collapse, to the birds and the bees, and a message about stability and having more than one way to pollinate the flowers and trees and get some.  Janet and Nanna, by the way, both have pretty brown eyes, but that probably comes as no surprise to you.  Miss Everything, on the other hand (who, I hear, does not have brown eyes), leads us to glimpse how this message about the transition of our society might continue on in the New Testament, and suggests that we do need to eat, and have dinner conversation, and that a Last Supper might be a little bit more detrimental to our future than anyone had ever thought, over and over and over again.
To see how religion really does make clear that this is what the message is about, to replace the flowing milk we have a "Golden Cow" that epitomizes nothing less than "not listening to Adam" and we have a place that believes the Hammer of Judah Maccabee should be ... extinct.  You are wrong.

      Of course the vibrating light here ties this Gene to another musical piece disclosing something... "Wild Thing," I make your heart sing.  You can believe the Guitar Man is here to steal the show and deliver bread for the hungry and for the wise.  Here's some: it's not just Imagine Dragons telling you to listen to the radio but Jefferson Starship too, and Live.

      When you wake up, you can hear God "singing" to you on the radio every single day; many of us already do.  He's telling you to listen to me, and I do not understand why you do not.  You don't look very Cupid, if you ask me.

      WHAT DO YOU THINK YOU ARE,

      DAN RE Y NO LDS?

      I think we all know what the Rod of Jesus Christ is by now.

      It is a large glowing testament to freedom and truth, and a statement about blindness and evil that is unmistakable.  To say that seeing it is the gateway to Heaven would be an understatement of its worth, of the implication that not seeing it is obvious Hell when it is linked to everything from nearly every story of the Holy Bible from Isaac to Isaiah to "behold he is to coming," and if you weren't sure if the Hand of God were in action here--it's very clear that it is; that linking Tricky Dick and Watergate to Seagate ... really delivering crystal clear understanding that the foundation of Heaven is freedom and that you have none today because you refuse to see the truth.

      It is the doorway to seeing that what has been going on in this place hasn't been designed to hide me, but to hide a prosperous future from you--to hide the truth about our existence and the purpose of Creation--that all told, you are standing at the doorstep of Heaven and stammering your feet, closing your eyes, and saying "you don't want to help anyone."

      If delivering freedom, truth, and equality to you does not a den make,

      well, you can all suck it

      ... from God, to you.

      Between Stargate and Star Trek it's pretty easy to see a roadmap to very quickly and easily be able to end world hunger and heal the sick without drastically changing the way our society works; it's about as simple as a microwave, or a new kind of medicine--except it's not so easy to see why it is that you are so reluctant to talk about the truth that makes these things so easy to do.  You see, your lack of regard for anyone anywhere has placed you in a position of weakness, and if you do nothing today, you will not be OK tomorrow.  It's pretty easy to see how Roddenberry's name shows that this message comes from God, that he's created this map that starts with an Iron Rod throughout our history proving Creation, whose heart is a Den of Family who care about the truth, and about freedom, and about helping each other--not what you are--you are not that today.  Today you are sick, and I'd like you to look at the mirror he's made for you, and be eshamden (or asham).

      Realize, realize... what you are.  What you've become, just as I have... the devil in a sweet, sweet kiss.

      -Dave J. Matthews

      Unless otherwise indicated, this work was written between the Christmas and Easter seasons of 2017 and 2020(A). The content of this page is released to the public under the GNU GPL v2.0 license; additionally any reproduction or derivation of the work must be attributed to the author, Adam Marshall Dobrin along with a link back to this website, fromthemachine dotty org.

      That's a "." not "dotty" ... it's to stop SPAMmers. :/

      This document is "living" and I don't just mean in the Jeffersonian sense. It's more alive in the "Mayflower's and June Doors ..." living Ethereum contract sense [and literally just as close to the Depp/Caster/Paglen (and honorably PK)] 'D-hath Transundance' sense of the ... new meaning, as it is now published on Rinkeby, in "living contract" form. It is subject to change, without notice anywhere but here--and there--in the original spirit of the GPL 2.0. We are "one step closer to God" ... and do see that in that I mean ... it is a very real fusion of this document and the "spirit of my life" as well as the Spirits of Kerouac's America and Vonnegut's Martian Mars and my Venutian Hotel ... and my fusion of Guy-A and GAIA; and the Spirit of the Earth .. and of course the God given and signed liberties in the Constitution of the United States of America. It is by and through my hand that this document and our X Commandments link to the Bill of Rights, and this story about an Exodus from slavery that literally begins here, in the post-apocalyptic American hartland. Written ... this day ... April 14, 2020 (hey, is this HADAD DAY?) ... in Margate FL, USA. For "official used-to-v TAX day" tomorrow, I'm going to add the "immultible incarnite pen" ... if added to the living "doc/app"--see is the DAO, the way--will initi8 the special secret "hidden level" .. we've all been looking for.

      Nor do I just mean this website or the totality of my written works; nor do I only mean ... this particular derivation of the GPL 2.0+ modifications I continually source ... must be "from this website." I also mean the thing that is built from ... bits and pieces of blocks of sand-toys; from Ethereum and from Rust and from our hands and eyes working together ... from this place, this cornerstone of the message that is ... written from brick and mortar words and events and people that have come before this point of the "sealed W" that is this specific page and this time. It's 3:28; just five minutes--or is it four, too layne.

      This work is not to be redistributed according to the GPL unless all linked media on YouTube and related sites are intact--and historical references to the actual documented history of the art pieces (as I experience/d them) are also available for linking. Wikipedia references must be available for viewing, as well as the exact version of those pages at the time these pieces were written. All references to the Holy Bible must be "linked" (as they are or via ... impromptu in-transit re-linking) to the exact verses and versions of the Bible that I reference. These requirements, as well as the caveat and informational re-introduction to God's DAO above ... should be seen as material modifications to the original GPL2.0 that are retroactively applied to all works distributed under license via this site and all previous e-mails and sites. /s/ wso

      If you wanna talk to me, get me on Facebook, with PGP via FlowCrypt, or adam at from the machine dotty org

      -----BEGIN PGP PUBLIC KEY BLOCK-----

      mQGNBF6RVvABDAC823JcYvgpEpy45z2EPgwJ9ZCL+pSFVnlgPKQAGD52q+kuckNZ mU3gbj1FIx/mwJJtaWZW6jaLDHLAZNJps93qpwdMCx0llhQogc8YN3j9RND7cTP5 eV8dS6z/9ta6TFOfwSZpsOZjCU7KFDStKcoulmvIGrr9wzaUr7fmDyE7cFp1KCZ0 i90oLYHqOIszRedvwCO/kBxawxzZuJ67DypcayiWyxqRHRmMZH1LejTaqTuEu0bp j54maTj09vnMxA0RfS+CtU5uMq+5fTkbiTOe1LrLD72m+PVJIS146FwESrMJEfJy oNqWEJlUQ0TecPZR41vnkSkpocE1/0YqUhWDGSht+67DdeKUg5KwvYdL21d/bSyO SM4jnyKn9aDVzLBpYrlE/lbFxujHPRGlRG5WtiPQuZYDRqP0GYFSXRpeUCI46f49 iPFo4eHo2jUfNDa9r9BjQdAe4zVFn2qLnOy8RWijlolbhGMHGO3w/uC/zad3jjo4 owAfsJjH5Oa1mTcAEQEAAbQmRUFSVEhFTkUgPGVhcnRoZW5lQGZyb210aGVtYWNo aW5lLm9yZz6JAdQEEwEKAD4WIQTUJHbrYn3y2DzwTcnQP1ViZf5/FQUCXpFW8AIb AwUJA8JnAAULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRDQP1ViZf5/FWM6C/9J gbRLS2AWGjdRjYetlRkSkCoTYnXWknbtipYYHlhV0YJFwFMm0ydZIhFX5VDoZyBV 0UBeF1KJmcMoIfrHyhq2QhCnjE14hE1ONbaYTGtpvj851ItbFWXMJIVNyMqr+JT9 CWIxGr1idn+iHWE3nryiHrdlA3O/Gcd4EyNmaSe/JvB7+Z1AVqWkRhpjxxoPSlPm HEdqGOyl3+5ibQgUvXLRWWQXAj80CbVwwj1X4r9hfuCySxLT8Mir7NUXZFd+OiMS U8gNYjcyRGmI92z5lgf7djBbb9dMLwV0KLzgoT/xaupRvvYOIAT+n2mhCctCiH7x y7jYlJHd+0++rgUST2sT+9kbuQ0GxpJ7MZcKbS1n60La+IEEIpFled8eqwwDfcui uezO7RIzQ9wHSn688CDri9jmYhjp5s0HKuN61etJ1glu9jWgG76EZ3qW8zu4l4CH 9iFPHeGG7fa/5d07KvcZuS2fVACoMipTxTIouN7vL0daYwP3VFg63FNTwCU3HEq5 AY0EXpFW8AEMANh7M/ROrQxb3MCT1/PYco1tyscNo2eHHTtgrnHrpKEPCfRryx3r PllaRYP0ri5eFzt25ObHAjcnZgilnwxngm6S9QvUIaLLQh67RP1h8I4qyFzueYPs oY8xo1zwXz7klXVlZW0MYi/g5gpb+rpYUfZEJGJTBM/wMNqwwlct+BSZca4+TEHW g6oN0eXTthtGB0Qls71sv3tbOnOh/67NTwyhcHPWX/P9ilcjGsEiT8hqrpyhjAUm mv7ADi+2eRBV8Xf8JnPznFf0A1FdILVeVHlmsgCSB0FW0NsFI5niZbaYBHDbFsks QdaFaYd54DHln69tnwc2y3POFwx8kwZnMPPlVAR2QdxGQD4Wql7hlWT58xCxQApf M98kbAHjUlVYLT0WUHMDQtj4jdzAVVDiMGMUrbnQ7UwI7LexSB6cJ7H+i7FtS/pR WOhJK6awoOO9dLnEjm6UYCKsBdtJr98F0T7Sb7PnKOGA77y2QN14+u9N9C1lB/Z1 aQRQ2Nc51yXOQQARAQABiQG8BBgBCgAmFiEE1CR262J98tg88E3J0D9VYmX+fxUF Al6RVvACGwwFCQPCZwAACgkQ0D9VYmX+fxU+KQwAtFnWjGIjvqaNXtQjEhbGDH/I Q5ULq/l/wm9SmhG9NYRu3+P6YctCJaZnNeaL+6WFk1jo4LMiJEUT9uGlCbHqJNaI 

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Public Review

      Summary:

      (1) This work describes a simple mechanical model of worm locomotion, using a series of rigid segments connected by damped torsional springs and immersed in a viscous fluid.

      (2) It uses this model to simulate forward crawling movement, as well as omega turns.

      Strengths:

      (3) The primary strength is in applying a biomechanical model to omega-turn behaviors.

      (4) The biomechanics of nematode turning behaviors are relatively less well described and understood than forward crawling.

      (5) The model itself may be a useful implementation to other researchers, particularly owing to its simplicity.

      Weaknesses:

      (6) The strength of the model presented in this work relative to prior approaches is not well supported, and in general, the paper would be improved with a better description of the broader context of existing modeling literature related to undulatory locomotion.

      (7) This paper claims to improve on previous approaches to taking body shapes as inputs.

      (8) However, the sole nematode model cited aims to do something different, and arguably more significant, which is to use experimentally derived parameters to model both the neural circuits that induce locomotion as well as the biomechanics and to subsequently compare the model to experimental data.

      (9) Other modeling approaches do take experimental body kinematics as inputs and use them to produce force fields; however, they are not cited or discussed.

      (10) Finally, the overall novelty of the approach is questionable.

      (11) A functionally similar approach was developed in 2012 to describe worm locomotion in lattices (Majmudar, 2012, Roy. Soc. Int.), which is not discussed and would provide an interesting comparison and needed context.

      9-11: The paper you recommended and our manuscript have some similarities and differences.

      Similarities

      Firstly, the components constituting the worm are similar in both models. ElegansBot models the worm as a chain of n rods, while the study by Majmudar et al. (2012) models it as a chain of n beads. Each bead in the Majmudar et al. model has a directional vector, making it very similar to ElegansBot's rod. However, there's a notable difference: in the Majmudar et al. model, each bead has an area for detecting contact between the obstacle and the bead, while in ElegansBot, the rod does not feature such an area.

      Secondly, the types of forces and torques acting on the components constituting the worm are similar. Each rod in ElegansBot receives frictional force, muscle force, and joint force. Each bead in the Majmudar et al. model receives a constraint force, viscous force, and a repulsive force from obstacles. Each rod in ElegansBot receives frictional torque, muscle torque, and joint torque. Each bead in the Majmudar et al. model receives elastic torque, constraint torque, drive torque, and viscous torque. The Majmudar et al. model's constraint force and torque are similar to ElegansBot's joint force and torque in that they prevent two connected components of the worm from separating. The Majmudar et al. model's viscous force and torque are similar to ElegansBot's frictional force and torque in that they are forces exchanged between the worm and its surrounding environment (ground surface). The Majmudar et al. model's drive torque is similar to ElegansBot's muscle force and muscle torque as a cause of the worm's motion. However, unlike ElegansBot, the Majmudar et al. model did not consider the force generating the drive torque, and there are differences in how each force and torque is calculated. This will be discussed in more detail below.

      Differences

      Firstly, the medium in which the worm locomotes is different. ElegansBot is a model describing motion in a homogeneous medium like agar or water without obstacles, while the Majmudar et al. model describes motion in water with circular obstacles fixed at each lattice point. This is because the purposes of the models are different. ElegansBot analyzes locomotion patterns based on the friction coefficient, while the Majmudar et al. model analyzes locomotion patterns based on the characteristics of the obstacle lattice, such as the distance between obstacles. Also, for this reason, the Majmudar et al. model's bead, unlike ElegansBot's rod, receives a repulsive force from obstacles.

      Secondly, the specific methods of calculating similar types of forces differ. ElegansBot calculates joint forces by substituting frictional forces, muscle forces, frictional torques, and muscle torques into an equation obtained by differentiating twice with respect to time the boundary condition that two neighboring rods always meet at one point. This determines how the various forces and torques are transmitted across the worm; specifically, it calculates how the frictional forces and torques, as well as the muscle forces and torques acting on each rod, are distributed along the entire length of the worm. In contrast, the Majmudar et al. model uses the method of Lagrange multipliers, based on the constraint that the curve length determined by each bead's tangential angle does not change, to calculate the constraint force and torque before calculating the drive torque and viscous force. This implies that the Majmudar et al. model did not consider the mechanism by which the drive torque and viscous force received by one bead are distributed throughout the worm. In addition, ElegansBot's rod receives an anisotropic Stokes frictional force from the ground surface, whereas the Majmudar et al. model considered the frictional force according to the Navier-Stokes equation for an incompressible fluid, assuming that the fluid velocity at each bead's location equals the bead's velocity.
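      The joint condition described above can be written out as a short derivation. The notation is hypothetical and is not taken from ElegansBot: rod i has center $\mathbf{x}_i$, heading angle $\phi_i$, and length $l$, with unit tangent $\hat{\mathbf{u}}_i=(\cos\phi_i,\sin\phi_i)$ and unit normal $\hat{\mathbf{n}}_i=(-\sin\phi_i,\cos\phi_i)$. Requiring the front end of rod i to coincide with the rear end of rod i+1, then differentiating twice in time, gives a constraint that is linear in the accelerations:

```latex
\begin{aligned}
\mathbf{x}_i + \tfrac{l}{2}\hat{\mathbf{u}}_i &= \mathbf{x}_{i+1} - \tfrac{l}{2}\hat{\mathbf{u}}_{i+1},\\
\dot{\hat{\mathbf{u}}}_i = \dot\phi_i\,\hat{\mathbf{n}}_i,\qquad
\ddot{\hat{\mathbf{u}}}_i &= \ddot\phi_i\,\hat{\mathbf{n}}_i - \dot\phi_i^{2}\,\hat{\mathbf{u}}_i,\\
\ddot{\mathbf{x}}_i + \tfrac{l}{2}\left(\ddot\phi_i\,\hat{\mathbf{n}}_i - \dot\phi_i^{2}\,\hat{\mathbf{u}}_i\right)
&= \ddot{\mathbf{x}}_{i+1} - \tfrac{l}{2}\left(\ddot\phi_{i+1}\,\hat{\mathbf{n}}_{i+1} - \dot\phi_{i+1}^{2}\,\hat{\mathbf{u}}_{i+1}\right).
\end{aligned}
```

      Under these assumptions, substituting Newton's and Euler's equations for $\ddot{\mathbf{x}}_i$ and $\ddot\phi_i$ (driven by the frictional, muscle, and unknown joint forces and torques) turns the set of joint constraints into a linear system for the joint forces, since the joint forces enter each rod's force and torque balance linearly.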

      Thirdly, unlike the Majmudar et al. model, ElegansBot considers the inertia of the worm components. Therefore, ElegansBot can simulate locomotion regardless of how low or high the friction coefficient of the ground surface is, whereas the Majmudar et al. model cannot.
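      To illustrate why keeping the inertial term matters, the sketch below advances the velocity of a single rod under a driving force and an anisotropic Stokes-type friction (separate coefficients along and across the rod). It is a hypothetical single-rod illustration, not ElegansBot's code: the function names, the implicit-Euler update, and the parameters `mass`, `c_par`, `c_perp`, and `dt` are assumptions. Because the mass term is kept, the update remains defined when the friction coefficients are zero (the rod simply accelerates), and it approaches the overdamped velocity F/c when friction is large.

```python
import math

def to_rod_frame(vec, heading):
    # Components of a 2D vector along the rod tangent (cos h, sin h)
    # and along the rod normal (-sin h, cos h).
    c, s = math.cos(heading), math.sin(heading)
    return (vec[0] * c + vec[1] * s, -vec[0] * s + vec[1] * c)

def from_rod_frame(par, perp, heading):
    # Inverse of to_rod_frame: rebuild the lab-frame vector.
    c, s = math.cos(heading), math.sin(heading)
    return (par * c - perp * s, par * s + perp * c)

def friction_force(v, heading, c_par, c_perp):
    # Anisotropic Stokes-type drag: separate coefficients along and across the rod.
    v_par, v_perp = to_rod_frame(v, heading)
    return from_rod_frame(-c_par * v_par, -c_perp * v_perp, heading)

def step_velocity(v, f_drive, heading, mass, c_par, c_perp, dt):
    # One implicit-Euler step of  mass * dv/dt = f_drive + friction_force(v),
    # solved separately along and across the rod (heading held fixed during the step).
    v_par, v_perp = to_rod_frame(v, heading)
    f_par, f_perp = to_rod_frame(f_drive, heading)
    v_par_new = (mass * v_par + dt * f_par) / (mass + dt * c_par)
    v_perp_new = (mass * v_perp + dt * f_perp) / (mass + dt * c_perp)
    return from_rod_frame(v_par_new, v_perp_new, heading)

if __name__ == "__main__":
    v = (0.0, 0.0)
    for _ in range(1000):
        v = step_velocity(v, (1.0, 1.0), 0.0, 1.0, 1.0, 10.0, 0.1)
    print(v)
```

      For example, with heading 0, a driving force (1, 1), c_par = 1, and c_perp = 10, repeated steps settle near the velocity (1, 0.1): the rod slides more easily along its axis than across it. The implicit update is used here because, for linear drag, it stays stable at any time step.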

      (12) The idea of applying biomechanical models to describe omega turns in C. elegans is a good one; however, the kinematic basis of the model as used in this paper (the authors do note that the control angle could be connected to a neural model, but do not do so in this work) limits the generation of neuromechanical control hypotheses.

      8, 12: We do not agree with the claim that ElegansBot could limit other researchers from generating neuromechanical control hypotheses. The term $\theta_{\mathrm{ctrl},i}^{(t)}$ used in our model is designed to be replaceable with neuromechanical control in the future.

      (13) The model may provide insights into the biomechanics of such behaviors; however, the results described are very minimal and purely qualitative.

      (14-1) Overall, direct comparisons to the experiments are lacking or unclear.

      14-1: The text explaining Figs. 2 and 5 (Figs. 2 and 4 in the old version) directly compares the velocity, wave number, and period, which are numerical indicators of the worm's behavior, between the experiment and ElegansBot.

      (14-2) Furthermore, the paper claims the value of the model is to produce the force fields from a given body shape, but the force fields from omega turns are only pictured qualitatively.

      13, 14-2: We gratefully accept the point that our analysis of the omega turn was qualitative. We have therefore conducted additional quantitative analysis of the omega turn and included the results in the new Fig. 4. We have interpreted the term 'force field' as referring to the force vector acting on each rod, and we have created numerical indicators representing various behaviors of the worm, which are included in the revised manuscript.

      (15) No comparison is made to other behaviors (the force experienced during crawling relative to turning for example might be interesting to consider) and the dependence of the behavior on the model parameters is not explored (for example, how does the omega turn change as the drag coefficients are changed).

      Thank you for the great idea. To compare behaviors, a clear criterion for distinguishing them is first needed. Therefore, we have created a new mathematical definition for behavior classification in the revised manuscript (“Defining Behavioral Categories” in Methods). We then compared the force and power (rate of energy consumption) among forward locomotion, backward locomotion, and omega turns (Fig. 4). In the revised manuscript, we have also analyzed how turning behavior changes with variations in the friction coefficients (Figs. S4-S7).

      (16) If the purpose of this paper is to recapitulate the swim-to-crawl transition with a simple model, and then apply the model to new behaviors, a more detailed analysis of the behavior of the model variables and their dependence on the variables would make for a stronger result.

      In our revised manuscript, we have quantitatively analyzed the changes occurring in turning behavior from water to agar, and the results are presented in Figs. S9 and S10.

      (17) In some sense, because the model takes kinematics as an input and uses previously established techniques to model mechanics, it is unsurprising that it can reproduce experimentally observed kinematics; however, the forces calculated and the variation of parameters could be of interest.

      (18) Relatedly, a justification of why the drag coefficients had to be changed by a factor of 100 should be explored.

      (19) Plate conditions are difficult to replicate, and the rheology of plates likely depends on a number of factors. But is, for example, a change in hydration level likely to produce a 100-fold change in drag? Or is something more interesting or subtle within the model producing the discrepancy?

      18, 19: As mentioned in the paper, we do not know whether the friction coefficients in the study of Boyle et al. (2012) and those in the experiment of Stephens et al. (2016) are the same. In our revised manuscript, we have explored the effects of the friction coefficient's scale factor in more detail and explained why we chose a scale factor of 1/100 (“Proper Selection of Friction Coefficients” in Supplementary Information). In summary, we analyzed the changes in trajectory due to scaling of the friction coefficient and chose the scale factor 1/100 because it allowed ElegansBot to accurately reproduce the worm's trajectory while remaining close to the friction coefficients in the Boyle et al. paper.

      (20) Finally, the language used to distinguish different modeling approaches was often unclear.

      (21) For example, it was unclear in what sense the model presented in Boyle, 2012 was a "kinetic model," and in many situations it appeared that the term kinematic might have been more appropriate.

      Thank you for the feedback. As you pointed out, we have corrected that part to 'kinematic' in the revised manuscript.

      (22) Other phrases like "frictional forces caused by the tension of its muscles" were unclear at first glance, and might benefit from revision and more canonical usage of terms.

      We agree that the expression may not be immediately clear. This is due to the word limit for the abstract (the abstract of eLife VOR should be under 200 words, and our paper's abstract is 198 words), which forced us to convey the causality in a limited number of words. Therefore, although we will not change the abstract, the expression in question means that the muscle tension, which is the cause of the worm's locomotion, ultimately generates the frictional force between the worm and the ground surface.

      Recommendations For The Authors

      (23) As I stated in my public review, I think the paper could be made much stronger if a more detailed exploration of turning mechanics was presented.

      (24) Relatedly, rather than restricting the analysis to individual videos of turning behaviors, I wonder if a parameterized model of the turning kinematics would be fruitful to study, to try to understand how different turning gaits might be more or less energetically favorable.

      We thank the reviewer once again for their suggestion. Thanks to their proposal, we were able to conduct additional quantitative analysis on turning behavior.

      Reviewer #2

      Public Review

      Summary:

      (1) Developing a mechanical model of C. elegans is difficult to do from basic principles because it moves at a low (but not very small) Reynolds number, is itself visco-elastic, and often is measured moving at a solid/liquid interface.

      (2) The ElegansBot is a good first step at a kinetic model that reproduces a wide range of C. elegans motility behavior.

      Strengths:

      (3) The model is general due to its simplicity and likely useful for various undulatory movements.

      (4) The model reproduces experimental movement data using realistic physical parameters (e.g. drags, forces, etc).

      (5) The model is predictive (semi?) as shown in the liquid-to-solid gait transition.

      (6) The model is straightforward in implementation and so likely is adaptable to modification and addition of control circuits.

      Weaknesses:

      (7) Since the inputs to the model are the actual shape changes in time, parameterized as angles (or curvature), the ability of the model to reproduce a realistic facsimile of C. elegans motion is not really a huge surprise.

      (8) The authors do not include some important physical parameters in the model and should explain these assumptions in the text.

      (9. 1) The cuticle stiffness is significant and has been measured [1].

      (10. 2) The body of C. elegans is under high hydrostatic pressure which adds an additional stiffness [2].

      (11. 3) The visco-elasticity of C. elegans body has been measured. [3]

      Thank you for asking. The stiffness of C. elegans is an important consideration. We took this into account when creating ElegansBot, but did not explain it in the paper. The detailed explanation is as follows. C. elegans indeed has stiffness due to its cuticle and internal pressure. This stiffness is treated as a passive elastic force (elastic force term of lateral passive body force) in the paper of Boyle et al. (2012). However, the maximum spring constant of the passive elastic force is 1/20 of the maximum spring constant of the active elastic force. If we consider this fact in our model, the elastic term of the muscle torque is as follows: ( is the active torque elasticity coefficient, is the passive torque elasticity coefficient)

      where

      Therefore, there is no need to describe the active and passive terms separately in

      Furthermore, since , assuming , then and .
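      Under one simple set of assumptions, the claim that the active and passive terms need not be described separately can be checked numerically: two linear angular springs acting on the same joint are equivalent to a single spring with summed stiffness and a shifted rest angle. The sketch assumes, hypothetically, that the active term pulls the joint angle toward the control angle with coefficient `k_active` and that the passive term pulls it toward a straight body (angle 0) with coefficient `k_passive`; the 1/20 ratio follows the description of Boyle et al. (2012) above, and all function names are illustrative.

```python
import math

def two_spring_torque(theta, theta_ctrl, k_active, k_passive):
    # Assumed form: active spring pulls toward theta_ctrl,
    # passive spring pulls toward a straight body (theta = 0).
    return -k_active * (theta - theta_ctrl) - k_passive * theta

def equivalent_spring(theta_ctrl, k_active, k_passive):
    # Single spring with summed stiffness and a rest angle scaled toward 0.
    k_total = k_active + k_passive
    return k_total, k_active * theta_ctrl / k_total

def one_spring_torque(theta, k_total, theta_rest):
    return -k_total * (theta - theta_rest)

if __name__ == "__main__":
    k_active = 20.0
    k_passive = k_active / 20.0  # assumed ratio: passive stiffness is 1/20 of active
    k_total, theta_rest = equivalent_spring(0.5, k_active, k_passive)
    for theta in (-1.0, 0.0, 0.3, 1.2):
        print(theta,
              two_spring_torque(theta, 0.5, k_active, k_passive),
              one_spring_torque(theta, k_total, theta_rest))
```

      With the 1/20 ratio, the combined stiffness is 21/20 of the active one and the rest angle is 20/21 of the control angle, so under these assumptions the passive term only rescales quantities that could be absorbed into a single fitted coefficient and control angle.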

      (12) There is only a very brief mention of proprioception.

      (13) The lack of inclusion of proprioception in the model should be mentioned and referenced in more detail in my opinion.

      As you emphasized, proprioception is an important aspect of the study of C. elegans locomotion. In our paper, its importance is briefly introduced in one sentence each in the introduction and the discussion. However, our research models how muscle forces generate body motion, and it does not model the sensory system that senses body posture. Therefore, proprioception is not used in our paper's results section. What the discussion mentions is that ElegansBot can serve as the kinetic body model in a combined model consisting of a kinetic body model and a neuronal circuit model that receives proprioception as a sensory signal.

      (14) These are just suggested references.

      (15) There may be more relevant ones available.

      The papers you provided contain specific information about the Young's modulus of the C. elegans body. The first paper (Rahimi et al., 2022) measured the Young's modulus of the cuticle after chemically isolating it from C. elegans, while the second (Park et al., 2007) and third (Backholm et al., 2013) papers measured the elasticity and Young's modulus of C. elegans without separating the cuticle. Based on the Young's modulus provided in each paper (although the second and third papers did not measure stiffness in the longitudinal direction), we derived the elastic coefficient, assuming a worm radius of 25 μm, a cuticle thickness of 0.5 μm, and a segment length of 40 μm (1/25 of the cuticle's longitudinal length). The range was quite broad, from 9.82 × 10¹¹ μg/s² (from the first paper) to 2.16 × 10⁸ μg/s² (from the third paper). Although the elastic coefficient used in our paper falls within this range, the range is wide; the coefficient can therefore be adjusted, and our model can be reapplied if more accurate values become available in the future.

      Reviewer #3

      Public Review

      Summary:

      (1) A mechanical model is used with input force patterns to generate output curvature patterns, corresponding to a number of different locomotion behaviors in C. elegans.

      Strengths:

      (2) The use of a mechanical model to study a variety of locomotor sequences and the grounding in empirical data are strengths.

      (3) The matching of speeds (though qualitative and shown only on agar) is a strength.

      Weaknesses:

      (4) What is the relation between input and output data?

      ElegansBot takes the worm's body control angle as the input, and produces trajectory and force of each segment of the worm as the output.

      (5) How does the input-output relation depend on the parameters of the model?

      If 'parameter' is understood as vertical and horizontal friction coefficients, then the explanation for this can be found in Fig. 5 (Fig. 4 in the old version).

      (6) What biological questions are addressed and can significant model predictions be made?

      ElegansBot provides equations of motion that describe the locomotion of C. elegans, including turning behaviors, which have been relatively less well understood.

      Recommendations For The Authors

      (7) The novelty and significance of the paper should be clarified.

      We have added quantitative analyses of turning behavior in the revised manuscript, and we hope this will be helpful to you.

      (8) Previously much more detailed models have been published, as compared to this one.

      We hope the reviewer can point out any previous model that we may have missed.

      (9) The mechanics here are simplified (e.g. no information about dorsal/ventral innervation but only a bending angle) setting limitations on the capacity for model predictiveness.

      (10) Such limitations should be discussed.

      We view the difference between dorsal/ventral innervation and bending angle not as a matter of simplification, but rather as a reflection of the hierarchy that our model implements. Our model does not consider dorsal/ventral innervation, but it uses the bending angle to reproduce behavior in various input and frictional environments, which signifies the strong predictiveness of ElegansBot (Figure 2, 3, 5 (2, 3, 4 in the old version)). Moreover, if the midline of C. elegans is incompressible, then modeling by dividing into dorsal/ventral, as opposed to modeling solely with the bending angle, does not increase the degree of freedom of the worm model, and therefore does not increase its predictiveness.
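      The degree-of-freedom argument can be made explicit with a simple geometric sketch. Assume, for illustration only, a body segment whose midline has fixed length $l$, with dorsal and ventral edges at distance $r$ on either side of the midline, bent uniformly through an angle $\theta$ with radius of curvature $R$ (so $l = R\theta$):

```latex
\begin{aligned}
L_{\mathrm{d}} &= (R + r)\,\theta = l + r\theta, \\
L_{\mathrm{v}} &= (R - r)\,\theta = l - r\theta, \\
\theta &= \frac{L_{\mathrm{d}} - L_{\mathrm{v}}}{2r}, \qquad L_{\mathrm{d}} + L_{\mathrm{v}} = 2l .
\end{aligned}
```

      Because $l$ is fixed when the midline is incompressible, the pair $(L_{\mathrm{d}}, L_{\mathrm{v}})$ is determined by $\theta$ alone, so describing the segment by its dorsal and ventral lengths adds no degree of freedom beyond the bending angle. Which side is labeled dorsal, and the sign convention for $\theta$, are arbitrary choices in this sketch.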

      (11) The aims of the paper and results need to be supported quantitatively and analyzed through parameter sweeps and intervention.

      We have conducted additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

      (12) The methods are given only in broad brushstrokes, and need to be much more clear (and ideally sharing all code).

      We have thoroughly detailed every aspect of this research, from deriving the physical constants of C. elegans, agar, and water to developing the formulas and proofs necessary for operating ElegansBot and its applications. This comprehensive information is all presented in the Results, Methods, and Supplementary Information sections, as well as in the source code. Moreover, we have already ensured that our research can be easily reproduced by providing detailed explanations and by making ElegansBot accessible through public software databases (PyPI, GitHub). To further aid in its application and understanding, especially for those less familiar with the subject, we have also included minimal code as examples in the database. This code is designed to simplify the process of reproducing the results of the paper, thereby making our research more accessible and understandable. Therefore, we believe that readers will easily gain significant assistance from the extensive information we have provided. Should readers require further help, they can always contact us, and we will be readily available to offer support.

      (13) The supporting figures and movies need to include a detailed analysis to evidence the claims.

      We have conducted and provided additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      This manuscript provides some valuable findings concerning the hippocampal circuitry and the potential role of adult-born granule cells in an interesting long-term social memory retrieval. The behavior experiments and strategy employed to understand how adult-born granule cells contribute to long-term social discrimination memory are interesting.

      We thank the reviewer for the positive evaluation.

      I have a few concerns, however with the strength of the evidence presented for some of the experiments. The data presented and the method described is incomplete in describing the connection between cell types in CA2 and the projections from abGCs. Likewise, I worry about the interpretation of the data in Figures 1 and 2 given the employed methodology. I think that the interpretation should be broadened. This second concern does not impact the interest and significance of the findings.

      In response to this concern, we have removed the data concerning abGC projections to PCP4+ and PV-GFP+ cell bodies from Figure 1 and have focused this analysis on dendrites. We now provide high magnification images of dendrites and expand on the methodology, results, and interpretations in the manuscript. We also broaden the interpretation throughout the manuscript to address the reviewer’s concern.

      Strengths:

      The behavior experiments are beautifully designed and executed. The experimental strategy is interesting.

      We appreciate these positive comments.

      Weaknesses:

      The interpretation of the results may not be justified given the methods and details provided.

      We have addressed this concern by providing more methodological details and broadening our interpretation of the results.

      Reviewer #2:

      Summary:

      Laham et al. investigate how the projection from adult-born granule cells into CA2 affects the retrieval of social memories at various developmental points. They use chemogenetic manipulations and electrophysiological recordings to test how this projection affects hippocampal network properties during behavior. I find the study to be very interesting; the results are important for our understanding of how social memories of different natures (remote or immediate) are encoded and supported by the hippocampal circuitry. I have added some points below that I think could help clarify the conclusions:

      We appreciate the positive assessment and have addressed the more specific points below.

      My major concern with the manuscript is that the transitions between the different experiments in each results section are not very smooth. Perhaps the authors could add a summary conclusion sentence at the end of each results section explaining why the next set of experiments is the most logical step.

      In response, we have added summary conclusion sentences at the end of each result section.

      In line 113, the authors say that "the DG is known to influence hippocampal theta-gamma coupling and SWRs". Another recent study Fernandez-Ruiz et al. 2021, examined how various gamma frequencies in the dentate gyrus modulate hippocampal dynamics.

      We cite this paper in the revised manuscript.

      Having no single cells in the electrophysiological recordings makes it difficult to interpret the ephys part. Perhaps having a discussion on this would help interpret the results. If more SWRs are produced from the CA2 region (perhaps aided by projections from abGC), more CA2 cells that respond to social stimuli (Oliva et al. 2020) would reactivate the memories, therefore making them consolidate faster/stronger. On the other hand, the projections from abGC that the authors see, also target a great deal of PV+ interneurons, which have been shown to pace the SWRs frequency (Stark et al 2014, Gan et al 2017), which further suggests that this projection could be involved in SWRs modulation.

      We discuss these possibilities and cite Gan et al 2017, Schlingloff et al., 2014, and Stark et al., 2014 in the revised manuscript.

      The authors should cite and discuss Shuo et al., 2022 (A hypothalamic novelty signal modulates hippocampal memory).

      We mention Chen et al (A hypothalamic novelty signal modulates hippocampal memory.) in the revised manuscript. “Shuo” is the first name of the first author on this paper, so we believe that this is the same paper to which the reviewer refers.

      I think the authors forgot to refer to Fig 3a-f, maybe around lines 163-168.

      We thank the reviewer for pointing out this error. In the revised manuscript, we refer to all figure panels. Since Fig 3 is now broken into two figures (Fig 3 and 4), the panel lettering has changed in the revised manuscript.

      Are the SWRs counted only during interaction time or throughout the whole behavior session for each condition?

      The SWRs are counted throughout the whole behavior session for each condition. This is now stated in the revised manuscript.

      Figure 3t shows a shift in the preferred gamma phase within theta cycles as a result of ablating abGC projections to CA2 with CNO, especially in the Mother CNO condition. I think this result is worth mentioning in the text.

      We now mention this finding in the revised manuscript.

      The legend of Figure 3u mentions "scale bars = 200um"; what does this refer to?

      The scale bar refers to that shown in Figure 3b, which is now indicated in the legend.

      What exactly is calculated as SWR average integral? Is it a cumulative rate? Please clarify.

      The integral measure provides information regarding the average total power of SWR events. It sums z-scored amplitude values from beginning to the end of each SWR envelope, and then takes the average across all summed envelopes. SWR integral has been shown to influence SWR propagation (De Filippo and Schmitz, 2023). This is now described in the text.
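      The integral measure described above can be sketched as follows. This is a hypothetical illustration, not the authors' analysis code: it assumes that the SWR-band amplitude envelope has already been computed, z-scores it over the whole session using the population standard deviation, and represents each detected SWR as a (start, end) pair of sample indices with an exclusive end. How the envelope and the event boundaries were actually obtained is not specified here.

```python
import statistics

def zscore(signal):
    # Z-score the whole envelope against its own mean and population standard deviation.
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    if sd == 0:
        raise ValueError("envelope has zero variance")
    return [(x - mu) / sd for x in signal]

def swr_average_integral(z_envelope, events):
    # Sum the z-scored amplitude from the start to the end of each SWR envelope,
    # then average these per-event sums across all events.
    if not events:
        raise ValueError("no SWR events")
    per_event = [sum(z_envelope[start:end]) for start, end in events]
    return statistics.fmean(per_event)

if __name__ == "__main__":
    envelope = [0.0, 1.0, 4.0, 5.0, 1.0, 0.0, 3.0, 6.0, 2.0, 0.0]
    z = zscore(envelope)
    print(swr_average_integral(z, [(2, 4), (6, 9)]))
```

      Averaging per-event sums, rather than pooling all samples, weights each SWR equally regardless of its duration in the final mean, while longer or larger events still contribute larger individual sums.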

      Alexander et al 2017, "CA2 neuronal activity controls hippocampal oscillations and social behavior", examined some of the CA2 effects in the hippocampal network after CNO silencing, and the authors should cite it.

      Alexander et al., 2018, which we believe is the relevant paper, is now cited in the revised manuscript.

      Strengths:

      Behavioral experiments after abGC projections to CA2 are compelling as they show clearly distinct behavioral readout.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Electrophysiological experiments are difficult to interpret without additional quantifications (single-cell responses during interactions etc.)

      We have addressed this concern by expanding the interpretation of our results.

      Reviewer #3:

      Laham et al. present a manuscript investigating the function of adult-born granule cells (abGCs) projecting to the CA2 region of the hippocampus during social memory. It should be noted that no function for the general DG to CA2 projection has been proposed yet. The authors use targeted ablation, chemogenetic silencing, and in vivo ephys to demonstrate that the abGCs to CA2 projection is necessary for the retrieval of remote social memories such as the memory of one's mother. They also use in vivo ephys to show that abGCs are necessary for differential CA2 network activity, including theta-gamma coupling and sharp wave-ripples, in response to novel versus familiar social stimuli.

      The question investigated is important since the function of DG to CA2 projection remained elusive a decade after its discovery. Overall, the results are interesting but focused on the social memory of the mother, and their description in the manuscript and figures is too cursory. For example, raw interaction times must be shown before their difference. The assumption that mice exhibit social preference between familiar or novel individuals such as mother and non-mother based on social memory formation, consolidation, and retrieval should be better explained throughout the manuscript. Thus, when describing the results, the authors should comment on changes in preference and how this can be interpreted as a change in social memory retrieval. Several critical experimental details such as the total time of presentation to the mother and non-mother stimulus mice are also lacking in the manuscript. The in vivo e-phys results are interesting as well but even more succinct with no proposed mechanism as to how abGCs could regulate SWR and PAC in CA2.

      In response to these comments, we provide raw interaction times in a new Figure (Fig. S1). We also provide more information about the experiments and figures in the revision. We explain the rationale for our behavioral interpretations and discuss proposed mechanisms for how abGCs regulate SWR and PAC.

      The manuscript is well-written with the appropriate references. The choice of the behavioral test is somewhat debatable, however. It is surprising that the authors chose to use a direct presentation test (presentation of the mother and non-mother in alternation) instead of the classical 3-chamber test which is particularly appropriate to investigate social preference. Since the authors focused exclusively on this preference, the 3-chamber test would have been more adequate in my opinion. It would greatly strengthen the results if the authors could repeat a key experiment from their investigation using such a test. In addition, the authors only impaired the mother's memory. An additional experiment showing that disruption of the abGCs to CA2 circuit impairs social memory retrieval would allow us to generalize the findings to social memories in general. As the manuscript stands, the authors can only conclude the importance of this circuit for the memory of the mother. Developmental memory implies the memory of familiar kin as well.

      We selected the direct social interaction test because it allows for more naturalistic social behaviors than measuring investigation times toward social stimuli located inside wire mesh containers. We also decided to focus our studies on the retrieval of mother memories because these are likely the first social memories to be formed. We emphasize that our results cannot be generalized to memories of other social stimuli but given studies on recent social memory formation and retrieval in adults that manipulate abGCs and CA2 separately, we feel that it is likely that this circuit is involved in these functions as well. However, we specify throughout the manuscript that our experiments can only tell us about mother memories. We have also changed the title to reflect this.

      The in vivo ephys section (Figure 3) is interesting but even more minimalistic, and it is unclear how abGC projections to CA2 can contribute to SWRs and theta-gamma PAC. In Figure 1, the authors suggest that abGCs project preferentially to PV+ neurons in CA2. At a minimum, the authors should discuss how an abGC-to-PV+ interneuron-to-CA2 pyramidal neuron circuit could facilitate SWRs and theta-gamma PAC.

      We have divided Figure 3 into two figures (Figures 3 and 4) and revised the electrophysiology section of the results section. In the revised paper, we now discuss how abGC projections to PV+ interneurons may facilitate SWR and PAC.

      Finally, proposing a function for 4-6-week-old abGCs projecting to CA2 raises two questions: What are abGCs doing once they mature further, and more generally, what is the function of the DG-to-CA2 projection? It would be interesting for the authors to comment on these questions in the discussion.

      In response to these comments, we discuss possible answers to these interesting questions.

      Recommendations for the authors:

      Reviewer #1:

      Specifically, in Figure 1, consider the analysis of the synapses formed between abGCs and CA2 PNs (as identified by PCP4 expression) and CA2 PV+ cells (as identified by cre-dependent AAV-mCherry expression) in the PV-cre line. Panels c and d show the soma of a CA2 PN and the soma of a PV cell. Why was the soma analyzed? What relevance does this have? It is my understanding that synapses form on dendrites; this would be much more relevant to show, in my opinion. Also, the methods for panels e and f state that the 3R-Tau+ intensity was analyzed only in stratum lucidum. (The overall 3R-Tau intensity in SL of CA2 was normalized by dividing it by the 3R-Tau intensity of the corpus callosum.) I don't understand, then, how a comparison of 3R-Tau intensity could have been done for CA2 PN soma. There are no CA2 PN somata in stratum lucidum. (This is fairly clearly shown in Figure 1aiii, with the PCP4 staining showing the somata in the somatic layer, not in stratum lucidum.) What is being analyzed here?

      If the 3R-Tau intensity is higher for PV cell dendrites, an example image of dendrites would be very helpful. How were CA2 PV cell dendrites distinguished from CA2 PN dendrites at 40x magnification for the 3R-Tau intensity measurement? Why were presynaptic puncta not examined? Is it possible to determine the postsynaptic target with these methods? This result could be particularly interesting, but I find it very difficult to understand the quantification or the justification behind it. To truly know whether a cell receives a connection, the best method would be to perform whole-cell patch clamp recordings of the postsynaptic target cells while activating abGCs optogenetically. I understand that this may be beyond the scope of the paper, but it is a severe limitation for these results.

      We have eliminated the cell body measures from Figure 1 and focus instead on the dendrite measures, which we agree are more relevant. We now provide high-magnification example images of pyramidal cell (PCP4+) and PV+ interneuron (GFP+) dendrites in Figure 1. We thank the reviewer for pointing out the error regarding the stratum lucidum; some of the dendrites analyzed are located in the pyramidal cell layer. In addition, neither PCP4 nor GFP labels the full extent of the dendrites emanating from CA2 pyramidal cells or PV+ interneurons, respectively. We mention this in the revised manuscript because abGC projections to more distal dendrites might show a different pattern from that observed for proximal dendrites. We also provide more details about how the dendrites were delimited for the analysis, and mention that these results cannot definitively tell us whether functional synaptic connections have been formed.

      Cannulation over CA2 is potentially not specific to CA2 terminals. It would be optimal if the authors had some histology demonstrating specific cannula placement, as these surgeries are really tough to get perfectly centered over CA2. Even if the cannula is perfectly centered, how much would the CNO diffuse into CA3? Given the methodology, I think the authors really need to consider that the behavioral results may not be a result of blocking abGC terminals in CA2 alone. Would it really change much if the abGC terminals in CA3a/b were silenced as well? The McHugh lab has shown that area CA3 also plays a role in social memory (Chiang, M.-C., Huang, A. J. Y., Wintzer, M. E., Ohshima, T. & McHugh, T. J. A role for CA3 in social recognition memory. Behav Brain Res 354, 2018). It may be that both CA2 and CA3 are important for the phenomenon demonstrated in Figure 2. I think the study would be just as impactful, as this examination of early social memories is very interesting and nicely done. In fact, areas CA2 and CA3 may be acting together (please see Stöber, T. M., Lehr, A. B., Hafting, T., Kumar, A. & Fyhn, M. Selective neuromodulation and mutual inhibition within the CA3-CA2 system can prioritize sequences for replay. Hippocampus 30, 1228-1238, 2020).

      We agree that CNO infusions targeted at CA2 may also have influenced CA3a/b and have revised the paper to include this possible interpretation. We also cite the suggested paper on CA3 involvement in social memory (Chiang et al., 2018) and the paper on CA2-CA3 interactions (Stöber et al., 2020).

      Figure 3 is packed with information, but it is not communicated in a reasonable way. Much more information and a description of the experimental protocol need to be presented. Furthermore, why are there no example traces for the SWRs recorded? There should be more analysis than just a difference score and frequency. How are panels j, k, and l analyzed and interpreted? Why are there no example traces there? Also, the n's seem far too small for Figure 3m-r. Are only two or three animals used for some of these conditions? In my opinion, this is insufficient to conclude much from a 5-minute interaction.

      In response to this concern, we have divided Figure 3 into two figures: Figure 3 and Figure 4. In Figure 3, we provide example traces for SWRs, with additional SWR data presented in Figures S3 and S4, including data to complement the difference score data in Figure 3. In Figure 4, we include traces of phase-amplitude coupling. We also provide more information in the methods about how the phase-amplitude coupling data were analyzed. For Figure 4, we used methods described by Tort et al., 2010 to produce a modulation index, which measures the intensity of coupling between theta phase and gamma amplitude. This method additionally allows for visualization of how gamma amplitude is modulated across individual theta phase cycles. Regarding the question about n sizes in the 10-12 week abGC group (Fig. 3), the numbers are lower than in the 4-6 week abGC group because, by 6 weeks after the first set of recordings, the electrodes in some of the mice were no longer usable. The n sizes for this specific study are 4-5 per group for Nestin-cre mice and 7-8 for Nestin-cre:Gi mice. This is now clarified in the figure legend.
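      For readers unfamiliar with the Tort et al. (2010) modulation index, a minimal sketch of the idea follows. It assumes that theta phase (in radians, in [-π, π)) and the gamma amplitude envelope have already been extracted from the LFP (e.g., by band-pass filtering and a Hilbert transform, not shown). The function name and the default of 18 phase bins of 20° are illustrative assumptions, and this is not the analysis code used in the manuscript.

```python
import math


def modulation_index(theta_phase, gamma_amp, n_bins=18):
    """Sketch of a Tort et al. (2010)-style theta-gamma modulation index.

    theta_phase: theta phases in radians, assumed to lie in [-pi, pi)
    gamma_amp:   gamma amplitude envelope, same length as theta_phase
    n_bins:      number of equal-width phase bins (assumed default: 18 x 20 deg)
    """
    if len(theta_phase) != len(gamma_amp):
        raise ValueError("phase and amplitude series must have equal length")
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    bin_width = 2 * math.pi / n_bins
    for phase, amp in zip(theta_phase, gamma_amp):
        # Map a phase in [-pi, pi) to a bin index 0..n_bins-1.
        idx = int((phase + math.pi) // bin_width) % n_bins
        sums[idx] += amp
        counts[idx] += 1
    # Mean gamma amplitude in each theta-phase bin (empty bins contribute 0).
    mean_amp = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(mean_amp)
    if total == 0:
        raise ValueError("gamma amplitude sums to zero")
    # Normalize the binned amplitudes into a distribution over phase bins.
    p = [m / total for m in mean_amp]
    # Shannon entropy of that distribution.
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    h_max = math.log(n_bins)
    # KL divergence from the uniform distribution, divided by log(N):
    # 0 means gamma amplitude is flat across theta phase (no coupling).
    return (h_max - entropy) / h_max


# Usage with synthetic data: flat amplitude vs. amplitude peaking at phase 0.
phases = [-math.pi + 2 * math.pi * k / 360 for k in range(360)]
print(modulation_index(phases, [1.0] * 360))
print(modulation_index(phases, [1.0 + math.cos(p) for p in phases]))
```

      A flat gamma envelope yields an index near 0, whereas an envelope that is concentrated at particular theta phases yields a larger value. This is why the index serves as a single-number summary of coupling strength.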

      The discussion section of this paper does not put these results into a broader context within the field. There are other studies examining abGCs and their roles in novelty and memory formation (the work from Juan Song's lab, for example). These should be properly mentioned and discussed.

      In response, we have added discussion on the roles of abGCs in nonsocial novelty and memory formation and have cited papers from the Song lab.

      In the figure legend for Figure 2, there is no specific explanation for panel h. Perhaps the label is missing in the legend.

      We thank the reviewer for noting this error and now include a description in the revised manuscript.

      Reviewer #2:

      Adding more quantifications (single cells, isolating data during interaction versus non-interaction times) would help readers understand the results better. In the absence of this, adding a clearer rationale (even if only through the presentation of hypotheses) in the transitions between the different results sections would make the study easier to read.

      In response to this comment, we have added transition sentences between results sections to clarify the rationale and make the manuscript easier to understand.

      Reviewer #3:

      Line 110: "Hippocampal phase-amplitude coupling (PAC) and generation of sharp wave-ripples (SWRs) have been linked to novel experience, memory consolidation, and retrieval (Colgin, 2015; Fernandez Ruiz et al., 2019; Meier et al., 2020; Joo and Frank, 2018; Vivekananda et al., 2021). The DG is known to influence hippocampal theta-gamma coupling and SWRs (Bott et al., 2016; Meier et al., 2020), yet no studies have examined the influence of abGCs on these oscillatory patterns." This information comes too early in the results section and is somewhat confusing.

      In response to this comment, we have moved this information and provided a better description.

      Line 118: "we found that mice with normal levels of abGCs can discriminate between their own mother and a novel mother." Be more descriptive of the results (present the raw interaction times with the statistical test used to compare them); this sentence is the conclusion.

      In response to this comment, we provide the raw interaction times in a new Figure (Fig. S1) and describe the results in more detail.

      Line 121: "These effects were not due to changes in physical activity". Be more specific. Did you subject the mice to a specific test? If not, how did you calculate locomotion? The data presented in the supplementary figure 1a only states the % locomotion.

      Locomotion was manually scored whenever an animal moved in the testing apparatus. Speed was not recorded. Total locomotion was divided by trial duration to create a % locomotion measure. We have added these details to the methods.

      Line 124: "Coinciding with the recovery of adult neurogenesis, GFAP-TK animals regained the ability to discriminate between their mother and a novel mother". Explain how the difference in interaction time can be interpreted as the ability to discriminate. You could also compute the discrimination index used by several other laboratories (difference of interaction normalized by the total interaction time).

      In response to this comment, we describe how the difference in interaction time can be interpreted as the ability to discriminate between novel and familiar mice.
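      To illustrate the difference between the raw difference score and the reviewer-suggested discrimination index (difference in interaction time divided by total interaction time), here is a small sketch. The function names and interaction times are hypothetical and do not come from the study. It only shows how the two measures behave.

```python
def difference_score(t_novel, t_familiar):
    """Raw difference in investigation time (s): novel minus familiar."""
    return t_novel - t_familiar


def discrimination_index(t_novel, t_familiar):
    """Difference normalized by total investigation time; ranges from -1 to 1.

    Positive values mean more time spent with the novel mother.
    """
    total = t_novel + t_familiar
    if total <= 0:
        raise ValueError("total investigation time must be positive")
    return (t_novel - t_familiar) / total


# Two hypothetical animals with different overall sociability.
for t_novel, t_familiar in [(60, 40), (30, 20)]:
    print(difference_score(t_novel, t_familiar),
          discrimination_index(t_novel, t_familiar))
```

      Both hypothetical animals have the same index (0.2) but different difference scores (20 s vs. 10 s). Normalizing in this way separates a preference for novelty from differences in total investigation time.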

      Line 133: "Targeted CNO infusion in Nestin-Cre:Gi mice enabled the inhibition of GiDREADD+ abGC axon terminals present in CA2." Provide data or references to support this claim. Injection of a dye of comparable size to CNO would help. Otherwise, mention that nearby CA3a could be affected as well.

      We cannot rule out that nearby CA3a was affected by our cannula infusions of CNO into CA2. Furthermore, since dyes likely diffuse at different rates than CNO, we believe that a dye injection would not eliminate this concern completely. Therefore, we have revised the paper to acknowledge the likelihood that the CNO infusion affected parts of CA3 in addition to CA2. We also changed the title to focus more on the CA2 electrophysiological recordings, which we know were obtained only from the CA2.

      Line 150: "When reintroduced to the now familiar adult mouse 6 hours later, after the effects of CNO had largely worn off". Provide data or references supporting this claim.

      In response, we cite articles showing that the behavioral effects of CNO-induced DREADD activation return to baseline by 6 hours after administration.

      Line 165: "We found that SWR production is increased during social interaction, with more SWRs produced during novel mouse investigation, presumably during encoding social memories, than during familiar mouse investigation, presumably during retrieval of developmental social memories". How does this compare to the results in Oliva et al, Nature 2021?

      Oliva et al. (2021) recorded CA2 SWRs in the home cage and during periods of sleep following social stimulus exposure. The timing of their recordings does not coincide with that of our measures, but we now cite the paper.

      Line 168: "Inhibition of abGCs in the presence of a social stimulus". How does silencing abGC impact CA2 pyramidal neurons' firing rate?

      The direct answer to this question is unknown because we did not record single units, but based on studies in CA3, it is likely that the firing rate in CA2 would increase.

      Line 203: "abGCs possess a time-sensitive ability to support retrieval of developmental social memories." Can you speculate on the function of the cells later on?

      In the revised paper, we speculate about the function of abGCs after they mature and no longer support retrieval of developmental social memories.

      Line 229: "GFAP-TK mice were group housed by genotype". Why not house them with CD1 littermates?

      We housed these mice according to genotype to avoid having mice with different levels of abGCs (GFAP-TK + VGCV and CD1 + VGCV) living together in social groups. We did this to avoid potential differences that might emerge in social behavior.

      Line 237: "Adult TK, Nestin-cre, and Nestin-cre:Gi offspring underwent a social interaction test in which they directly interacted with the mother". Specify the duration of the social interaction.

      In the revised manuscript, we specify that mice interacted with each social stimulus for 5 minutes.

      Line 240: "After a 1-hour delay spent in the home cage". Were the mice single-housed or with their littermates during this delay?

      In the revised manuscript, we indicate that mice were put back into the home cage with their cagemates during the 1 hr delay period.

      Line 241: "The order of stimulus exposure was counterbalanced in all tests." Can you show some data to confirm that the order of presentation did not impair the interaction? Have you considered using your own version of the classical 3-chamber test in order to assess directly the preference for one or the other female mouse?

      Our data suggest that the order of testing is not responsible for the observed results. Across all experimental groups without an abGC manipulation (i.e., all direct social interaction assays excluding VGCV+ GFAP-TK trials and CNO+ Nestin-cre:Gi trials), ~84.4% of animals demonstrate a social preference for the novel mother over the mother (CD1 + GFAP-TK VGCV- cohort: 28/33; CD1 VGCV+ cohort: 17/17; CD1 and TK recovery cohort: 24/31; Nestin-cre and Nestin-cre:Gi 4-6-week-old abGC cohort: 77/95; 10-12-week-old abGC cohort: 49/55; total = 195/231 mice with an investigation preference for the novel mother). If stimulus presentation order biased social investigation preference toward the first stimulus presented, we would expect the percentage of animals preferring each stimulus to be around 50%, as roughly half the animals were first exposed to the mother and the other half were first exposed to the novel mother. The social novelty preference reported above is comparable to the percentages we observe in our lab's novel-versus-familiar social interaction experiments, in which all animals are first exposed to a novel conspecific. We have not yet tested adults using the modified 3-chamber assay described in Laham et al., 2021.
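      The pooled percentage above can be checked directly from the cohort counts listed in the response. The following sketch simply tallies them. It does not model stimulus order or perform any statistical test.

```python
# (animals preferring the novel mother, animals tested), per cohort as listed above
cohorts = {
    "CD1 + GFAP-TK VGCV-": (28, 33),
    "CD1 VGCV+": (17, 17),
    "CD1 and TK recovery": (24, 31),
    "Nestin-cre and Nestin-cre:Gi, 4-6-week-old abGCs": (77, 95),
    "10-12-week-old abGCs": (49, 55),
}

novel = sum(n for n, _ in cohorts.values())
total = sum(t for _, t in cohorts.values())
percent = 100 * novel / total
print(f"{novel}/{total} = {percent:.1f}%")  # 195/231 = 84.4%
```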

      Statistics: The statistical tests used throughout the paper are appropriate but their description is too cursory. Please provide F values and specify the name of the tests used in the figure legends before giving the exact p values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors were attempting to determine the extent to which CIH alters swallowing motor function, specifically the timing and probability of activation of the laryngeal and submental motor pools. The paper describes a variety of different motor patterns elicited by optogenetic activation of individual neuronal phenotypes within PiCo in a group of mice exposed to CIH. They show that a variety of motor patterns emerge in CIH mice; this is apparently different from the more consistent motor patterns elicited by PiCo activation in normoxic mice (previously published).

      Strengths:

      The preparation is technically challenging and gives valuable information related to the role of PiCo in the pattern of motor activation involved in swallowing and its timing with phrenic activity. Genetic manipulations allow for the independent activation of the individual neuronal phenotypes of PiCo (glutamatergic, cholinergic) which is a strength.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      (1) The data presented are largely descriptive in terms of the effect of PiCo activation on the probability of swallowing and the pattern of motor activation changes following CIH. Comparisons made between experimental data acquired currently and those obtained in a previous cohort of animals (possibly years before) are extremely problematic, with the potential confounding influence of changing environments, genetics, and litter effects. The statistical analyses (i.e. comparing CIH with normoxic) appear insufficiently robust. Exactly how the data were compared is not described.

      We agree that the data are descriptive in terms of characterizing the effect of CIH on PiCo activation. However, we would like to emphasize that the data are also mechanistic, because they characterize the effects of specifically and optogenetically manipulating PiCo neurons after exposure to CIH.

      Thank you for this comment and for pointing out our misleading description in the paper. This manuscript is meant to independently characterize the effects of CIH on the response to PiCo stimulation. We are not making direct comparisons with the previously published manuscript, in which mice were exposed to room air. No statistical analysis has been performed between the previously published control data and the current CIH data, since we are making only an observational comparison, not a direct one.

      To make this clearer, and to address the reviewer's concern, we have removed the room air data from Figures 1E, 2C and 3A. However, we believe it is important to keep the data from mice exposed to room air in Figure 2B, since we did not include this information in the previously published manuscript. It is important to point out that all mice exposed to CIH show some form of submental activity during laryngeal activation in response to PiCo stimulation. This is not the case when mice are exposed to room air only. In this figure, only descriptive analyses are presented. We adjusted our wording throughout the text, particularly in the discussion, to eliminate any impression that we are making direct comparisons between the two studies. The following sentence has been added to the discussion: “While we do not intend to make direct quantitative comparisons between the previously published PiCo-triggered swallows in control mice exposed to room air (Huff et al 2023) and the data presented here for mice exposed to CIH, we believe it is important to compare the conclusions made in these two studies.” This was the motivation for using the eLife Advance format, since the present study demonstrates that PiCo affects swallow patterning, which was not observed in the control data.

      (2) There is limited mechanistic insight into how PiCo manipulation alters the pattern and probability of motor activation. For example, does CIH alter PiCo directly, or some other component of the circuit (NTS)? Techniques that silence or activate projections to/from PiCo should be used to interrogate this. This is required to further delineate and define the swallowing circuit, which remains enigmatic.

      We agree with the reviewer that our study raises many more questions than we are able to answer at the moment. This however applies to most scientific studies. Even though swallowing has been studied for many decades, the underlying circuitry remains largely enigmatic. We will continue to investigate the role of PiCo and its interaction with the NTS, in healthy and diseased states. These investigations require many different techniques, and approaches, some of which are still in development. For example, we are currently conducting experiments that silence portions of the NTS related to swallow and PiCo: ChAT/Vglut2 neurons using novel unpublished viral approaches. However, these are separate and ongoing studies beyond the scope of the current one.

      To address the reviewer’s comment, we have added the following to the limitations section: “In addition, this preparation does not allow for recording of PiCo neurons to evaluate the direct effects of CIH in PiCo neuronal activity”. The following has also been added to the discussion: “Rather, our data reveal CIH disrupts the swallow motor sequence which is likely due to changes in the interaction between PiCo and the SPG, presumably located in the cNTS. While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow motor patterning itself. Here we show for the first time that CIH leads to disturbances in the generation of the swallow motor pattern that is activated by stimulating PiCo. This suggests that PiCo is not only important for coordinating swallow and breathing, but also modulating swallow motor patterning. Further studies are necessary to directly evaluate the presumed interactions between PiCo and the cNTS.”

      (3) The functional significance of the altered (non-classic) patterns is unclear.

      As in our original study, the preparation used to stimulate PiCo does not allow us to simultaneously characterize the functional significance of swallowing. Therefore, we have included this in the limitations section: “In this preparation we are unable to directly determine the functionality of the variable swallow motor pattern seen after CIH. Different experimental techniques, such as videofluoroscopy would need to be used to directly evaluate functional significance. This technique is beyond the scope of this study and not possible to perform in this preparation. We acknowledge this limits our ability to make direct comparisons between dysphagic swallows in OSA patients.”

      Reviewer #1 (Recommendations For The Authors):

      (1) A more rigorous experimental approach is required. Littermates should be separated and exposed to either room air or CIH at the same (or close to the same) time.

      As stated above, we did not directly compare mice exposed to room air with mice exposed to CIH. Hence, we believe this is not necessary, and it would have meant repeating all the experiments already published in the original eLife paper.

      (2) Robust statistical analyses are required to determine whether the effects of CIH on the pattern/probability of motor activation are required.

      Since the control and CIH groups were not compared in this study, statistical hypothesis testing is not appropriate or applicable.

      (3) Use a combination of retrograde Cre AAVs and Cre-dependent approaches to interrogate the circuitry to/from PiCo that forms the swallowing network. In my view, this is what is needed to push this area forward.

      Thank you for this suggestion; we will consider it as we plan future experiments. Indeed, we are in the process of developing novel approaches. However, in this context we would like to emphasize that further network investigations are exponentially more complicated, given that we need to use a Flpo/Cre approach to specifically characterize the glutamatergic-cholinergic PiCo neurons. Most other laboratories that have studied PiCo have avoided this experimental complication and used only a “cre-dependent” approach. This approach is much simpler, but the data are much less specific and the conclusions sometimes misleading. Stimulating cholinergic neurons in the PiCo area, for example, will also activate nucleus ambiguus neurons, and stimulating glutamatergic neurons will also activate glutamatergic neurons that are not necessarily the glutamatergic/cholinergic neurons that we use to define PiCo specifically. Readers who are unfamiliar with these different approaches often miss this important difference. Hence, stimulating the cholinergic-glutamatergic neurons in PiCo is much more specific than stimulating other areas, such as preBötzinger complex neurons. There are no markers that allow stimulation of only preBötzinger complex neurons or only neurons in the parafacial nucleus. Unfortunately, this difference is often overlooked.

      (4) It should be made more clear how each of the "non-classic" swallowing patterns could cause dysfunction - especially to the reader who is not completely familiar with the neural control of swallowing.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation, however since our approach does not allow us to use any tools to measure or evaluate functional activity it would be inappropriate to make suggestions of this type without any data to back up our conclusion. This is why we have not speculated on the functional implications. We have added the following to the discussion section of this manuscript. “While fine wire EMG studies are an excellent evaluation tool to observe temporal motor pattern of sequential swallow related muscles; it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of these alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns. ”

      Minor:

      The Results should be written in a way that better conveys the neurophysiological effects of the manipulations. As it stands, it reads like a statistical report on how activation of each neuronal phenotype is statistically different from each other. As such it is difficult to read and understand the salient findings.

      Thank you for this insight. We have adjusted the language in the results section.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the role of a medullary region, named Postinspiratory Complex (PiCo), in the mediation of swallow/laryngeal behaviours, their coordination with breathing, and the possible impact on the reflex exerted by chronic intermittent hypoxia (CIH). This region is characterized by the presence of glutamatergic/cholinergic interneurons. Thus, experiments have been performed in single allelic and intersectional allelic recombinase transgenic mice to specifically excite cholinergic/glutamatergic neurons using optogenetic techniques, while recording from relevant muscles involved in swallowing and laryngeal activation. The data indicate that in anaesthetized transgenic mice exposed to CIH, the optogenetic activation of PiCo neurons triggers swallow activity characterized by variable motor patterns. In addition, these animals show an increased probability of triggering a swallow when stimulation is applied during the first part of the respiratory cycle. They conclude that the PiCo region may be involved in the occurrence of swallow and other laryngeal behaviours. These data interestingly improve the ongoing discussion on neural pathways involved in swallow-breathing coordination, with specific attention to factors leading to disruption that may contribute to dysphagia under some pathological conditions.

      The Authors' conclusions are partially justified by their data. However, it should be acknowledged that the impact of the study is to a certain extent limited by the lack of knowledge on the source of excitatory inputs to PiCo during swallowing under physiological conditions, i.e. during water-evoked swallowing. Also the connectivity between this region and the swallowing CPG, a structure not well defined, or other brain regions involved in the reflex is not known.

      We thank the reviewer for the comments and for recognizing the strengths of the paper. However, with regard to the “lack of knowledge”, we would like to emphasize that PiCo was first described in 2016, whereas the preBötzinger complex, for example, was described in 1991. Thus, it is not fair to expect the same level of anatomical and physiological understanding for PiCo as we have become accustomed to for the preBötzinger complex. We are fairly confident that 25 years from now, our knowledge of the inputs and outputs of PiCo will be much less limited than it currently is.

      Strengths:

      Major strengths of the manuscript:

      • The methodological approach is refined and well-suited for the experimental question. The in vivo mouse preparation developed for this study takes advantage of selective optogenetic stimulation of specific cell types with the simultaneous EMG recordings from upper airway muscles involved in respiration and swallowing to assess their motor patterns. The animal model and the chronic intermittent hypoxia protocol have already been published in previous papers (Huff et al. 2022, 2023).

      • The choice of the topic. Swallow disruption may contribute to the dysphagia under some pathological conditions, such as obstructive sleep apnea. Investigations aimed at exploring and clarifying neural structures involved in this behaviour as well as the connectivity underpinning muscle coordination are needed.

      • This study fits in with previous works. This work is a logical extension of previous studies from this group on swallowing-breathing coordination with further advances using a mouse model for obstructive sleep apnea.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      Major weaknesses of the manuscript:

      • The Authors should be more cautious in concluding that the PiCo is critical for the generation of swallowing itself. It remains to demonstrate that PiCo is necessary for swallowing and laryngeal function in a more physiological situation, i.e. swallow of a bolus of water or food. It should be interesting to investigate the effects of silencing PiCo cholinergic/glutamatergic neurons on normal swallowing. In this perspective, the title should be slightly modified to avoid "swallow pattern generation" (e.g. Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production).

      Thank you for pointing out that this manuscript suggests PiCo is necessary for swallow generation. We agree that further interventions that specifically silence PiCo ChAT/Vglut2 neurons will be necessary to investigate this claim, which we have begun to evaluate for a future study by developing a novel, as yet unpublished approach. We have altered the language throughout the text to limit the perception that PiCo is the swallow pattern generator. We have also changed the title to: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production

      • The duration of swallows evoked by optogenetic stimulation of PiCo is considerably shorter in comparison with the duration of swallows evoked by a physiological stimulus (water). This makes it hard to compare the timing and the pattern of motor response in CIH-exposed mice. In Figure 1, the trace time scale should be the same for water-triggered and PiCo-triggered swallows. In addition, it is not clear if exposure to CIH alters the ongoing respiratory activity. Is the respiratory rhythm altered by hypoxia? If a disturbed or irregular pattern of breathing is already present in CIH-exposed mice, could this alteration interfere with the swallowing behaviour?

      Thank you. We have changed the time scale so that all representative traces are on the same time scale.

      We explained in the original paper (Huff et al. 2023) that the significant decrease in PiCo-evoked swallow duration compared to water-evoked swallow duration is likely due to the absence of oral/upper airway feedback. We are not comparing the effects of CIH on the swallow motor pattern between water-evoked and PiCo-evoked swallows; rather, we are only characterizing the effects of CIH on the motor pattern of PiCo-evoked swallows. The purpose of Figure 1A is to show that the rostrocaudal submental-laryngeal sequence of water-evoked swallows is preserved in "canonical" PiCo-evoked swallows, as shown in the original study. While we did not measure the effects of CIH on breathing and the respiratory pattern in this study, others have established that CIH causes respiratory muscle weakness, impaired motor control of the upper airway, and variable respiratory rhythm and rhythm generation. However, when characterizing the timing of swallow in relation to inspiration (Figure 1 figure supplement 1) and the reset of the respiratory rhythm (Figure 3 figure supplement 1), and observationally comparing these results with those from mice exposed to room air (Huff et al. 2023), we do not observe any obvious differences in swallow-breathing coordination. Nevertheless, a separate study in wild-type mice characterizing water-evoked swallowing after CIH would be better suited to understanding the physiological changes in swallowing after CIH. We would also like to point out that Huff et al. 2022 showed that altering respiratory rate/pattern via activation of various preBötzinger Complex neurons does not change swallow behavior, except in the case of Dbx1 preBötC neuron activation, which was independent of CIH. Increasing or decreasing respiratory rate via activation of preBötC Vgat and SST neurons did not change the swallow pattern; rather, it changed when swallows occurred. Others have reported that swallow exerts hierarchical control over breathing and can shut breathing down. We believe that swallowing behavior is independent of respiratory pattern and that alterations in breathing pattern do not necessarily affect the swallow motor pattern but could affect swallow timing.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Lines 37-41 "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly the generation of swallow motor pattern was significantly disturbed."

      It should be better:

      "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly triggers variable swallow motor patterns".

      Thank you, this has been changed.

      Lines 41-43 "This suggests, glutamatergic-cholinergic neurons in PiCo are not only critical for the gating of postinspiratory and swallow activity but also play important roles in the generation of swallow motor pattern." I suggest removing any language claiming that PiCo gates swallowing and changing "generation" to "modulation":

      "This suggests that glutamatergic-cholinergic neurons in PiCo are not only critical in regulating swallow-breathing coordination but also play important roles in the modulation of swallow motor pattern."

      Thank you, this has been changed.

      Introduction:

      Lines 88-90: Huff et al. 2023 actually states that "PiCo acts as an interface between the swallow pattern generator and the preBötzinger complex to coordinate swallow and breathing". Please change accordingly. Please remove Toor et al., 2019, since their conclusions are quite different.

      Lines 100-101: Please change the sentence according to the comments reported above.

      Thank you, this has been changed.

      Results:

      Lines 104-105: Did you mean: "We confirmed that optogenetic stimulation of PiCo neurons in ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH triggers swallow and laryngeal activation similar to the control mice exposed to room air (Huff et al., 2023)." Otherwise, the sentence is not clear.

      Thank you, this has been changed.

      Lines 129-130: This finding is not surprising since similar results have been reported in Huff et al. 2023.

      Thank you, we wanted to confirm that CIH did not alter this characteristic, which it did not. We believe that it is important to include this as it is a criterion for characterizing laryngeal activation.

      Line 219: The number of water-evoked swallows is considerably lower than the number of stimulation-evoked swallows. Why?

      We inject water into the mouth three times, and typically there is one swallow in response to each water injection, whereas PiCo is stimulated 25 times at each duration. If we were to stimulate swallow with water as many times as with optogenetic stimulation, the mouse would adapt to the water stimulation and stop responding. This does not seem to be the case with PiCo stimulation. The simple answer is that there are many more PiCo stimulations than water stimulations.

      Lines 228-232: "PiCo-triggered swallows are characterized by a significant decrease in duration compared to swallows evoked by water in ChATcre:Ai32 mice (265 ± 132 ms vs 144 ± 101 ms; paired t-test: p = 0.0001, t = 5.21, df = 8), Vglut2cre:Ai32 mice (308 ± 184 ms vs 125 ± 44 ms; paired t-test: p = 0.0003, t = 6.46, df = 7), and ChATcre:Vglut2FlpO:ChR2 mice (230 ± 67 ms vs 130 ± 35 ms; paired t-test: p = 0.0005, t = 5.62, df = 8) exposed to CIH (Table S1)."

      Thank you, this has been changed.

      Lines 252 and 254: remove SEM.

      Thank you, this has been changed.

      Discussion

      Line 267: ...(Figure 1Bi), while 28% of PiCo-triggered swallows...

      Thank you, this has been changed.

      Lines 283-290: "Thus, CIH does not alter PiCo's ability to coordinate the timing of swallowing and breathing. Rather, our data reveal that CIH disrupts the swallow motor sequence, likely due to changes in the interaction between PiCo and the SPG, presumably the cNTS.

      While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow pattern generation itself. Thus, here we show for the first time that CIH resulted in the instability of the swallow motor pattern activated by stimulating PiCo, suggesting PiCo plays a role in its modulation."

      Thank you, this has been changed.

      Could the observed effects be due to a non-specific effect of hypoxia on neuronal excitability? In addition, it should be considered that PiCo-triggered swallows lack the behavioural setting of water-evoked swallows and do not activate the sensory component of the SPG to the same extent as the water-evoked swallows.

      Yes, this is very possible. We stated in our first manuscript that the decrease in PiCo-triggered swallow duration, compared to water-triggered swallow duration, is likely because oral sensory components are not being activated to the same extent (Huff et al. 2023). Since we do not directly measure neuronal excitability, it is not known (in this study) whether CIH changes the excitability of swallow-related areas. However, others have shown increased excitability and activity of Vglut2 neurons after CIH exposure (Kline et al. 2007, 2010), and we have shown changes in, e.g., the excitability of preBötC neurons (Garcia et al. 2016, 2017).

      Lines 293-300: The sentence is not clear. Is there any evidence indicating that glutamatergic neurons are differently affected by hypoxia than cholinergic neurons?

      Thank you, these sentences have been changed to increase clarity. The section now reads: "There was no statistical difference in the probability of triggering a swallow during optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 neurons in mice exposed to room air (Huff et al. 2023). However, when exposed to CIH, ChATcre:Ai32 and Vglut2cre:Ai32 mice have a lower probability of triggering a swallow than ChATcre:Vglut2FlpO:ChR2 mice (in some mice, swallow was never triggered via PiCo activation, while water-triggered swallows remained). While it is possible that portions of the presumed SPG remain less affected by CIH, which could offset these instabilities to produce functional swallows, our data suggest that PiCo targets microcircuits within the SPG that are highly affected by CIH. The NTS is a primary site of upper airway and swallow-related sensory termination in the brainstem (Jean, 1984). CIH induces changes in cardiorespiratory Vglut2 neurons, resulting in increased cNTS neuronal activity (Kline, 2010; Kline et al., 2007), as well as changes in preBötzinger neurons (Garcia et al., 2017; Garcia et al., 2016) and ChAT neurons in the basal forebrain (Tang et al., 2020). It is reasonable to suggest that CIH has differential effects on neurons that express only ChATcre or only Vglut2cre versus the PiCo-specific interneurons that co-express ChATcre and Vglut2FlpO, emphasizing the importance of targeting and manipulating these PiCo-specific interneurons."

      Lines 372-374: "Here we show that PiCo, a neuronal network which is critical for the generation of postinspiratory activity (Anderson et al. 2016) and implicated in the coordination of swallowing and breathing (Huff et al., 2023), is severely affected by CIH."

      Thank you, this has been changed.

      Methods

      Line 398: Did you mean Slc17a6-IRES2-FlpO-D?

      Thank you, this has been changed.

      Line 399: were.

      Thank you, this has been changed.

      Line 403: ... expressing both ChAT and Vglut2 and will be reported as ChATcre:Vglut2FlpO.

      Thank you, this has been changed.

      Line 437: Mice of the ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 lines were kept in collective cages with food and water ad libitum placed inside custom-built chambers.

      Thank you, this has been changed.

      Line 479: (Figure 6a in Huff et al., 2022).

      Line 497: What does Fig 7 refer to?

      This should say Figure 1 figure supplement 2. This has been changed.

      Lines 501-506: "First, swallow was stimulated by injecting 0.1cc of water into the mouth using a 1.0 cc syringe connected to a polyethylene tube. Second, 25 pulses of each 40ms, 80ms, 120ms, 160ms and 200ms continuous TTL laser stimulation at PiCo was repeated, at random, throughout the respiratory cycle. The lasers were each set to 0.75mW and triggered using Spike2 software (Cambridge Electronic Design, Cambridge, UK). These stimulation protocols were performed in all ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice."

      Thank you, this has been changed.

      Line 526 and 540: (Fig.6 in Huff et al., 2022) and (Fig.6d in Huff et al., 2022).

      Thank you, this has been fixed.

      Line 594: Figure 5 does not exist. Please change the sentence.

      Thank you, this has been fixed.

      Lines 595 and 609: The reference Kirkcaldie et al. 2012 refers to the neocortex and does not seem appropriate. Please cite the atlas of Paxinos and Franklin.

      Thank you, this has been changed.

      Reference:

      Please correct the formatting of references throughout the text by removing initials and first names (e.g., J.M., A., David D.). Only surnames should be mentioned.

      Thank you, this has been changed.

      Figures:

      Figure 1. Panel labels A and B, as well as the purple arrow, are missing. In addition, optogenetic stimulation is applied during different periods of inspiratory activity, and this could affect the swallow motor pattern. In Bv, Non-LAR seems very similar to LAR. In panel E, please add the number of animals.

      Thank you, this has been fixed.

      We used the same optogenetic protocols in the original paper (Huff et al. 2023) and did not observe any changes to the swallow motor pattern in relation to the time at which PiCo was stimulated. The only phase-dependent response seen in both control and CIH mice is that when PiCo is stimulated during inspiration and a swallow is triggered, inspiration is inhibited. Therefore, we do not believe that the variability in swallow motor pattern depends on the phase of breathing in which PiCo is stimulated.

      In Biv, LAR has a pause in EMG activity before the swallow begins (red arrow pointing to the pause). In Bv, Non-LAR does not have this pause; rather, the two behaviors converge (red arrow). For a response to be considered an LAR, the pause must be present, which is why we separated these two motor patterns.

      Figure 1 - Figure Supplement 1. Why do the Authors call the lines "histograms"?

      Thank you, this has been fixed. This is a line graph of swallow frequency in relation to inspiration.

      Tables:

      In tables, data are provided as means and standard deviations. Please specify this in the Methods section.

      Thank you, the following is listed in the methods section: “All data are expressed as mean ± standard deviation (SD), unless otherwise noted.”

      Reviewer #3 (Public Review):

      In the present study, the authors investigated the effects of CIH on the swallowing and breathing responses to PiCo stimulation. Their conclusion is that glutamatergic-cholinergic neurons from PiCo are not only critical for the gating of post-inspiratory and swallow activity, but also play important roles in the generation of swallow motor patterns. There are several aspects that deserve the authors' attention and comments, mainly related to the study's conclusions.

      • The authors refer to PiCo as the generator of post-inspiratory rhythm. However, evidence points to this region as a modulator of post-inspiratory activity rather than a rhythmogenic site (Toor et al., 2019 - 10.1523/JNEUROSCI.0502-19.2019; Oliveira et al., 2021 - 10.1016/j.neuroscience.2021.09.015). For example, sustained activation of PiCo for 10 s barely affected the vagus or laryngeal post-inspiratory activity (Huff et al., 2023 - 10.7554/eLife.86103).

      Yes, we did refer to PiCo as the postinspiratory rhythm generator, as defined in Anderson et al. 2016. We base this statement on the following criteria and experiments. In Anderson et al. 2016, we demonstrate that PiCo can be isolated in vitro, that PiCo neurons are activated in phase with postinspiration, and that they are inhibited during inspiration by preBötC neurons via GABAergic, not glycinergic, mechanisms. We also demonstrate that optogenetically stimulating cholinergic neurons in the PiCo area resets the inspiratory rhythm both in vivo and in vitro. We further show that PiCo, when isolated in transverse slices, is autorhythmic and that PiCo, like the preBötC in transverse slices, can generate respiratory rhythmic activity in vitro, independently of the preBötC. We also demonstrate that PiCo neurons are an order of magnitude more sensitive to opioids (DAMGO) than the preBötC and that local injection of DAMGO into the PiCo area in vivo abolishes postinspiration as well as the phase delay of the respiratory rhythm. None of these specific rhythmogenic properties were studied by Toor et al. or Oliveira et al. Hence, we do not understand why the reviewer cites these studies as evidence for modulatory as opposed to rhythmogenic properties. Being rhythmogenic should not be considered an "exclusive property": it does not preclude PiCo from also "modulating" swallow-breathing coordination, as we have demonstrated more specifically in the Huff et al. study. In the same sentence, we also referred to the preBötzinger complex as the inspiratory rhythm generator, as defined by Smith et al. 1991, and the reviewer did not appear to object to this reference. We would like to point out that the same criteria were used to define the preBötzinger complex as we used for PiCo, except that PiCo neurons are better defined than preBötzinger complex neurons. Dbx1 neurons are often used to characterize the preBötC, but these neurons form a rostrocaudal and ventrodorsal column that also involves glial cells and extends beyond the preBötC. Glutamatergic neurons are everywhere, and so are somatostatin and neurokinin neurons. Moreover, the 1991 study was performed only in vitro and did not include a histochemical analysis. We would also like to point out that the present manuscript investigates the role of PiCo in swallow and laryngeal behaviors, not postinspiration specifically. Thus, we are not entirely sure how this comment relates to this manuscript.

      • The optogenetic activation of glutamatergic and cholinergic neurons from PiCo evoked submental and laryngeal responses, and CIH changed these motor responses. Therefore, the authors proposed that PiCo is directly involved in swallow pattern generation and that CIH disrupts the connection between PiCo and the SPG (swallow pattern generator). However, the experiments of the present study did not provide evidence about connections between these two regions, their possible disruption after CIH, or whether PiCo is part of the SPG.

      We have edited the text to suggest that PiCo modulates the swallow motor sequence in addition to coordinating swallow and breathing. We have also added that further experiments will be necessary to investigate the connections between PiCo and the SPG. Unfortunately, compared to PiCo, the SPG is much less well defined. As already stated above, a single study cannot be expected to address all open questions. Clearly, more work outside of this study is needed to answer these questions, which makes this an exciting area of research.

      • CIH affects several brainstem regions which might contribute to generating abnormal motor responses to PiCo stimulation. For example, Bautista et al. (1995 - 10.1152/japplphysiol.01356.2011) documented that intermittent hypoxia induces changes in the activity of laryngeal motoneurons by neural plasticity mechanisms involving serotonin.

      Yes, we thank the reviewer for this comment, and we agree that CIH affects multiple brainstem regions. We stated in the manuscript that we are measuring changes in two muscle complexes that are spread among three motor neuron pools: the hypoglossal nucleus, trigeminal nucleus, and nucleus ambiguus. We have added a discussion of laryngeal activity in the presence of acute bouts of extreme hypoxia, acute intermittent hypoxia, and chronic intermittent hypoxia.

      • To support the hypothesis that PiCo is directly involved in swallow pattern generation, the authors should inhibit Vglut2-ChAT neurons in PiCo and then evoke swallow motor responses. If swallow is abolished when the neurons from this region are inhibited, it would indicate that PiCo is crucial to generate this behavior.

      Thank you. We would like to clarify that "involvement" does not mean "necessary for". Conflating the two has caused much confusion and debate in the field. For example, one can argue at great length whether inhibition is necessary for respiratory rhythmogenesis in vivo, but there is no question that inhibition is involved in respiratory rhythmogenesis in vivo. To avoid any confusion, however, we have changed the text to suggest that PiCo is involved in the modulation of the swallow motor sequence. We agree that various additional inhibition experiments are necessary to determine whether PiCo is also a necessary component of the SPG, but this is not the question we set out to address in this study. To specifically target PiCo, we must inhibit not only Vglut2 neurons but neurons that express both ChAT and Vglut2. To our knowledge, there are no inhibitory DREADD or opsin techniques for cre/FlpO that specifically target these neurons. As stated above, non-experts in the field do not appreciate this technical nuance. However, we have begun to develop the novel techniques necessary to inhibit these specific neurons, which will be published in the future.

      • In almost all the data presented, the authors observed different patterns of changes in the motor submental and laryngeal responses to PiCo activation, including that animals submitted to CIH (6%) presented a "normal" motor response. However, the authors did not discuss the possible explanations and functional implications of this variability.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation; however, since we are not using any tools to measure or evaluate functional activity, it would be inappropriate to make suggestions of this type without data to support them. This is why we have not included any functional implications. We have added the following to the manuscript: "While fine-wire EMG studies are an excellent tool for observing the temporal motor pattern of sequential swallow-related muscles, they must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high-resolution manometry (HRM) to characterize the functional significance of the alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study uses only fine-wire EMGs, we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns."

      • In Figure 4, the authors need to present low-magnification sections showing the PiCo transfected neurons as well as the absence of transfection in the ventral respiratory column. The authors could also check the scale, since the cAmb seems very small.

      Thank you. We have added different histology images to show a more comparable cAmb, as well as lower-magnification images showing the absence of transfection in the VRC.

      • Finally, the title does not reflect the study. The present study did not demonstrate that PiCo is a swallow pattern generator.

      We have also changed the title to: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production.

    1. Re-establishing the three orders

      I will now go back through these orders and show how the worldview I have espoused in this essay may be able to re-invigorate them.

      a. The nomological order

      In the worldview I’ve put forward in this essay, there is a different kind of nomological order. Here there is also an affinity, or deep continuity, between how the mind works and the structure of reality. As I argued in section 4, relevance realization, i.e., the process by which we become more behaviorally attuned to the world, is a particular manifestation of the general process by which the universe at large is continually being created and complexified. In a previous essay I showed that there is a great deal of overlap between relevance realization and the modern science of consciousness. I think Jordan Peterson was right when he said that “we are really reflective, including in our consciousness, of something about the structure of reality itself.” Or, as John Vervaeke and colleagues put it, there really are “fundamental principles by which knowledge and reality co-operate” (Vervaeke et al., 2017), and this constitutes a kind of nomological order.

      b. The narrative order

      The Christian-Aristotelian narrative order was participatory. We were participating in the process by which the kingdom of heaven would be built on earth. In the worldview I’ve put forward in this essay, there is no final “goal” towards which the universe is aiming. Rather, the process itself is the goal. This constitutes an infinite game rather than a finite game. Although we are not participating in a narrative that brings about some final state of utopia, we are capable of participating in a process that is of ultimate value, both for ourselves and for the world at large. Vervaeke and colleagues said that the narrative order:

      …provided an overarching story into which the minutia of the cosmos―individuals and their own stories―could fit and belong. Further, it introduced the idea that the agency of persons could intervene in the cycle of repetition and meaningfully impact the course of cosmic history.

      What I am arguing for is not far off from that. Our individual stories do fit into the overarching story of the cosmos (which is, as Azarian suggested, a never-ending story of continual self-organization and complexification). Our actions, every decision we make, can therefore meaningfully impact the course of cosmic history. That constitutes a kind of narrative order.

      c. The normative order

      The normative order consisted of a connection between ontology and values. In the worldview I have put forward in this essay, there is also a connection between ontology and values. In section 6 I argued that our participation in the process of complexification is biologically and psychologically optimal. This process therefore constitutes an ontological structure that simultaneously informs us about the nature of the good. Ontologically speaking, this process underlies reality as we know it. Normatively, our participation in this process is of ultimate value. This constitutes a kind of normative order. In sum, the worldview put forward in this essay may be able to re-invigorate the three orders, the loss of which precipitated the “meaning crisis” in Western culture.

      Summary

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a valuable finding on the possible use of vilazodone in the management of thrombocytopenia through regulating 5-HT1A receptor signaling. The evidence supporting the claims of the authors is solid, with the combined use of computational methods and biochemical assays. The work will be of broad interest to scientists working in the field of thrombocytopenia.

      Public Review:

      Reviewer #1 (Public Review):

      Summary:

      This is well-performed research with solid results and thorough controls. The authors did a good job of finding the relationship between the 5-HT1A receptor and megakaryocytopoiesis, which demonstrated the potential of vilazodone in the management of thrombocytopenia. The paper emphasizes the regulatory mechanism of 5-HT1A receptor signaling on hematopoietic lineages, which could further advance the field of thrombocytopenia for therapeutic purposes.

      Strengths:

      This is comprehensive and detailed research using multiple methods and model systems to determine the pharmacological effects and molecular mechanisms of vilazodone. The authors conducted in vitro experiments using HEL and Meg-01 cells and in vivo experiments using zebrafish and irradiated Kunming mice. The experiments and bioinformatics analysis have been performed with a high degree of technical proficiency. The authors demonstrated how vilazodone binds to 5-HTR1A and regulates the SRC/MAPK pathway, an effect that is blocked by specific 5-HTR1A inhibitors. The authors determined this to be the mechanistic underpinning for the effects of vilazodone in promoting megakaryocyte differentiation and thrombopoiesis.

      Weaknesses:

      (1) From which database were the test and training sets used to build the drug screening model obtained? What criteria are used to grade the results?

      Response: Thank you for your thoughtful comment. The database was built by our laboratory. First, we collected 39 small-molecule compounds that promote MK differentiation or platelet formation and 691 small-molecule compounds that have no obvious effect on MK differentiation or platelet formation. From these 730 compounds, the molecular descriptors of 2 active and 15 inactive compounds were randomly selected as the validation set, and the data of the remaining 713 compounds were used as the training set. With regard to the activity evaluation criteria, the model outputs a prediction score between 0 and 1 for each molecule, and decisions are made with a threshold of 0.5: molecules scoring above 0.5 are identified as megakaryopoiesis inducers (1).
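      To make the split and decision rule in this response concrete, here is a minimal Python sketch that only reproduces the bookkeeping: 39 active and 691 inactive compounds, 2 + 15 of them drawn at random for validation, 713 left for training, and a 0.5 threshold on a score in [0, 1]. The compound identifiers, the random seed, the function names and the treatment of a score of exactly 0.5 (classified as non-inducer, since the response says "above") are illustrative assumptions; this is not the authors' model or code.

```python
import random

# Hypothetical sketch of the dataset split and threshold rule described above.
# Counts come from the response; compound IDs and the seed are placeholders.
N_ACTIVE, N_INACTIVE = 39, 691          # compounds that do / do not promote MK differentiation
N_VAL_ACTIVE, N_VAL_INACTIVE = 2, 15    # compounds randomly drawn for validation
THRESHOLD = 0.5                         # decision threshold on the model score


def split_compounds(seed=0):
    """Randomly draw the validation set; every remaining compound is used for training."""
    rng = random.Random(seed)
    actives = [f"active_{i}" for i in range(N_ACTIVE)]
    inactives = [f"inactive_{i}" for i in range(N_INACTIVE)]
    val = rng.sample(actives, N_VAL_ACTIVE) + rng.sample(inactives, N_VAL_INACTIVE)
    val_set = set(val)
    train = [c for c in actives + inactives if c not in val_set]
    return train, val


def is_megakaryopoiesis_inducer(score):
    """Classify a model score in [0, 1]; only scores strictly above 0.5 count as inducers."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return score > THRESHOLD


train, val = split_compounds()
print(len(train), len(val))  # 713 17
print(is_megakaryopoiesis_inducer(0.73), is_megakaryopoiesis_inducer(0.21))  # True False
```

      The sketch also confirms the arithmetic implied by the response: 39 + 691 = 730 compounds in total, and 730 - 17 = 713 for training.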

      Reference:

      (1) Mo Q, Zhang T, Wu J, et al. Identification of thrombopoiesis inducer based on a hybrid deep neural network model. Thromb Res. 2023;226:36-50. doi:10.1016/j.thromres.2023.04.011

      (2) What is the base of each group in Figure 3b for the survival screening of zebrafish? The positivity rate of GFP-labeled platelets is too low, as indicated by the quantity of eGFP+ cells. What gating technique was used in Figure 3e?

      Response: We are deeply grateful for your insightful feedback on Figure 3 and the assessment of the zebrafish model. We used 50 zebrafish embryos per group to evaluate VLZ toxicity, which we consider a suitable and fair baseline. Our gating procedure is depicted in the resulting diagram. Since our goal was to evaluate fluorescence intensity quantitatively, we analyzed cells isolated from whole zebrafish. Because the proportion of eGFP+ cells reported in zebrafish tissues in other studies is likewise quite low, and we are not aware of a typical eGFP+ threshold for zebrafish (1, 2), we consider this finding reasonable, given that all groups in the experiment were processed in parallel.

      Reference:

      (1) Yang L, Wu L, Meng P, et al. Generation of a thrombopoietin-deficient thrombocytopenia model in zebrafish. J Thromb Haemost. 2022; 20(8): 1900-1909. doi:10.1111/jth.15772

      (2) Fallatah W, De Silva IW, Verbeck GF, Jagadeeswaran P. Generation of transgenic zebrafish with 2 populations of RFP- and GFP-labeled thrombocytes: analysis of their lipids. Blood Adv. 2019;3(9):1406-1415. doi:10.1182/bloodadvances.2018023960

      (3) In Figure 4C, the MPV values of each group of mice did not show significant downregulation or upregulation. The possible reasons for this should be explained.

      Response: Thank you for your thoughtful comment. Megakaryocytes extend pseudopodia that release proplatelets into the bone marrow sinusoids. Preplatelets convert into barbell-shaped proplatelets to form platelets in an integrin αIIbβ3-mediated process (1, 2). Platelet size is determined by microtubule and actin-myosin-spectrin cortical forces during the vascular formation of barbell proplatelets (3), and this conversion is regulated by the diameter and thickness of the peripheral microtubule coil. Platelets can also be formed from proplatelets in the circulation (4). Megakaryocyte ploidy correlates directly, though nonlinearly, with mean platelet volume (5). Under normal conditions, platelet generation and clearance from the circulation are in equilibrium (normal turnover), controlled by thrombopoietin. When healthy humans receive thrombopoietin, their platelet size decreases (6). Proplatelet formation is dynamic and influenced by platelet turnover (7), which increases with increased platelet consumption and/or sequestration. In our study, MPV values did not change significantly in any group of mice; in our view, there are two possible reasons for these results.

      (1) Radiation damage may reduce the platelet count in mice while at the same time stimulating the bone marrow to release young, larger platelets, thus keeping MPV relatively stable.

      (2) After radiation injury, bone marrow cells were suppressed, decreasing the number of platelets produced; MPV may nonetheless have remained unchanged because the direct effects of radiation on the bone marrow caused thrombocytopenia without necessarily altering the average platelet size.

      Reference:

      (1) Thon JN, Italiano JE. Platelet formation. Semin Hematol. 2010(3):220-226. doi: 10.1053/j.seminhematol.2010.03.005.

      (2) Larson MK, Watson SP. Regulation of proplatelet formation and platelet release by integrin alpha IIb beta3. Blood. 2006(5):1509-1514. doi: 10.1182/blood-2005-11-011957.

      (3) Thon JN, Macleod H, Begonja AJ, et al. Microtubule and cortical forces determine platelet size during vascular platelet production. Nat Commun. 2012(3):852. doi: 10.1038/ncomms1838.

      (4) Machlus KR, Thon JN, Italiano JE Jr. Interpreting the developmental dance of the megakaryocyte: a review of the cellular and molecular processes mediating platelet formation. Br J Haematol. 2014(2):227-236. doi: 10.1111/bjh.12758.

      (5) Bessman JD. The relation of megakaryocyte ploidy to platelet volume. Am J Hematol. 1984(2):161-170. doi: 10.1002/ajh.2830160208.

      (6) Harker LA, Roskos LK, Marzec UM, et al. Effects of megakaryocyte growth and development factor on platelet production, platelet life span, and platelet function in healthy human volunteers. Blood. 2000(8):2514-2522. doi: 10.1182/blood.V95.8.2514.

      (7) Kowata S, Isogai S, Murai K, et al. Platelet demand modulates the type of intravascular protrusion of megakaryocytes in bone marrow. Thromb Haemost. 2014(4):743-756. doi: 10.1160/TH14-02-0123.

      (4) The PPI diagram and the KEGG diagram in Figure 6 both suggest a possible mechanistic pathway for the anti-thrombocytopenic effect of vilazodone. How do the authors account for the differences between their results?

      Response: We appreciate your valuable comments. PPI (protein-protein interaction) refers to the interaction between proteins. Inside cells, proteins interact with each other to perform various biological functions, influencing cell signaling, metabolic pathways, the cell cycle, and more. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database that integrates information on genomes, chemicals, and biological systems. In pharmacoinformatics, KEGG pathways are often used to understand the molecular mechanisms of specific diseases or biological processes. KEGG captures the interrelationships between genes, proteins, and metabolites, helping to reveal key nodes in biological processes. PPI information can be integrated with KEGG pathway data, such as metabolic and signaling pathways, to gain a more comprehensive understanding of the role of protein-protein interactions in cellular processes and biological functions. For example, by analyzing nodes in the PPI network, proteins associated with a specific disease can be identified, and examining where these proteins sit in KEGG pathways can then reveal molecular mechanisms underlying the onset and development of the disease. However, this method also has some limitations:

      Uncertainty (1): The construction of protein-protein interaction networks and drug interaction networks involves many assumptions and speculations. The edges of these networks may be based on experimental data but can also rely on bioinformatics predictions. Therefore, the accuracy of predictions is limited by the quality and reliability of the data used during network construction.

      Insufficient data (2): Despite the availability of a large amount of bioinformatics data for network construction, interactions between some proteins and drugs may still lack sufficient experimental data. This data insufficiency can result in inaccuracies in network predictions.

      Dynamics and spatiotemporal changes (3): The dynamics and spatiotemporal changes in biological systems are crucial for drug effects. Pharmacoinformatics may struggle to capture these changes because it often relies on static network representations, overlooking the temporal and dynamic nature of biological systems.
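      The integration step described in this response (identifying key nodes in a PPI network and then locating them in KEGG pathways) can be illustrated with a minimal Python sketch. It is hypothetical and is not the authors' pipeline: the edge list and the gene sets `pathway_A` and `pathway_B` are toy data invented for this example and do not reflect any real interaction database or actual KEGG pathway membership, and node degree is assumed as a simple stand-in for whatever centrality measure was actually used.

```python
from collections import Counter

# Toy PPI edge list (hypothetical; NOT the network from this study).
TOY_EDGES = [
    ("HTR1A", "SRC"),
    ("SRC", "MAPK1"),
    ("SRC", "MAPK3"),
    ("MAPK1", "MAPK3"),
    ("HTR1A", "GNAI1"),
]

# Toy pathway gene sets (hypothetical; NOT real KEGG pathway membership).
TOY_PATHWAYS = {
    "pathway_A": {"SRC", "MAPK1", "MAPK3"},
    "pathway_B": {"HTR1A", "GNAI1"},
}


def node_degrees(edges):
    """Count how many interactions each protein takes part in (degree)."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def top_hubs(edges, k):
    """Return the k proteins with the highest degree, i.e. candidate key nodes."""
    return [node for node, _ in node_degrees(edges).most_common(k)]


def locate_in_pathways(nodes, pathways):
    """Map each pathway to the given proteins it contains; pathways with no hits are omitted."""
    hits = {}
    for name, genes in pathways.items():
        shared = sorted(set(nodes) & genes)
        if shared:
            hits[name] = shared
    return hits


if __name__ == "__main__":
    hubs = top_hubs(TOY_EDGES, k=1)
    print(hubs)                                    # ['SRC']
    print(locate_in_pathways(hubs, TOY_PATHWAYS))  # {'pathway_A': ['SRC']}
```

      Because a sketch like this ranks nodes on a static edge list, it inherits the limitations listed above: its output is only as reliable as the edges supplied to it, and it ignores temporal and spatial dynamics.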

      Reference:

      (1) Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics. 2020(1):442. doi: 10.1186/s12859-020-03773-2.

      (2) Zhang S, Zhao H, Ng MK. Functional module analysis for gene coexpression networks with network integration. IEEE/ACM Trans. Comput. Biol. Bioinform. 2015(5):1146-1160. doi: 10.1109/TCBB.2015.2396073.

      (3) Cinaglia P, Cannataro M. A method based on temporal embedding for the pairwise alignment of dynamic networks. Entropy (Basel). 2023(4):665. doi: 10.3390/e25040665.

      (5) 5-HTR1A protein expression is measured only in the Meg-01 cell assay. Similar quantitation by western blot is not shown for the other cell models.

      Response: Your insightful criticism and recommendation to use different cell models in order to obtain a more accurate depiction of 5-HTR1A protein expression are greatly appreciated. We completely concur that using this strategy would greatly increase the validity of our research. However, establishing a primary megakaryocyte model requires specialized expertise and technical resources, which unfortunately are not readily available to us within the given timeframe. Nevertheless, we acknowledge the limitations of Meg-01 cells, which may exhibit distinct properties compared to true megakaryocytes. To mitigate this concern, we have ensured robust experimental design and rigorous data analysis to interpret our findings within the context of these model cell lines. We believe our results still provide valuable insights into megakaryocyte differentiation and address an important biological question.

      Reviewer #2 (Public Review):

      Summary:

      The authors sought to understand the mechanism by which a drug candidate, VLZ, acts on the 5-HTR1A receptor to activate the SRC/MAPK pathway and promote platelet formation.

      Strengths:

      The authors used both computational and experimental methods. This approach saves time and funds in finding a useful drug candidate and its therapeutic marker in the subfield of platelet reduction in cancer patients. The authors achieved their aim of explaining the mechanism by which VLZ improves thrombocytopenia, using two cell lines and two animal models.

      Weaknesses:

      Only two cell lines, HEL and Meg-01 cells, were evaluated in this study. However, whether more cell lines can be used really depends on the workflow and funding situation of the current research team.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. We fully agree that CD34+ hematopoietic stem/progenitor cells or primary megakaryocytes would provide a more accurate representation of in vitro megakaryopoiesis compared to HEL and Meg-01 cells, which possess limited potential for this process. We acknowledge that our current study did not include experiments with these preferred cell models. This is because our laboratory is still actively developing the technical expertise and resources required for establishing and maintaining primary megakaryocyte and CD34+ cell cultures. Despite the limitations of the current study, we believe the results using HEL and Meg-01 cells provide valuable preliminary insights into the potential effects of VLZ on megakaryocyte differentiation. We are actively working to overcome these limitations and plan to incorporate these more advanced models in our future investigations.

      Reviewer #1 (Recommendations For The Authors):

      I think the authors can enhance the mechanism study by developing more reliable models and methodologies. The connection to clinical research should be strengthened at the same time.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. Despite the limitations, we are committed to expanding our research in the future by incorporating your suggestion and establishing a primary megakaryocyte model to further validate our findings and strengthen our conclusions. We also wholeheartedly agree with your suggestion to incorporate clinical research. Unfortunately, VLZ is not a first-line treatment for depression in China, and obtaining blood samples from a sufficient number of matched patients for analysis is challenging. To provide additional experimental support for the drug, we have strengthened the in vivo data as far as feasible, including by performing the intervention in normal mice. Our findings should also contribute to the theoretical basis for this drug and aid its practical application.

      Reviewer #2 (Recommendations For The Authors):

      Issues the authors need to address:

      Figure 7: Why is the band intensity of GAPDH in b or e much greater than that in f, g, or h?

      Response: Thank you for your careful observation and insightful comment regarding Figure 7. Because the protein concentration differs between batches of samples, a larger loading volume sometimes increases the GAPDH band intensity. Other factors that may influence GAPDH band intensity include the instrument's contrast adjustment during exposure and the use of different numbers of wells for electrophoresis. In addition, the original blots from all three replicates of every western blot will be provided in the supplementary materials.

      Finally, we sincerely thank you for the opportunity to revise our manuscript further; your valuable scientific comments have greatly improved it.

    1. Book Summary:

      PART 1: FUNDAMENTAL TECHNIQUES IN HANDLING PEOPLE

      Principle 1: Don't Criticise, Condemn or Complain
      Criticism is futile; it makes the other person strive to justify himself.
      Criticism doesn't correct a situation.
      When you criticise a person, they will never make lasting changes in the things you criticised them for.
      Don't criticise anyone; "they are just what we would be in similar circumstances".
      📝Action Step: Ponder and journal on all the instances when you criticised someone on something they valued or were making progress in (e.g. studies, business, sport). Journal on why you said that; really get to the roots of your beliefs. Go and message the person you criticised and tell them you're sorry. Next time, don't criticise ANYONE.
      "Don't complain about the snow on your neighbour's roof, when your own is unclean"
      🤔Action Step: Think of all the times you complained in the last week or so. Write them down or type them out, then write next to each complaint what an alternative to it could be. Next time, NEVER complain.
      "I will speak ill of no man ... and speak all the good I know of everybody"

      Principle 2: Give Honest And Sincere Appreciation
      Humans all want to feel important in society.
      Andrew Carnegie praised his associates publicly and privately to handle them better.
      "Don't be afraid of enemies that attack you, be afraid of friends that flatter you"
      If someone makes a mistake, don't condemn them; appreciate their good points and reward them through praise.
      🗣️Action Step: The next time you see someone making progress or working really hard, go and give them a compliment (give them honest and sincere appreciation). Go to AG wins and comment on a win -> DO THIS RN OR YOU'RE A JEFFREY
      "Every man I meet is superior to me in some way; in that way I learn of him"

      Principle 3: Arouse In The Other Person An Eager Want
      The only way to influence others is to talk about what they want and show them how to get it.
      💡Action Step: The next time you come across a situation where you have to make someone do something under your responsibility/leadership, ponder for a second: "How can I make this person want to do it?" Really get into their shoes, journal/ponder on it, then apply it to the person in real life. Or, if you sell a product, ask yourself, "How can I make this person want to buy it?", then use the feedback and apply it.
      "If there's a secret to success, it's the ability to get into the POV of the other person and see things from their angle"

      PART 2: 6 WAYS TO MAKE PEOPLE LIKE YOU

      Principle 1: Be Genuinely Interested In The Other Person
      You can make more friends in 2 months by becoming interested in others than you can in 2 years by being interested in yourself.
      Make yourself do things for others: things that require time, thoughtfulness and unselfishness.
      😢Action Step: Whenever you see someone who needs help in their life or is struggling, go and give them advice. Be genuinely interested in helping them improve rather than helping yourself -> Do this in AG right NOW.
      "We are interested in others when they are interested in us"

      Principle 2: Smile
      What one wears on one's face is far more important than the clothes on one's back.
      Happiness doesn't depend on outer conditions; it depends on inner conditions.
      😀Action Step: Start SMILING RIGHT NOW. Literally, just put a smirk on your face and wear it for the rest of the day (see how people respond to it).
      "There is nothing good or bad, it is thinking that makes it so"

      Principle 3: Remember That A Person's Name Is To That Person The Sweetest Sound
      🤝Action Step: Whenever you meet someone new, find out their complete name and associate it with an image in your head.
      Your own name is more important to you than 1,000 other people's names.

      Principle 4: Be A Good Listener, Encourage Others To Talk
      Listening is one of the highest compliments we can pay to anybody.
      Good conversationalist = Good Listener (be attentive).
      To be interesting, be interested.
      🗣️Action Step: The next time you socialise with someone, let them do 80% of the talking, ask them open-ended questions, and let them answer freely (follow the 80/20 principle).

      Principle 5: Talk In Terms Of The Other Person's Interests
      💡Action Step: When talking to someone else, talk about something they're interested in (e.g. self-improvement, sports), then let the conversation flow freely on that topic; pick their brain on it and ask them questions.

      Principle 6: Make The Other Person Feel Appreciated And Important
      Always make the other person feel appreciated and important.
      Use phrases like "I'm sorry to trouble you", "Would you be so kind as to ____", "Would you mind".
      🤷‍♂️Action Step: The next time you have to call someone, or ask someone to move, use one of the phrases above.

      PART 3: HOW TO WIN PEOPLE TO YOUR WAY OF THINKING

      Principle 1: The Only Way To Get The Best Of An Argument Is To Avoid It; You Can't Win
      Why argue?
      "A man convinced against his will is of the same opinion still"
      "Hatred is never ended by hatred, but by love"
      😠Action Step: The next time you're talking to someone and notice the conversation starting to escalate into an argument, end it right there by showing love (e.g. give them a compliment, express gratitude).

      Principle 2: Show Respect For The Other Person's Opinions, Never Say "You're Wrong"
      If you're going to prove something, don't let anyone know it.
      "Be wiser than other people if you can, but do not tell them so"
      If someone says something wrong, say "I thought otherwise" or "I may be wrong ____".
      Telling someone directly that they're wrong can cause a lot of damage.
      💬Action Step: When you're in a discussion with someone, say one of your JEFFREY friends at school who claims junk food is fine, instead of saying "you're wrong", use one of the phrases above and repeat your point in a much friendlier tone.

      Principle 3: If You Are Wrong, Admit It Quickly And Emphatically
      Admit quickly, in a friendly tone, that the other person is right and you are wrong.
      You need courage to be able to criticise yourself.
      🤨Action Step: The next time you find you have made a mistake in front of others, admit it straight away in a friendly manner. Make sure you don't cause damage to others while doing so.

      Principle 4: Begin In A Friendly Way
      "A drop of honey catches more flies than a gallon of gall"
      Always begin the conversation in a friendly manner and a friendly tone.
      💭Action Step: The next time you have a conversation with someone, start it with a positive vibe and a friendly tone.

      Principle 5: Get The Other Person Saying "Yes, Yes" Immediately
      Don't start a conversation with things you differ on; start with things you agree on.
      At all costs, keep the person from saying "no" at the start.
      It is much more profitable to see things from the other person's viewpoint and get them saying "yes".
      🙌Action Step: After bringing a positive vibe to the conversation, start talking about things you and the other person agree on, and ask questions that deliberately provoke a "yes" response. Brainstorm this a little in your head before approaching the person.

      Principle 6: Let The Other Person Do A Great Deal Of The Talking
      Encourage them to talk; if you disagree, stay silent and listen with an open mind.
      "If you want enemies, excel your friends; if you want friends, let your friends excel you": keep quiet about your accomplishments and don't talk about them unless somebody asks.
      🏆Action Step: Follow the 80/20 rule in conversation: talk only about the other person and their interests, don't show off to look cool (e.g. saying you earn $10k/m online), keep quiet and remain humble.

      Principle 7: Let The Other Person Feel The Idea Is Theirs
      Making someone feel that the idea is theirs is like giving them a compliment.
      💡Action Step: The next time you come up with a great idea, implement it, and it gives you reasonable success, thank the friend who helped you generate the idea (e.g. tag someone in AG because they helped you start a profitable business).

      Principle 8: Try Honestly To See Things From The Other Person's POV
      People may be totally wrong, but don't condemn them; try to understand them and their situation.
      🧐Action Step: The next time you're in a conversation and someone says something completely wrong, and you think to yourself "why did he/she say that!", empathise with their situation and see things from their POV (e.g. say to yourself, "I would've done the same if I was in that situation").

      Principle 9: Be Sympathetic With The Other Person's POV
      Three-quarters of the people you meet crave sympathy; give it to them.
      Put yourself in the other person's shoes at the start of a conversation or deal.
      😊Action Step: Another tip to keep at the back of your head is to see things from the other person's POV and have sympathy for the situation they're in. Really put yourself in the other person's shoes; make yourself feel that you're the other person and see things from a new REALITY.

      Principle 10: Appeal To The Nobler Motives
      Always choose a nobler motive when you assume something about others.
      Be the kind of leader who appeals to what really matters and, even when the feedback is tough, reminds people why they're really there.

      Principle 11: Dramatise Your Ideas
      Truth isn't enough; the truth has to be made vivid, interesting and dramatic.
      🕺Action Steps💡
      Make your ideas more obvious, interesting and vivid to people.
      Use drama and showmanship to capture attention and imagination and make your ideas more impressive.
      When presenting an idea, make it more exciting than it really is.

      Principle 12: Throw Down A Challenge
      "The way to get things done is to stimulate competition"
      🥵Action Step: When you're doing something that many others are doing (e.g. participating in a challenge), find someone else who is participating and throw down a challenge to them (e.g. whoever finishes the challenge first wins).

      PART 4: BE A LEADER - HOW TO CHANGE PEOPLE WITHOUT GIVING OFFENCE

      Principle 1: Begin With Praise And Honest Appreciation
      Appreciate the person first before bringing up your problem for resolution.
      🗣️Action Steps (e.g. if someone did a random act of kindness for you):
      Tell the person that you appreciate the act.
      Tell them how it made you feel good.
      Congratulate them and tell them that it was beyond expectations.

      Principle 2: Call Attention To People's Mistakes Indirectly
      When indirectly criticising someone, never use the word "but"; use "and" instead.
      This technique works well for sensitive people who resent criticism.
      💭Action Step: Praise a quality, and also mention a quality you want to see someone improve (e.g. if someone doesn't keep his house clean, say, "I appreciate the effort you put in to make the house clean").

      Principle 3: Talk About Your Own Mistakes Before Criticising The Other Person
      Talk about your own shortcomings before judging someone (e.g. asking them to improve).
      😆Action Step: If you want to see a direct improvement in someone, before telling them, talk about your own mistakes in the area where you want them to improve: tell them a joke about yourself or a story about the mistakes you made.

      Principle 4: Ask Questions Instead Of Giving Direct Orders
      Always give people the opportunity to do things by themselves through questions.
      Resentment caused by a brash order may last a long time.
      😤Action Step: When you need something done by someone else, don't give them a direct order. Give the person an opportunity to do things by asking questions (questions must be relevant to the task that you need done).

      Principle 5: Let The Other Person Save Face
      Finding faults in the other person will make them resent you.
      ❌Action Step: Instead of directly pointing out the faults in the other person, let them save face and find their own mistakes (or point them out indirectly).

      Principle 6: Praise The Slightest Improvement, And Praise Every Improvement
      Faults start to disappear after you give praise.
      😊Action Step: When you see someone making progress, or you see growth, praise their hard work and praise the improvement.

      Principle 7: Give The Other Person A Fine Reputation To Live Up To
      💡Action Step: If you want to improve a person in a certain area, act as though that trait were already one of his or her outstanding characteristics (e.g. make it seem as if they already have that trait).

      Principle 8: Use Encouragement, Make The Fault Seem Easy To Correct
      Let the other person know that you have faith in their ability to perform a task.
      💪🏿Action Step: When you see a fault, and they're trying their best to fix it, let them know that you have full faith in them.

      Principle 9: Make The Other Person Happy About Doing The Thing You Suggest
      Give the other person some reward for doing what you want, and take away a little for something they do not do.
      Rules for making the other person happy about the thing you suggest:
      Be sincere; do not promise anything you can't deliver.
      Know exactly what it is you want the other person to do.
      Be empathetic; ask yourself what it is the other person really wants.
      Consider the benefits the person will receive from doing what you suggest.
      Match those benefits to the other person's wants.
      When you make your request, put it in a form that will convey to the other person the idea that he personally will benefit.

      how to win friends and influence people summary

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer_01

      Major comments:

      1. The authors cite that acetylated and tyrosinated microtubules have different spatial and compartmental distributions in dendrites and axons, and they investigate the distribution in the AIS of nonAcD and AcD cells, as well as in the stem dendrites. However, they show just one example of each of the two cell types (Figure 2D and E) without any statistical analysis. They should either remove this part or provide a thorough quantification.

      Reply: The spatial and compartmentalized distribution of stable and dynamic MTs in the dendrites and axons of nonAcD neurons has been extensively studied and reviewed (see Kapitein & Hoogenraad, 2011; Katrukha et al., 2021; Tas et al., 2017 for reference). However, the organization of the MT cytoskeleton in AcD neurons is still unknown. Here, we provide the very first evidence on the distribution of tyrosinated and acetylated MTs in AcD neurons, as well as data on MT orientations. We agree with the reviewer that, to make our results on the spatial organization of these post-translational modifications in AcD neurons more complete, we need to provide a more thorough quantitative analysis.

      To achieve this, we plan to perform immunostainings on DIV10 neurons using antibodies against tyrosinated (tyr-) and acetylated (ac-) tubulin to label dynamic and stable MTs, respectively. Subsequently, we will conduct high-resolution 3D confocal imaging and measure fluorescence intensity to illustrate the abundance and staining patterns of tyr- and ac-MTs in the axons and dendrites of AcD neurons. Since the spatial distribution of tyr- and ac-MTs is distinguishable with confocal microscopy, we will retain the STED examples in the figures but conduct new analyses on confocal imaging data. We will measure the total fluorescence intensity of tyr- and ac-MTs in different compartments of AcD neurons and normalize it to the size of the measured area. We will then compare the normalized intensity values between the axons and dendrites of AcD neurons to examine whether stable and dynamic MTs show a specific distribution pattern. We will analyse at least 3 independent primary culture preparations with a minimum of 30 cells. Using the same dataset, we will also quantify the percentage of AcD neurons in which ac-MTs extend specifically into the axon compared with the AcD.
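      The area normalization described above (total fluorescence intensity divided by the measured area, then compared between axon and dendrites) is simple arithmetic, sketched here in Python under stated assumptions. The `Roi` record, its field names, and the per-cell axon-to-dendrite ratio are hypothetical choices made for illustration, not the authors' analysis pipeline, and the numbers in the example are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Roi:
    """One traced region of interest (hypothetical record layout)."""
    cell_id: str
    compartment: str        # assumed labels: "axon" or "dendrite"
    channel: str            # assumed labels: "tyr" or "ac"
    total_intensity: float  # summed pixel intensity in arbitrary units
    area_um2: float         # ROI area in square micrometres


def normalized_intensity(roi):
    """Total fluorescence divided by ROI area (a.u. per square micrometre)."""
    if roi.area_um2 <= 0:
        raise ValueError("ROI area must be positive")
    return roi.total_intensity / roi.area_um2


def axon_to_dendrite_ratio(rois, channel):
    """Per cell: mean normalized axon intensity / mean normalized dendrite intensity."""
    per_cell = {}
    for roi in rois:
        if roi.channel != channel:
            continue
        compartments = per_cell.setdefault(roi.cell_id, {})
        compartments.setdefault(roi.compartment, []).append(normalized_intensity(roi))
    ratios = {}
    for cell_id, compartments in per_cell.items():
        # Only cells with both compartments measured yield a ratio.
        if "axon" in compartments and "dendrite" in compartments:
            ratios[cell_id] = mean(compartments["axon"]) / mean(compartments["dendrite"])
    return ratios


if __name__ == "__main__":
    rois = [
        Roi("c1", "axon", "ac", 1200.0, 10.0),     # 120 a.u. per square micrometre
        Roi("c1", "dendrite", "ac", 900.0, 15.0),  # 60 a.u. per square micrometre
        Roi("c1", "axon", "tyr", 500.0, 10.0),     # other channel, ignored below
    ]
    print(axon_to_dendrite_ratio(rois, "ac"))  # {'c1': 2.0}
```

      Normalizing by area before comparing compartments keeps a thicker or longer ROI from appearing brighter simply because it covers more pixels; a ratio above 1 would indicate relative enrichment in the axon.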

      2. The authors use EGFP-Rab3A-labelled vesicles to investigate anterograde transport in the axon and dendrites. They find slightly faster transport of these vesicles at the AIS of AcD cells and conclude that axonal cargoes in general are transported faster across the AIS in AcD cells. In my opinion, this generalization based on one type of vesicle is too far-fetched.

      Reply: The Rab3A protein is associated with presynaptic vesicles that are transported by KIF1A and KIF1Bβ, members of the kinesin-3 family, towards presynaptic boutons (see Guedes-Dias & Holzbaur, 2019; Niwa et al., 2008 for reference). Since KIF1A and KIF1Bβ are common motor proteins that mediate MT-based transport of different types of vesicles (e.g., synaptic vesicles and dense-core vesicles; see Carabalona et al., 2016; Helmer & Vallee, 2023 for reference), we reasoned that Rab3A should be a representative marker for axonal cargo. However, this does not rule out the possibility that the faster trafficking we observed is specific to presynaptic vesicles, as different types of vesicles tend to recruit different modulators that could lead to different trafficking features.

      To address this question, we will perform a live-imaging experiment including two additional organelle marker proteins, Neuropeptide Y (NPY) and Lysosome-associated membrane protein 1 (Lamp1). NPY is transported into the axon via KIF1A and KIF1Bβ-mediated dense-core vesicles (see Helmer & Vallee, 2023; Lipka et al., 2016 for reference). Lamp1 is associated with lysosomes and a range of endocytic organelles that recruit both kinesin-1 and kinesin-3, and are transported into both axons and dendrites (as reviewed in Cabukusta & Neefjes, 2018). By introducing two additional types of vesicles, we should be able to answer whether AcD neurons, in general, tend to transport cargoes into the axon faster than nonAcD neurons.

      Minor comments:

      In the introduction, the authors describe how synaptic inputs are received at the dendrites and propagated to the soma in the form of membrane depolarizations. They should add 'excitatory' to synaptic inputs or also describe the impact of inhibitory synaptic inputs at the dendrites.

      In my opinion, Figure 2 could be presented slightly better. The lower part of panel A fits better with panel B, which sits next to the upper part of panel A. I understand that the authors systematically present their data first for nonAcD cells and then for AcD cells. However, in this particular case, that order makes the current figure a little more difficult to read.

      The results displayed in Figure 4 are presented in a somewhat confusing order. The authors jump from 4D to 4G, then to 4I and 4E, 4H, 4F. Similarly, 4M and N are addressed before 4O and P, before finally reaching 4K and L. It would be beneficial to present and address the data in a consistent order.

      Reply: Thank you for the suggestions on how to improve the data representation in the figures. We will change Figures 2 and 4 and make adjustments in the text upon revision since we also plan to include additional data.

      Reviewer_02

      Major comments:

      1. The authors suggest that Na+ channel density is reduced at the AcD AIS compared with AISs arising from the cell body. This is not convincing. Immunostaining for Na+ channels is notoriously difficult because the epitopes of anti-pan-Nav antibodies are highly sensitive to fixation. In addition, the conclusion is based on quantification of immunofluorescence intensity. Since Na+ channels are localized through binding to AnkG, the authors should also measure other AIS proteins such as AnkG, βIV-spectrin, and Nfasc. Do these change? If they all change uniformly, I would be much more inclined to accept the conclusion. If they do not change, that still doesn't rule out concerns about fixation conditions and slight differences between cultures. The authors report an approximately 40% reduction in fluorescence intensity. That is quite large, and such a big difference should also be confirmed in brain sections.

      Reply: The potential fixation issue and antibody sensitivity in Na+ channel staining are indeed valid considerations, and we are aware of them. However, it should be noted that we used pan-Na+ channel antibodies that have been previously characterised and widely used in the literature (see Solé et al., 2019; Yang et al., 2020 for references). Furthermore, all our samples underwent the same fixation and staining protocol, and comparable numbers of AcD and nonAcD neurons were imaged from the same preparation and coverslip in each experiment. Imaging settings were also kept constant. Any loss of Na+ channel staining at the AIS due to fixation should affect both neuron types, and therefore our conclusion is justified. Nevertheless, the reviewer's point regarding other AIS components is valid and will be investigated further in the revised manuscript.

      Following the reviewer's suggestion to further strengthen our conclusion, we will measure the intensity of AnkG, βIV-spectrin, and neurofascin in DIV21 AcD and nonAcD neurons. We will compare a minimum of 3 independent cultures, with at least 10 cells of each type per culture.

      We agree with the reviewer that confirming the observed differences in Na+ channel staining in brain slices would be beneficial. However, such experiments present several challenges. Firstly, one approach could involve immunostaining with antibodies against the AIS marker AnkG, in combination with the somatodendritic marker MAP2 and pan-Nav. However, unlike dissociated cultures, this method does not allow neuronal morphology to be clearly identified, making the outcome unclear and difficult to analyse and interpret. Alternatively, Thy1-GFP rats, in which a subset of neurons is labelled with GFP, could allow morphological studies. Unfortunately, we do not have access to this rat line, and importing it, obtaining permits, and establishing a colony is beyond the timeframe for manuscript revision. Additionally, while pan-Nav antibodies have proven reliable in dissociated cultures, their efficacy in tissue staining is less certain. We could provide example images upon request. Secondly, endogenous labelling of Na+ channels is another option but remains a significant challenge. Recent developments in endogenous labelling, such as the CRISPR/Cas9-based method using pORANGE (Fréal et al., 2023) and the generation of Scn1a-GFP transgenic mice (Yamagata et al., 2023), offer potential solutions. However, the labelling efficiency of pORANGE is uncertain, and both methods are time-consuming and cannot be completed within the three-month revision period.

      As an alternative, we propose emphasising that our results are based on in vitro experiments and discussing the advantages and limitations of this approach in the discussion section.

      The analysis of inhibitory synapse differences at the AIS is also not compelling; this is a limitation of the culture system. The authors have no control over the density of inhibitory neurons in the culture well. This interaction is not intrinsic to the AcD neuron, but rather a feature of neuron-neuron interactions which should only be modelled in the animal.

      Reply: The reviewer is correct in pointing out that the establishment of inhibitory synapses at the AIS is not an intrinsic feature of AcD neurons; it depends on the network and should be modelled in animals. We will include this limitation of the cell culture model in the discussion section of the revised manuscript. We also understand the reviewer's concern that the lower number of inhibitory synapses at the AIS of AcD neurons might be due to an uneven density of inhibitory neurons between cultures. Nonetheless, assuming that the number of inhibitory neurons is constant between preparations, the observation that AcD neurons form fewer inhibitory synapses at the AIS is interesting. This may be related to the features of the AIS and its morphology and should be investigated further.

      To make our study more comprehensive and to address the reviewer's concern regarding the presence of inhibitory neurons, we will perform immunostainings in dissociated cultures with antibodies against pCaMKIIa, an excitatory neuron marker, and GAD1, a marker for inhibitory neurons. The cultures will be plated at 40,000 cells per 18 mm coverslip, the same density as in the synapse quantification experiments. We will then quantify the density of inhibitory neurons in the culture. We will perform measurements in 3-6 independent cultures by analysing large fields of view in different areas of a coverslip (20-30 neurons per area), to determine whether the density of inhibitory neurons varies between cultures and preparations. Furthermore, as also requested by reviewer 4, we will perform new immunostainings in which pre- and postsynaptic markers (VGAT and Gephyrin) are included in the same sample together with an AIS marker (AnkG or Neurofascin) and a dendritic marker (MAP2). Synapses that contain both pre- and postsynaptic components will be analysed and included in the revised version of the manuscript.
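      The planned quantification of inhibitory neurons could be summarized per culture as sketched below. This minimal illustration assumes that two counts exist for each field of view: GAD1-positive neurons, and all counted neurons (GAD1- plus pCaMKIIa-positive). Counts are pooled per culture. The max/min ratio of the culture fractions serves as a simple between-culture spread measure; that measure is a hypothetical choice, not a prescribed one.

```python
def inhibitory_fraction_by_culture(fields):
    """Pool per-field counts by culture and report GAD1-positive fractions.

    fields: list of (culture_id, n_gad1_positive, n_counted_neurons),
            one tuple per field of view (hypothetical layout).
    Returns (fractions_by_culture, max_over_min_spread).
    """
    totals = {}
    for culture, n_gad, n_total in fields:
        g, t = totals.get(culture, (0, 0))
        totals[culture] = (g + n_gad, t + n_total)
    # Pooling counts weights each field by how many neurons it contained.
    fractions = {c: g / t for c, (g, t) in totals.items()}
    lowest = min(fractions.values())
    spread = max(fractions.values()) / lowest if lowest > 0 else float("inf")
    return fractions, spread
```

A spread close to 1 would indicate similar inhibitory neuron densities across cultures. A large spread would support the reviewer's concern.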

      Finally, the major limitation of this study is that it is performed in vitro. Surprisingly, the authors actually argue this is a feature of their system. While it is true some of the questions can be addressed perfectly well in vitro, many cannot. In the first paragraph of the results the authors state an advantage of their system is that there are no microenvironments to influence the development of the AcDs. I'm afraid I view this as a drawback. The authors suggest this is an opportunity to examine intrinsic mechanisms of development - true, but it also foregoes the opportunity to determine if the outcomes are different from what occurs in vivo. To this point, the authors report that only 15-20% of the population of hippocampal neurons in culture are AcD neurons. But in their introduction they cite other literature indicating 50% of hippocampal neurons in vivo are AcD neurons - this suggests that the environment of the hippocampus in vivo influences whether a neuron becomes an AcD neuron or not.

      Reply: The reviewer is right in pointing out that the in vivo environment could affect AcD neuron development, and we also find this a very interesting topic for future investigation. Even more intriguingly, a preprint by Lehmann et al. (doi: https://doi.org/10.1101/2023.07.31.551236) shows that network activity stimulates neurons to acquire AcD morphology. The impact of the microenvironment on AcD neuron development cannot be studied in dissociated cultures. Nevertheless, our in vitro data clearly show that hippocampal neurons can intrinsically develop AcD morphology independent of the in vivo environment. As also mentioned in the next point, our statement "...their development must be driven by genetically encoded factors rather than specific..." may sound too definitive, as it appears to exclude possible effects of the microenvironment. We will revise this part. It is highly desirable to move cell biological studies from neuronal cultures to tissue. However, many of the experiments performed in this study remain very challenging in slices or living animals because appropriate technologies and tools are lacking. We are convinced that many basic biological questions can and should be studied in simplified culture models: if they are truly fundamental, they should also be reproducible in these models.

      To address the reviewer's question regarding the difference in percentages between our data and the previous study by Thome et al. (2014), several factors should be considered. First, as noted by the reviewer, our results were obtained from an in vitro system, which is not directly comparable to the in vivo system used by Thome et al. (Thome et al., 2014). Second, the neurons quantified in our developmental experiments were at DIV5 and DIV7, whereas Thome et al. analyzed neurons from P28-35 animals, in which 50% of neurons in the CA1 region were AcD neurons. This age disparity could contribute to the difference. Third, the percentage of AcD neurons is lower in other hippocampal regions (approximately 20-30%). Because our primary cultures contain neurons from all hippocampal regions, this may have lowered the average percentage of AcD neurons we quantified. Additionally, Benavides-Piccione et al. (Benavides-Piccione et al., 2020) reported 20% AcD neurons in the CA1 region of hippocampi isolated from 8-week-old mice, a number similar to what we observed in vitro. Interestingly, Thome et al. reported that the AcD neuron population in the hippocampal CA1 region is 30% in P8 pups and increases to 50% at P28-35, suggesting an age-dependent increase in the AcD neuron population. This could be an additional reason why we observed only 15-20% AcD neurons in our in vitro system, independent of the in vivo environment.

      In the revised version, we will clarify these points in the introduction and discussion sections. Additionally, we will quantify the proportion of AcD neurons in mature DIV21 dissociated hippocampal cultures and compare it with DIV7 cultures, to assess whether the AcD population increases over time. We believe that this experiment, combined with the explanations provided above, will sufficiently address the reviewer's question. However, we acknowledge that neuronal networks established in vitro differ from those in vivo, so the outcomes may differ.
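      Whether the AcD fraction rises between DIV7 and DIV21 could be tested with a two-proportion z-test, sketched here using only the standard library. The DIV7 counts (47 AcD neurons out of 387 cells) are those reported for Figure 1 in the reply to Reviewer 3. The DIV21 counts in the usage comment are placeholders, not data. Cells from the same culture are not independent, so comparing per-culture percentages across independent cultures would be the more conservative alternative.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1/n1 and x2/n2 are counts of AcD neurons over all counted neurons
    at two time points. Uses the pooled proportion for the standard error.
    Returns (z, two_sided_p).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Usage with placeholder DIV21 counts (hypothetical, not measured):
# z, p = two_proportion_z_test(47, 387, 80, 400)
```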

      I appreciated the balanced discussion of whether this is a stochastic or genetically programmed process. This could have been emphasized earlier in the results since the authors invoke the concept that "...their development must be driven by genetically encoded factors rather than specific...". The authors have not shown this and cannot show it in this system. Indeed, as stated in point 4 above, I think their data argue against a simple genetic program.

      Reply: As suggested by the reviewer and noted in point 4, we will revise the section on AcD neuron development in our manuscript to emphasize that hippocampal neurons may adopt AcD morphology through genetic or stochastic mechanisms. While we acknowledge that environmental and activity factors may also influence this process, particularly in mature neurons, our study focuses on developing neurons where genetic and stochastic factors are likely to be predominant. This conclusion is supported by the observation that neurons develop into AcD morphology in vitro, where environmental and activity patterns do not mimic those of in vivo systems.

      Indeed, our current manuscript does not explore genetic factors involved in AcD neuron development. To address this question, one approach could be to label AIS markers endogenously in dissociated cultures using the pORANGE method (see Willems et al., 2020 for reference). Another would be to use AnkG-GFP transgenic mice (Fréal et al., 2023; Thome et al., 2023) along with a volume marker such as mRuby or GFP. Either approach would allow AcD and nonAcD neurons to be identified in vivo and in vitro, followed by single-cell transcriptomics analysis to uncover potential genetic factors. Candidate genes could then be manipulated to demonstrate their essential role in AcD neuron development. However, such experiments require significant time and resources beyond the scope of our current revision timeframe. Nonetheless, this question presents an exciting direction for future research.

      Reviewer 3

      Major comments:

      1. The authors classify neurons into axon-carrying dendrite (AcD) and non-AcD neurons by measuring the stem dendrite length (> 3 µm). I could not find the validity for this cut-off. The non-AcD neurons in Fig. 6B appear more AcD to this reviewer, and, in addition, other researchers have proposed a third category of 'shared root' neurons (doi: 10.7554/eLife.76101). For purposes of reproducibility and transparency, please provide first a comprehensive overview of the entire population of morphologies (i.e. all cells in control conditions). The distances from the soma could be plotted in histogram (etc.) and authors may want to think about independent supporting evidence for the cut-off to classify AcD and non-AcD neurons.

      Reply: Concerning the validity of the AcD neuron classification, we did measure the length of the stem dendrite, as shown in Figure S4G; the average distance is around 10 µm. However, we admit that this information is presented relatively late in the manuscript. To address the reviewer's criticism, the revised version will include a supplementary figure displaying a gallery of representative images of both AcD and nonAcD neurons analyzed in our study (see Hodapp et al., 2022; Fig. S1C&D and Fig. S3 for an example). Given the sample size of our study (for example, Figure 1: DIV5, 83 AcD neurons out of 427 cells; DIV7, 47 AcD neurons out of 387 cells), including all images would result in a very large figure, so we will show only representative examples in the gallery. Additionally, as suggested, we will plot the length of the stem dendrite (or axon distance) of AcD neurons as a histogram. This will demonstrate that the AcD neurons included in our study indeed have a stem dendrite longer than 3 µm.

      To further validate the classification method, we will measure the diameter of the stem dendrite in all analyzed AcD neurons. We will then compare it with the distance between the soma and the start of the axon in the same neuron. As described by Hodapp et al. (Hodapp et al., 2022; Fig. S1A), AcD neurons are expected to have a stem dendrite longer than its diameter.
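      The two criteria described above, a stem dendrite longer than 3 µm and longer than its own diameter, could be applied and summarized as sketched below. Combining both conditions in a single rule is an assumption for illustration, as is the 2 µm histogram bin width. The function names are hypothetical. A neuron whose axon arises directly from the soma has a stem length of 0 µm and is classified as nonAcD.

```python
from collections import Counter


def classify_neuron(stem_length_um, stem_diameter_um, threshold_um=3.0):
    """Label a neuron 'AcD' if its axon arises from a stem dendrite that is
    longer than both the fixed cut-off and the stem's own diameter."""
    if stem_length_um > threshold_um and stem_length_um > stem_diameter_um:
        return "AcD"
    return "nonAcD"


def length_histogram(lengths_um, bin_width_um=2.0):
    """Count stem-dendrite lengths per bin; keys are the bins' lower edges."""
    counts = Counter(int(length // bin_width_um) * bin_width_um
                     for length in lengths_um)
    return dict(sorted(counts.items()))
```

Plotting length_histogram for all analysed neurons would show whether the 3 µm cut-off falls in a natural gap of the distribution or cuts through a continuum.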

      We considered providing independent evidence to support the classification of nonAcD and AcD neurons. However, the method used by Thome et al. and Wahle et al. to classify AcD and nonAcD neurons is well established and widely accepted (see Thome et al., 2014; Wahle et al., 2022 for references). Similar criteria were also employed by Benavides-Piccione et al. (Benavides-Piccione et al., 2020). Introducing a different criterion could complicate comparison with these studies, so we have chosen to remain consistent with them.

      As for the "shared root" neurons described by Wahle et al., we did not analyze this category separately and included them in the nonAcD subtype. Nonetheless, it is an interesting direction to explore in the future. For completeness, we will discuss this point in the revised manuscript.

      Related to point #1 the primary hippocampal neuron system is excellent for cell biological questions but comes with the drawback of imaginative morphologies including neurons with multiple axons and AISs. It is not mentioned here but literature indicates up to 20% of neurons have two axons (e.g. doi: 10.1007/s12264-017-0169-3, 10.1083/jcb.200707042). How did the authors classify the double axon cells? Since the main hypothesis is the existence of an intrinsic program for AcD neurons (p. 5 top), the two axons from one neuron should develop similarly. The authors can easily test this with the data.

      Reply: We appreciate the reviewer's comment regarding the choice of the model system for this type of study. Indeed, as they pointed out, in primary cultures, some neurons develop more than one axon. Since we did not find any supporting evidence from the literature reporting that hippocampal neurons have multiple axons in vivo, we only analyzed neurons with one axon for both AcD and nonAcD neurons. We will clarify this in our method section of the revised manuscript.

      Some interpretations about function are not correct and the authors should reconsider these. A role of cisternal organelles on neuronal excitability remains to be demonstrated (and see doi.org/10.1002/cne.21445 showing there is none). In addition, the statement that lower fluorescence intensity of Pan-Nav1 is indicating reduced excitability is flawed. Antibody staining does not scale linearly with voltage-gated sodium channel density and since the AIS of AcD neurons is further from the soma it is most likely smaller in diameter which may account for apparent fluorescent differences. For biophysical reasons (for details I refer to 10.3389/fncel.2019.00570, 10.1016/j.conb.2018.02.016 and 10.7554/eLife.53432) smaller diameter axons will be easier to depolarize by depolarizing voltage-gated channels or excitatory synapses. Finally, in AcD neurons the AIS distance from the soma poses all sorts of interesting cable properties with the soma and the local dendritic membrane and the electrotonic properties alone suffice to make these neurons more excitable.

      Reply: The reviewer raises very valid and important points that we will address in the revised manuscript. First, we will rephrase and adjust our interpretations regarding the functions of the cisternal organelle in the AIS. As also mentioned by reviewer #2, we are aware that antibody staining does not necessarily reflect Na+ channel density linearly. As discussed above, we will also measure other AIS proteins that anchor Na+ channels, to see whether their fluorescence intensities correlate with that of Nav1. We agree with the reviewer that the AIS of AcD neurons could have a smaller diameter, resulting in fewer Na+ channels. Indirect evidence is already available from Benavides-Piccione et al., who showed a smaller axon diameter in AcD neurons compared to nonAcD neurons in both human and mouse brain sections (Figure S4). To test this in our model system, we propose to measure the AIS diameter in AcD neurons. If the diameter is indeed smaller, we will state this in our revised manuscript and edit the section on Na+ channels accordingly.
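      To test whether the lower Nav signal simply tracks a thinner AIS, the per-cell Nav intensity could be related to that of a scaffold protein such as AnkG measured at the same AIS. The sketch below is a minimal illustration with hypothetical function names. It computes a Pearson correlation and a per-cell Nav/AnkG ratio. A similar ratio in AcD and nonAcD neurons would be consistent with both signals scaling with AIS size. A lower ratio in AcD neurons would point to a Nav-specific reduction.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired per-cell measurements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def nav_per_ankg(nav, ankg):
    """Per-cell Nav intensity divided by AnkG intensity at the same AIS."""
    return [n / a for n, a in zip(nav, ankg)]
```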

      Exploring the biophysical properties of the AIS and axons of AcD neurons is indeed a highly interesting direction and would be a project in its own right. It would require computational modeling approaches, which demand considerable time and resources that are not feasible within the timeframe of this revision.

      Comparing AcD and non-AcD neurons for AIS plasticity is an excellent idea but the present statistical design is not suitable for answering this question. The authors should directly compare non-AcD and AcD neurons within a two-way ANOVA design, asking the question whether the independent variable axon type is significantly different and interacts with plasticity.

      Related points: 'AIS distance' in Figure 7 seems to refer to something else than distance from soma (Figure 1). Please clarify. What were the absolute distances from the soma for the AcD neurons and was this dependent on treatment?

      Reply: We appreciate the reviewer's comment, and in the revised version we will perform the analysis using a two-way ANOVA.
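      The design requested by the reviewer can be sketched as a balanced two-way ANOVA with interaction. Its factors are axon type (AcD or nonAcD) and treatment (for example control or KCl). This standard-library version returns F statistics and degrees of freedom. P-values could then be obtained from the F distribution, for example with scipy.stats.f.sf(F, df1, df2) if SciPy is available. The sketch assumes a balanced layout. Using culture means rather than individual cells as observations would avoid inflating the power.

```python
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data: dict mapping (level_a, level_b) -> list of observations, with every
          combination present and the same number of observations per cell.
    Returns {effect: (F, df_effect, df_within)} for A, B and AxB.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))
    if any(len(v) != n for v in data.values()):
        raise ValueError("this sketch assumes a balanced design")
    cell_mean = {k: mean(v) for k, v in data.items()}
    grand = mean(cell_mean.values())  # equals the grand mean when balanced
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}
    ka, kb = len(a_levels), len(b_levels)
    # Sums of squares for main effects, interaction and residual (within cells).
    ss_a = n * kb * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * ka * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_within = sum((x - cell_mean[k]) ** 2 for k, v in data.items() for x in v)
    df_a, df_b = ka - 1, kb - 1
    df_ab = df_a * df_b
    df_within = ka * kb * (n - 1)
    ms_within = ss_within / df_within
    return {
        "A": (ss_a / df_a / ms_within, df_a, df_within),
        "B": (ss_b / df_b / ms_within, df_b, df_within),
        "AxB": (ss_ab / df_ab / ms_within, df_ab, df_within),
    }
```

The interaction term (AxB) answers the reviewer's question directly: it tests whether the effect of treatment on the AIS depends on axon type.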

      Regarding the terminology and definitions used in our manuscript, "AIS distance" refers to the distance between the start of the AIS and the axon initiation point, as depicted in Figure S4 of the manuscript. We adopted this parameter from the previous study by Grubb et al. (Grubb & Burrone, 2010) to ensure consistency in our investigation of AIS plasticity. For AcD neurons, in which the axon branches out from the dendrite, we defined the AIS distance as the length between the start of the AIS and the border of the stem dendrite, as illustrated in Figure S4B.

      In Figure 1, the term "distance from soma" represents the length of the stem dendrite and is used for AcD and nonAcD neuron classification. As shown in Figure S4G, the absolute distance from the soma for AcD neurons is approximately 10 µm and remains consistent across treatments. We will explain these points more clearly in the revised manuscript.
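      For illustration, AIS start and end positions could be extracted from a fluorescence line profile traced along the axon, giving the AIS distance and length. The profile is assumed to begin at the reference point defined above: the axon initiation point, or the border of the stem dendrite for AcD neurons. The AIS distance then equals the start position. The 5-sample sliding mean and the threshold of 0.33 of the normalized maximum are assumed parameters, not necessarily those used in the manuscript. The function name is hypothetical.

```python
def ais_bounds(profile, step_um, threshold=0.33, window=5):
    """Return (start_um, end_um) of the AIS along a line profile.

    profile: fluorescence values (e.g. AnkG) sampled every step_um, starting
             at the reference point (axon initiation point or stem-dendrite border).
    The profile is smoothed with a sliding mean, min-max normalized, and the
    AIS is taken as the contiguous region around the peak above `threshold`.
    """
    if not profile:
        raise ValueError("empty profile")
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        seg = profile[max(0, i - half): i + half + 1]
        smoothed.append(sum(seg) / len(seg))
    lo, hi = min(smoothed), max(smoothed)
    if hi == lo:
        raise ValueError("flat profile: no AIS peak")
    norm = [(v - lo) / (hi - lo) for v in smoothed]
    peak = max(range(len(norm)), key=norm.__getitem__)
    start = peak
    while start > 0 and norm[start - 1] >= threshold:
        start -= 1
    end = peak
    while end < len(norm) - 1 and norm[end + 1] >= threshold:
        end += 1
    return start * step_um, end * step_um
```

AIS length is then end minus start, and AIS distance is the start value itself.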

      Minor comments:

      1. On p. 7 it is stated that "The percentage of none-AcD forming collaterals at DIV1 is much lower than for AcD neurons" but statistical support is lacking. The conclusion in the next line is that "AcD neurons follow consensus development". That is puzzling given the difference just mentioned before. Please clarify.

      Reply: We will provide statistical support for the comparison of collateral formation between nonAcD and AcD neurons at DIV1.
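      With small counts at DIV1, Fisher's exact test on a 2x2 table (neuron type by presence of collaterals) is one suitable option. The sketch below implements the two-sided version with the standard library. The function name is hypothetical. scipy.stats.fisher_exact offers a tested implementation if SciPy is available.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: neuron type (AcD, nonAcD); columns: collaterals (yes, no).
    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability of x "yes" cells in row 1 given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance treats floating-point ties as equal.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```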

      Regarding the second point concerning consensus development, we were referring to the fact that AcD neurons follow the general developmental sequence described by Dotti et al. (see Dotti et al., 1988 for reference), in which neurons first establish an axon and then dendrites. This sequence is not necessarily related to collateral formation, which indeed differs between nonAcD and AcD neurons. The ability to form collaterals may stem from local differences in microtubule (MT) and actin dynamics in the precursor axons of AcD neurons. It does not, however, alter the fact that AcD neurons first establish an axon and subsequently dendrites. We will clarify this in the revised manuscript.

      A study not cited in this manuscript showed distinct dendritic morphologies (doi: 10.1073/pnas.1607548113) and AcD interneurons are different for their axonal arborization (doi: 10.1242/dev.202305). Differences in growth of branch arborization could hint to subtypes. Are the AcD and non-AcD neurons different in their adult morphology? A detailed account of the axonal and dendritic trees would strengthen the data.

      Reply: Thank you for pointing this out. We will include this citation. Hodapp et al. showed that AcD and nonAcD neurons exhibit similar dendritic morphology and do not differ in spine density, number of dendritic branches, or total dendritic length. However, in hippocampal AcD neurons, the AcD accounts for 35% of the total basal dendritic length, which is larger than the individual basal dendrites in nonAcD neurons. This suggests that AcD neurons do possess specific features in their dendritic trees.

      Regarding the axons of AcD neurons, there is currently no detailed study available, and it would be more appropriate to investigate neuronal connectivity through tracing studies in animals rather than in primary cultures. Therefore, this question falls outside the scope of the current manuscript.

      Some key references are not included here, and a number of these are mentioned above. In the context of the detailed MT and Rab3A vesicle and cargo transport studies, please acknowledge some of the pioneering work of Alan Peters revealing the ultrastructure of axons emerging from dendrites. See Figs. 5-7 in Peters, Proskauer and Kaiserman-Abramof IR., J Cell Biol 39:604 (1968). What is the identity of the neurons? It makes a difference if the cells are interneurons or pyramidal neurons, CA1 or CA3-like. For plasticity experiments the authors uses cells as independent measurements, but this is inflating the power. How many cultures were used?

      Reply: Thank you for pointing this out; we will include the suggested references in the revised manuscript. In our study, we focused on excitatory neurons from the hippocampus. We distinguished neuron types morphologically or with the inhibitory neuron marker GAD1. Identifying CA1, CA2, CA3, and DG subtypes in dissociated culture is more challenging, and this would be an interesting avenue to explore in an in vivo system. Here, we focused on fundamental cell biology aspects related to the AIS structure and its trafficking barrier function, which should be similar in all these neuron types. While there may be subtype-specific differences in AIS plasticity, investigating this is beyond the scope of our manuscript.

      For the plasticity experiments, we used a total of 3 independent cultures, from which we collected a comparable number of neurons. In response to the reviewer's concern, we will also plot the mean of each culture to illustrate the variability of our data points.

      Reviewer 4

      Major comments:

      1. A general limitation of this study is the low N for some critical experiments. In several experiments, individual cells become an N, therefore boosting the power of the analysis when in reality, due to the known heterogeneity of AIS length, position, and general cell morphology in vitro, the aim should be to compare means across animals / preparations, each consisting of a comparable number of individual cells. This is especially important for the analyses of COs, axo-axonic synapses and channel expression at the AIS.

      Reply: We would like to mention that this is a cell biological study in which neurons are grown in dissociated cultures. To prepare one culture, we typically use hippocampi from 6-8 E18 rat embryos. These are pooled in one suspension and then plated on coverslips in a 12-well plate format. We used 3 to 6 independent preparations as replicates for all experiments except the longitudinal 5-day time-lapse imaging of developmental sequences (Figure 1). From each preparation, we analysed a comparable number of cells derived from 4-6 different coverslips. For each experiment, we measured more than a hundred cells, which is standard practice in the field. To address the concern about treating individual cells as replicates, we will additionally plot the means of each independent preparation in the revised manuscript.
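      Plotting and testing preparation means, as proposed, amounts to averaging cells within each preparation first, so that n equals the number of independent preparations. A minimal sketch, with a hypothetical function name and data layout:

```python
import math
from statistics import mean, stdev


def preparation_summary(measurements):
    """Summarize per-cell values at the level of independent preparations.

    measurements: list of (preparation_id, value) pairs, one per cell.
    Returns (means_by_preparation, mean_of_means, sem_across_preparations).
    """
    by_prep = {}
    for prep, value in measurements:
        by_prep.setdefault(prep, []).append(value)
    prep_means = {p: mean(v) for p, v in by_prep.items()}
    values = list(prep_means.values())
    # SEM uses the number of preparations, not the number of cells, as n.
    sem = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else float("nan")
    return prep_means, mean(values), sem
```

The per-preparation means can be overlaid on the single-cell data points, and statistical tests can then be run on these means.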

      Such critical parameters as e.g. synaptic innervation at the AIS are investigated in a way that does not support the clear statements given, e.g. "The AIS of AcD neurons receives fewer inhibitory inputs" (Highlights statement) or "AcD neurons have less inhibitory synapses at the AIS" (header of Fig. 6). The overall number of analyzed cells is low (3 and 4 preparations, respectively, and approximately 50 cells for each marker). The combination of a pre- and postsynaptic marker for inhibitory / excitatory neurons is a solid decision, but the analysis is not done based on the close approximation of these markers, in 3D, along an AIS, but rather in maxIPs and without any regard of whether pre- and postsynaptic markers are actually close to each other or not. The expression of these markers alone just points towards the epitopes being expressed, but are they localized to each other in such a manner that they could form bona fide synapses? The methods are not totally clear on the image depth (tile scans with 5 µm in z will not provide the detail of information to resolve synapses, so how did the authors address the subcellular analysis here and for the CO and VGSCs?). And generally, were Nyquist conditions taken into consideration throughout the study? This can be clarified in text and does not require additional experiments.

      Reply: The overall number of cells for quantifying inhibitory synapses along the AIS was approximately 80 cells for each synaptic marker. To clarify this, we will indicate the number of cells in the figure legend of our revised manuscript and will additionally plot mean values across independent preparations.

      In the current manuscript, our main goal was to provide an initial quantitative measurement of AIS features in AcD neurons, to determine whether they differ from those of nonAcD neurons. For this purpose, maxIPs are sufficient, as they project the 3D information onto a single plane. To make our study more comprehensive, and following the reviewer's suggestion, we will conduct additional experiments co-labelling the pre- and postsynaptic compartments of inhibitory synapses at the AIS with VGAT and gephyrin, respectively. We will then image the samples in 3D to measure the synapse density and the distance between pre- and postsynaptic puncta at the AIS of AcD neurons, and compare them with nonAcD neurons.
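      The planned 3D analysis could pair VGAT and gephyrin puncta by their centroid distance, as sketched below. The voxel size combines the 0.081 µm pixel size and the planned 0.2 µm z-step mentioned in this reply. The greedy one-to-one nearest-pair rule and the 0.5 µm maximum distance are assumptions for illustration, not an established criterion. The function name is hypothetical. Synapse density could then be expressed as the number of accepted pairs per µm of AIS length.

```python
import math


def pair_puncta(pre, post, voxel=(0.081, 0.081, 0.2), max_dist_um=0.5):
    """Greedily pair presynaptic and postsynaptic puncta centroids in 3D.

    pre, post: lists of (x, y, z) centroids in voxel units.
    voxel: voxel size in µm along x, y and z (anisotropic).
    Returns a list of (pre_index, post_index, distance_um), each punctum
    used at most once, closest pairs accepted first.
    """
    def dist_um(p, q):
        return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, voxel)))

    candidates = sorted(
        (dist_um(p, q), i, j) for i, p in enumerate(pre) for j, q in enumerate(post)
    )
    used_pre, used_post, pairs = set(), set(), []
    for d, i, j in candidates:
        if d > max_dist_um:
            break  # candidates are sorted, so all remaining pairs are farther
        if i not in used_pre and j not in used_post:
            used_pre.add(i)
            used_post.add(j)
            pairs.append((i, j, d))
    return pairs
```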

      The Nyquist criterion was taken into consideration throughout the study. The pixel size used for data collection on the laser scanning microscope was 0.081 µm, as indicated in our methods section. Given the optical setup of our microscope and the fluorophores used to label target proteins (see the methods section), the Nyquist sampling interval for our confocal images is 0.083-0.093 µm laterally (i.e., the maximum acceptable pixel size) and 0.2 µm axially. The z-step size of our laser scanning confocal images was 0.5 µm (see methods section), which does undersample the data axially. However, this should not significantly affect our analysis based on maxIPs. The new stainings with matched pre- and postsynaptic markers will be imaged with a smaller z-step (0.2 µm) and then reconstructed in 3D.
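      The sampling argument above reduces to dividing each sampling step by the corresponding Nyquist interval; a ratio at or below 1 satisfies the criterion. The short check below uses only the values stated in this reply, taking the lower end of the lateral range as the conservative limit:

```python
def sampling_ratio(step_um, nyquist_um):
    """Sampling step divided by the Nyquist interval; <= 1 satisfies Nyquist."""
    return step_um / nyquist_um


# Sampling values stated in the reply (µm).
PIXEL_XY, NYQUIST_XY = 0.081, 0.083  # lower end of the 0.083-0.093 µm range
Z_STEP_OLD, Z_STEP_NEW, NYQUIST_Z = 0.5, 0.2, 0.2

lateral = sampling_ratio(PIXEL_XY, NYQUIST_XY)
axial_old = sampling_ratio(Z_STEP_OLD, NYQUIST_Z)
axial_new = sampling_ratio(Z_STEP_NEW, NYQUIST_Z)

for name, r in [("lateral", lateral),
                ("axial, 0.5 µm step", axial_old),
                ("axial, 0.2 µm step", axial_new)]:
    print(f"{name}: {r:.2f} ({'ok' if r <= 1 else 'undersampled'})")
```

With these values, the lateral sampling satisfies the criterion, whereas the 0.5 µm z-step exceeds the axial Nyquist interval by a factor of 2.5.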

      The chapter on AIS plasticity is certainly an interesting addition to the study, but is a bit superficial, yet reaches strong conclusions ("More importantly, it further indicates that the AIS of AcD neurons is insensitive to activity changes"). This is based on un-physiological concentrations of KCl, and certainly not on network manipulation that truly tests synaptic activity. It also comes back to the 1st point above. A suggestion would be to edit the conclusion.

      Reply: KCl treatment globally depolarizes the membrane potential of neurons, leading to an increase in intracellular calcium via voltage-sensitive calcium channels as well as NMDA and AMPA receptors (Rienecker et al., 2020). This protocol has been used in several previous studies describing the plasticity of the AIS (see Evans et al., 2013, 2017; Grubb & Burrone, 2010; Jamann et al., 2021; Muir & Kittler, 2014; Wefelmeyer et al., 2015 for references). Moreover, Evans et al. and Grubb et al. showed that AIS plasticity is not abolished by TTX, which blocks Na+ channels, but is prevented by L-type calcium channel blockers (see Evans et al., 2013; Grubb & Burrone, 2010 for references). This suggests that AIS plasticity is independent of action potentials. Instead, it depends on calcium-related pathways downstream of membrane depolarization and postsynaptic activation. Hence, we believe our results indicate how the AIS responds when calcium signaling pathways are altered by activity levels. To address the reviewer's concern, we will focus our conclusion on membrane depolarization and calcium signalling and remove overstated statements.

      As discussed above in response to reviewer #3, the quantification of AIS plasticity includes 3 independent preparations, comprising approximately 200 neurons in total. To prevent inflation of statistical power in the analysis, we will also plot the means and standard error of the mean (SEM) for each independent experiment and assess whether any differences persist.

      The rationale behind looking at the cisternal organelle (CO) in this study is outlined in the Introduction, where the authors state that "...... and is responsible for calcium handling". What is "calcium-handling" and where is the evidence cited? Furthermore, in the Results, they state that "...both compounds (VGSCs and COs) are critical for the AIS to regulate neuronal excitability". While this is the case for VGSCs, there is no conclusive evidence in the literature whether or not the CO is "critical" for neuronal excitability. In fact, a number of neurons have no CO in the AIS (as much as 50% of all AIS in mouse primary visual cortex for example do not express synpo at the AIS at all, Schlüter et al., 2017). The CO can therefore not be as critical for AP initiation as the authors state. Furthermore, the authors state that "AIS plasticity in excitatory neurons is triggered by calcium signaling". While certainly shown and adequately cited here, other factors (independent of calcium) can also play a role, therefore this statement is a bit absolute and should be edited accordingly.

      Reply: Thank you for the constructive editorial suggestions. Regarding the first question on calcium handling, we were referring to Ca2+ storage and release mechanisms. Benedeczky et al. showed SERCA-type Ca2+ pumps in the membrane of the cisternal organelle (CO), indicating that the CO is involved in Ca2+ sequestration and storage at the AIS (Benedeczky et al., 1994). Although the evidence is indirect, Sánchez-Ponce et al. showed that IP3R, which mediates Ca2+ release from internal stores, is present at the AIS and partially colocalizes with synaptopodin (Sánchez-Ponce et al., 2011). The same holds for the Ca2+-binding protein annexin 6. Together, this evidence indicates a putative role of the CO in regulating Ca2+ dynamics (storage and release) at the AIS. Ca2+ levels have a significant impact on action potential generation and timing at the AIS (see Bender & Trussell, 2009; Yu et al., 2010 for references) and therefore should be tightly regulated. The CO may thus be important for regulating neuronal excitability by controlling Ca2+ dynamics. However, as mentioned by the reviewer, there is no conclusive evidence linking the CO to the regulation of neuronal excitability. We will edit our statement accordingly.

      The findings of Schlüter et al. (Schlüter et al., 2019) were obtained in the mouse primary visual cortex. In contrast, Sánchez-Ponce et al. showed that nearly 90% of hippocampal neurons contain synaptopodin, the CO marker protein, at the AIS. Furthermore, Schlüter et al. also demonstrated that in the 50% of neurons that contain COs at the AIS, the COs change size during visual deprivation. In addition, their presence correlates with AIS length changes as well as with eye opening. These observations suggest that COs are related to neuronal activity. However, this correlation and the formation of COs may be specific to neuronal subtypes or require certain triggers. This is another interesting direction to explore, and we will include it in the discussion of the revised manuscript.

      Regarding the last point on Ca2+ and AIS plasticity, we were not excluding other factors that could potentially participate in AIS plasticity and will also discuss it in the revised version.

      The Introduction ends with the rationale of the study, namely that the authors seek to ....."provide a detailed characterization of the AIS, including its structural and functional properties....". Structure is investigated, but function is limited to the barrier function of the AIS. Since the authors provide no electrophysiology that would really dissect AIS function, I suggest to rephrase this part and focus on transport.

      Reply: As suggested, we will emphasize the cargo barrier function of the AIS in AcD neurons in our introduction. However, we would like to keep the term "AIS function", because previous electrophysiological studies have demonstrated that AIS plasticity is important for maintaining cellular homeostasis.

      The Discussion is more a list of future plans than a context to current data. The authors could move some of the new questions they identify into an "outlook" section at the end? Also, again have a critical look at the literature that is cited and which statements are accurate.

      For example, the 2nd sentence in the Discussion states that it was shown that AcD neurons have a "role in memory consolidation", referenced to Hodapp et al., 2022. However, that paper does not provide direct evidence of such a role for AcD neurons. The statement "Collectively, our data provide new insights into the development of AcD neurons and demonstrate that there are differences in AIS functionality between AcD and nonAcD neurons", is not correct. AIS function was not investigated outside of the axonal barrier, and here, the AcD and nonAcD cells do not differ. Also, although the Discussion is geared towards excitatory / glutamatergic neurons, it has been shown by others that interneurons show an even stronger trend to exhibit AcD morphology (work by the Wahle lab and others). This is not clear from the current text (also compare "...AcD neurons being a different subtype if pyramidal neuron").

      Further original publications should be included in the paragraph highlighting patch-clamp recordings (see above). In the same context, the statement "...showed that rapid AID plasticity occurs mainly in hippocampal dentate gyrus cells but not in principal excitatory neurons" is not accurate (see Kim, Kuba, Jamann and others). Generally, the Introduction and Discussion would benefit from a very clear distinction between studies done in vitro versus those done ex vivo or in vivo. This needs to be stated in the Abstract as well.

      Methods: For the imaging of synapses, the CO and VGSCs, it is not clear to me from the methods whether Nyquist conditions were applied to produce data that can support the quantification of nanoscale structures. Basing the analysis and interpretation of channel expression on fluorescence intensity profiles is problematic (variance in staining quality from sample to sample, lack of an internal standard). This should be noted in the text. In the text, the first two references given for "Induction of plasticity" do not reference the correct papers.

      Reply: Thank you for the valuable suggestions; we will incorporate them into the revised version of the manuscript, and its structure will undoubtedly benefit from these improvements. We will also re-examine our interpretation of the literature and our citations during the revision.

      Regarding methods, as stated in response to the second point raised by this reviewer, we ensured that the Nyquist condition was adhered to throughout the study. The pixel size, z-step size, and optical setup of the microscopes used were already indicated in our methods section. With respect to Na+ channel staining, we were indeed aware of the potential issues posed by the experimental setup, and we will explicitly mention this in our revised manuscript. Additionally, we plan to measure other AIS scaffolding and membrane proteins that anchor Na+ channels to assess for potential changes, which could indirectly support our Na+ channel staining results.

      Finally, the text is lacking a discussion of limitations of the study, especially from a methodological point of view. In the Abstract/Summary already, the authors could point out that this is a pure in vitro study. Interestingly, to this day, AIS relocation during plasticity events has only been shown in cell culture systems, and not in vivo. Therefore, this needs to be put into context here - the chosen system is great for the type of imaging approach presented here, but may look at a type of AIS plasticity that is not seen in vivo.

      Reply: These are very good points. We will include the limitations of the study in the discussion. Indeed, due to technical and methodological challenges, the relocation of the AIS has not yet been demonstrated using animal models. However, in the study by Wefelmeyer et al. (Wefelmeyer et al., 2015), a similar relocation of the AIS resulting from chronic stimulation was observed in hippocampal organotypic slices, and it was accompanied by reduced excitability of neurons. Furthermore, in the same study, neurons with axons/AIS originating from basal dendrites were also mentioned. However, the measurement of chronic AIS plasticity in their study was not performed based on different classes of neuron types. Hence, our work complements their results. Given that the network connectivity of organotypic slices is much closer to real physiological conditions, it is likely that similar plastic adaptations could occur in vivo.

      Minor comments

      1. How does intrinsic neuronal activity play into developmental programs in vitro? Electrical activity in maturing neurons is a major part of how networks are shaped, and cells differentiate. This is not genetically encoded per se, but has been shown to be a major driving force of neuronal development in vivo. Is this reflected in the culture setting in any way? And have the authors considered testing early changes in activity patterns in their cultures to see whether AcDs and nonAcDs develop in similar percentages? To clarify, I am not asking for additional experiments.

      Reply: It is indeed a valid point that activity can influence neuronal morphology. Lehmann et al. (preprint, doi: https://doi.org/10.1101/2023.07.31.551236) have recently demonstrated that increased network activity leads more excitatory principal neurons to adopt an AcD morphology. However, our developmental data were collected from DIV0 to DIV5, an age at which dissociated neurons do not yet form functional excitatory synapses. It is therefore highly unlikely that network activity shapes AcD neuron development during this early stage.

      The authors may want to add a bit of a technical discussion on the choice of KCl and TTX as triggers for plasticity, especially at the non-physiological concentrations offered here and elsewhere (15 mM KCl).

      Reply: We thank the reviewer for pointing this out. We will add this technical discussion to our revised manuscript.

      Some key statements would benefit from citing the appropriate original literature (some examples would be the original work by Kole, Bender and Brette on the role of the AIS in AP initiation; original work by D'Este and Leterrier on the dendritic and axonal scaffold using nanoscopy; work by Kim, Kuba and Jamann on AIS plasticity in vitro and in vivo that is critical for a more informed discussion of AIS plasticity here, and others).

      Reply: These are very good points; we will make the suggested edits in the revised version.

      In the Introduction, the authors word their text explicitly for excitatory neurons. However, AIS plasticity has also been observed in interneurons (work by the Grubb lab for example), and axo-axonic synapses are in fact not all inhibitory - this is an important factor to consider given the embryonic state of the culture material. Does the DIV maturation reflect how axo-axonic synapses "switch" from excitatory to inhibitory in vivo (also see work of the Burrone lab)? Can the conclusions from the paper really be drawn based on this type of system?

      Reply: AIS plasticity has indeed also been observed in inhibitory interneurons, where it shows the opposite phenotype to that of excitatory neurons (see Chand et al., 2015). In relation to major comment #5, we did take into consideration the potential influence of AcD interneurons on the outcome of the AIS plasticity experiment. We therefore performed a control experiment in which inhibitory interneurons were labelled for GAD1 after chronic KCl treatment and excluded from the analysis. Consistently, we obtained the same result: excitatory AcD neurons do not undergo chronic AIS plasticity. We will include these data in our revised manuscript. Furthermore, in our current manuscript we decided to focus on excitatory AcD neurons, not only because they are the major functional units in neuronal circuits, but also because most of the electrophysiological features of AcD neurons have been studied in excitatory cells. We agree with the reviewer, however, that AcD interneurons are an interesting subject for future research.

      As mentioned by the reviewer, Pan-Vazquez et al. (Pan-Vazquez et al., 2020) showed that axo-axonic synapses made by GABAergic chandelier cells (ChCs) depolarise neurons in brain slices obtained from P12-18 animals, and that this effect is reversed in slices obtained from older animals (>>P40). Of note, their results were obtained in cortical rather than hippocampal neurons, so cell-type specificity should be considered. More importantly, a previous study reported that this switch of GABAergic transmission from excitatory to inhibitory occurs in hippocampal neurons at P12-13 (Leinekugel et al., 1995). In dissociated hippocampal neurons from E18 rat embryos, the switch begins at DIV9-11 and is complete by DIV19, a developmental stage comparable to P12-13 in vivo (see Ganguly et al., 2001). Therefore, our conclusions can reasonably be drawn from an in vitro system, but they certainly need to be validated in vivo.

      The authors state that "less COs account for higher intrinsic excitability". Why is that the case?

      Reply: According to Yu et al. and Bender et al., Ca2+ transients at the AIS regulate the generation of action potentials (APs). For instance, reducing Ca2+ transients at the AIS by blocking Ca2+ channels with either mibefradil (a T-type Ca2+ channel antagonist) or Ni2+ (which blocks R- and T-type channels) decreased the number of spikelets evoked by EPSP-like current injection and delayed spike generation (see Bender & Trussell, 2009 for details). We therefore speculate that Ca2+ transients are less attenuated when there are fewer cisternal organelles (COs) at the AIS, which could have a more direct impact on AP initiation. However, this is only a hypothesis, and there is indeed no direct evidence that COs regulate Ca2+ dynamics. We will discuss this in the revised manuscript.

      Last but not least, some very recent studies on AcD biology (Stevens, Thome, Lehmann, Wahle) are available online on preprint servers and may provide additional support for the current study.

      Reply: We will check these pre-prints and include relevant information into the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should evaluate the possibility of naturally occurring arrhythmia due to the geometry of the tissues, using a voltage- or calcium-sensitive dye.

      Answer: We thank the reviewer for this suggestion. We have performed new experiments using a voltage-sensitive fluorescent dye (FluoVolt); the data are reported in the new Figure 4 and in a new Results section, "Arrhythmia analysis". Briefly, we found that our ring-shaped tissues are compatible with live fluorescence imaging, and we were able to show that our cardiac tissues beat regularly, without naturally occurring arrhythmias or extra beats. We did not detect any re-entrant waves in our tissues within the temporal resolution of our camera. A specific paragraph has also been added to the Discussion.

      (2) There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. We are currently expanding to other cell lines, with improved survival (see https://insight.jci.org/articles/view/161356). We confirm that the rings do not detach; the pillar was specifically designed to prevent this (see Figure 1B).

      As the platform utilizes imaging analysis to derive contractile dynamics, calibration should be done based on the angle and the distance of the camera lens to the individual tissues to reduce the error. On the other hand, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across different wells and within the wells to understand the variance.

      Answer: We propose a system and a measurement method that do not need calibration. Contraction amplitude is expressed as the ratio between the contracted and relaxed areas (see Figure 3A), so the distance between the camera lens and the tissue has no influence on it.
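      The calibration-free argument rests on a geometric fact: moving the lens closer to or farther from the tissue scales every projected area by the same factor, which cancels in a contracted/relaxed ratio. The Python sketch below illustrates this under stated assumptions: the tissue is treated as planar, "area" is taken to be the area enclosed by a traced outline (a polygon), and the camera is modelled as an affine projection (uniform zoom plus foreshortening from tilt). The outlines and radii are hypothetical, and strong perspective distortion, which an affine model ignores, would not cancel exactly.

```python
import math


def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as a list of (x, y) vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def circle(radius, n=720):
    """Polygonal approximation of a circular tissue outline (hypothetical geometry)."""
    return [(radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]


def project(points, zoom=1.0, tilt_deg=0.0):
    """Affine camera model: uniform zoom (lens distance) plus foreshortening from a tilt."""
    c = math.cos(math.radians(tilt_deg))
    return [(zoom * x, zoom * c * y) for x, y in points]


def contraction_ratio(relaxed, contracted):
    """Contracted / relaxed projected area: a dimensionless amplitude measure."""
    return polygon_area(contracted) / polygon_area(relaxed)


# Hypothetical outlines: a tissue whose outline shrinks from radius 1.00 to 0.95.
relaxed, contracted = circle(1.00), circle(0.95)
reference = contraction_ratio(relaxed, contracted)
for zoom, tilt in [(0.5, 0.0), (2.0, 0.0), (1.0, 30.0), (3.0, 45.0)]:
    r = contraction_ratio(project(relaxed, zoom, tilt), project(contracted, zoom, tilt))
    print(f"zoom={zoom}, tilt={tilt} deg -> ratio={r:.6f} (reference {reference:.6f})")
```

      Because an affine map multiplies every area by the same factor (the absolute value of its determinant), the ratio is unchanged in every row of this sketch.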

      To evaluate the consistency of the mechanical properties of the hydrogel, we repeated the experiment shown in Figure 1-Supplement 1 and measured the Young's modulus of three different gel solutions on different days. In the three experiments, we found values of 10.0-12.2 kPa, giving an average of 11.2 (+/- 0.6) kPa, consistent with the value reported in the article. We are therefore confident that the mechanical properties are consistent across and within wells. More extensive mechanical characterization of the molded gels would require access to an atomic force microscope (AFM) and is being considered for future work.

      The author should address the longevity and reproducibility issues, by working on the calibration of camera lens position/distance to tissues and further optimizing the seeding conditions with hydrogels such as collagen or fibrin, and/or making sure the PEG gels have high reproducibility and consistency.

      Answer: This paper reports seminal data that will serve as a foundation for further use of the platform. This platform (including its design, approach and choice of polymers) allows the fast and reproducible formation of a large number of cardiac tissues (up to 21 per well in a 96-well format, i.e. a potential total of about 2,000 tissues) from a limited number of cells.

      (3) The evaluation of the arrhythmia should be more extensively explained and demonstrated.

      Answer: See the answer to comment 1.

      (4) The results of isoproterenol should be checked as non-paced tissues should have increased beating frequency with increasing dosages. Dofetilide does not typically have a negative inotropic effect on the tissues. Please check on the cell viability before and after dosing

      Answer: We agree with the reviewer in principle. However, we repeated the experiments and confirmed our results: increasing concentrations of isoproterenol induced a trend towards an increase in contraction force and significantly increased contraction and relaxation speeds, without changing the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that, on average in our study, the increase in contraction and relaxation speeds induced by isoproterenol translates into an increase in contractile force rather than into an increase in contraction frequency. This may depend on the cell line used, as illustrated in a recent paper by Mannhardt and colleagues (Stem Cell Reports 2020;15(4):983–998). Of the 10 cell lines tested in engineered heart tissues, all showed an increase in contraction and relaxation speeds after isoproterenol administration, but this translated either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines showed an increase in both parameters. Indeed, because iPSC-CMs are immature cardiac cells, a positive force-frequency relationship is rarely obtained without a maturation medium or mechanical or electrical training. We agree that above a concentration of 10 nM, dofetilide is cardiotoxic in our tissues, as they completely stop beating.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the general comments in the public review, I have the following specific suggestions to the authors, that would help improve the manuscript.

      (1) Please describe the protocol for preparation of cardiac rings (shown in Figure 1C) in more detail. In particular, please describe how the tissues were transferred from the mold into the 96-well plate and how are they positioned and characterized during the study.

      Answer: The tissues are not transferred: they form directly in the well, which is pre-equipped with the molded PEG gel (see Figure 1B and the Methods section). This in situ analysis is a strong asset of the platform.

      (2) Please clarify the timepoints in this study. The overall schematic in Figure 1 C shows that the rings were formed on day 22 and then studied for 14 days, while Figure 2B shows data over 20 days following seeding, and Figure 3 shows data 14 days after seeding. It appears that these were separate studies (optimization of myocyte/fibroblast ratio followed by the main study).

      Answer: Figure 1C shows the timeline including cardiomyocyte differentiation. hiPSC-CMs are seeded in the wells 22 days after the start of differentiation, which represents Day 0 of tissue formation. We apologize for the confusion.

      (3) Please explain if the number of rings per well (Figure 2) was used as the only criterion for selecting the myocyte/fibroblast ratio, and if so, why. Were these rings also characterized for their structural and contractile properties?

      Answer: Figure 2 supplement 1 reports the contractility data for the different ratios tested and shows no differences. The number of generated ring-shaped tissues was indeed the only criterion used.

      (4) Please provide rationale for using the dermal rather than cardiac fibroblasts.

      Answer: We had previous experience generating EHTs using dermal fibroblasts which are easier to obtain commercially. Our approach could in theory also work using cardiac fibroblasts, which we have not tested in the present study.

      (5) Figure 2 panels C-E show an interesting segregation of cardiomyocytes into a thin cylindrical layer that does not appear to contain fibroblasts and a shorter and thicker cylinder containing fibroblasts mixed with occasional myocytes. Please specify at which time point this structure forms, and how does it change over time in culture? At which time point were the images taken? It would be helpful to include serial images taken over 1-14 days of study.

      Answer: We thank the reviewer for this interesting comment. We have performed additional immunostainings (reported in Figure 2 supplement 3) on tissues at Day 1 and Day 7 after seeding. The segregation appears within the first 7 days. One day after seeding, the fibroblasts are not yet attached, although the cardiac fiber has already started to form. Seven days after seeding, the fibroblasts are fully spread and attached, and the contractile ring is formed and well aligned. Brightfield images are reported in Figure 1E.

      (6) In the cardiomyocyte region (Figure 2D) the cells staining for troponin seem to be only at the surfaces. The thickness of the layer is only about 30-40 µm, so one would assume that cell viability was not an issue. Please specify and discuss the composition of this region.

      Answer: We agree, but we think this is a technical issue: at the center of the tissue, tissue thickness limits laser penetration, whereas at the surface (inner or outer) the laser penetrates easily between the tissue and the PEG. Moreover, the zoomed view of the tissue in Figure 2 Supplement 2 shows staining inside the cardiac fiber, which simply appears weaker because of tissue thickness.

      (7) Please also discuss segregation in terms of possible causes and the implications of apparently very limited contact between the two cell types, i.e., how representative is this two-region morphology of native heart tissue. Also, it would be interesting to know how the segregation has changed with the change in myocyte/fibroblast ratio.

      Answer: We are not convinced that contact between the two cell types is very limited, as fibroblasts are critical for tissue formation (i.e. no tissues form without fibroblasts). We agree that these ring-shaped cardiac tissues are not particularly representative of native heart tissue in terms of interactions between cell types. They were developed as a surrogate for pathophysiological and pharmacological experiments (see a recent application in https://insight.jci.org/articles/view/161356).

      (8) There is interest and demonstrated ability to culture engineered cardiac tissues over longer periods of time. Please comment what was the rationale for selecting 14-day culture and if the system allows longer culture durations.

      Answer: In line with this comment, we studied the contractile parameters of our rings 28 days after seeding and compared them with those at D14. We found a slight increase in all parameters, which was significant for the maximum contraction speed. Nevertheless, the data are much more variable and the number of tissues is lower (29 at D14 versus 17 at D28). We have therefore demonstrated that long-term culture of our tissues is possible, although not yet optimized. Hence, the subsequent physiological and pharmacological tests were performed at D14.

      (9) Figure 3 documents the development of contractile parameters over 14 days of culture. Would it be possible to replace the arbitrary units with the actual values? Also, would it be possible to include the corresponding images of the rings taken at the same time points, to show the associated changes in ring morphologies.

      Answer: Contraction amplitude is expressed as the ratio between the contracted and relaxed areas (see Figure 3A); being a ratio, it has no unit. Corresponding images can be seen in Figure 1E.

      (10) The measured contraction stress, strain, and the speeds of contraction and relaxation improve from day 1 to day 7 and then plateau (Figure 3, Supplemental Figure 3). Please discuss this result.

      Answer: The new immunostainings performed on tissues at Day 1 and Day 7 show the progressive alignment of the cardiomyocytes and the muscular fibers, with an almost complete organization at Day 7.

      (11) The beating frequency does not appear to markedly change over time, while Figure 3B shows strong statistical significance (***) throughout the 14-day period. Please check/confirm.

      Answer: We confirm this result.

      (12) Please comment on the lack of effect of isoproterenol on beating frequency.

      Answer: We agree with the reviewer in principle. However, we repeated the experiments and confirmed our results: increasing concentrations of isoproterenol induced a trend towards an increase in contraction force and significantly increased contraction and relaxation speeds, without changing the beat rate (Figure 5C). We do not have a definitive explanation for this observation. Our hypothesis is that, on average in our study, the increase in contraction and relaxation speeds induced by isoproterenol translates into an increase in contractile force rather than into an increase in contraction frequency. This may depend on the cell line used, as illustrated in a recent paper by Mannhardt and colleagues (Stem Cell Reports 2020;15(4):983–998). Of the 10 cell lines tested in engineered heart tissues, all showed an increase in contraction and relaxation speeds after isoproterenol administration, but this translated either into an increase in contractile force (4 cell lines) or into a shortening of the beat (3 cell lines), and only 2 cell lines showed an increase in both parameters. Indeed, because iPSC-CMs are immature cardiac cells, a positive force-frequency relationship is rarely obtained without a maturation medium or mechanical or electrical training.

      (13) Please compare the contractile function of cardiac tissues measured in this study with data reported for other iPSC-derived tissue models.

      Answer: A specific paragraph in the Discussion addresses this aspect.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewers’ Public Comments

      We are grateful for the reviewers’ comments. We have modified the manuscript accordingly and detail our responses to their major comments below.

      (1) Reviewer 2 was concerned that transformation of continuous functional data into categorical form could reduce precision in estimating the genetic architecture.

      We agree that transforming continuous data into categories may reduce resolution, but it also improves accuracy when the continuous data are affected by measurement noise. In our dataset, many genotypes are at the lower bound of measurement, and the variation in measured fluorescence among these genotypes is largely or entirely caused by measurement noise. By transforming to categorical data, we dramatically reduced the effect of this noise on the estimation of genetic effects. We modified the results and discussion sections to address this point.
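      To make the noise argument concrete, the Python sketch below contrasts a continuous and a categorical estimate of a single-site effect between variants that are both truly non-functional (at the lower bound of measurement). The class cutoffs and fluorescence readings are hypothetical placeholders, not the thresholds or data used in the study.

```python
# Hypothetical class cutoffs in arbitrary fluorescence units (not the study's values).
NULL_CUTOFF = 1.0    # readings below this are called "null"
STRONG_CUTOFF = 5.0  # readings at or above this are called "strong"


def categorize(fluorescence):
    """Map a continuous fluorescence reading to an ordinal class: 0 null, 1 weak, 2 strong."""
    if fluorescence < NULL_CUTOFF:
        return 0
    if fluorescence < STRONG_CUTOFF:
        return 1
    return 2


# Hypothetical readings for pairs of variants that differ at one site but are both
# truly non-functional, so any difference between them is measurement noise.
floor_pairs = [(0.12, 0.31), (0.45, 0.08), (0.27, 0.29)]

for a, b in floor_pairs:
    continuous_effect = b - a                           # noise looks like a genetic effect
    categorical_effect = categorize(b) - categorize(a)  # both are "null", so 0
    print(f"continuous: {continuous_effect:+.2f}  categorical: {categorical_effect:+d}")
```

      In this toy setting every continuous "effect" is pure noise, while the categorical effect is zero; the trade-off is that genuine differences within one class are also lost.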

      (2) Reviewer 2 asked about generalizability of our findings.

      Because our paper is the first use of reference-free analysis of a 20-state combinatorial dataset, generalizability is at this point unknown. However, a recent manuscript from our group confirms the generality of the simplicity of genetic architecture: using reference-free methods to analyze 20 published combinatorial deep mutational scans, several of which involve 20-state libraries, we found that main and pairwise effects account for virtually all of the genetic variance across a wide variety of protein families and types of biochemical functions (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057). Concerning the facilitating effect of epistasis on the evolution of new functions, we speculate that this result is likely to be general: we have no reason to think that the underlying cause of this observation – epistasis brings genotypes with different functions closer in sequence space to each other and expands the total number of functional sequences – arises from some peculiarity of the mechanisms of steroid receptor DBD folding or DNA binding. However, we acknowledge that our data involve sequence variation at those sites in the protein that directly mediate specific protein-DNA contact; it is plausible that sites far from the “active site” may have weaker epistatic interactions and therefore have weaker effects on navigability of the landscape. We have addressed these issues in the discussion.

      (3) Reviewer 3 asked “in which situation would the authors expect that pairwise epistasis does not play a crucial role for mutational steps, trajectories, or space connectedness, if it is dominant in the genotype-phenotype landscape?”

      The question addressed in our paper is not whether epistasis shapes steps, trajectories or connectedness in sequence space but how it does so and what its particular effects are on the evolution of new functions. The dominant view in the field has been that the primary role of epistasis is to block evolutionary paths. We show, however, that in multi-state sequence space, epistasis facilitates rather than impedes the evolution of new functions. It does this by increasing the number of functional genotypes and bringing genotypes with different functions closer together in sequence space. This finding was possible because of the difference in approach between our paper and prior work: most prior work considered only direct paths in a binary sequence space between two particular starting points – and typically only considering optimization of a single function – whereas we studied the evolution of new functions in a multi-state amino acid space, under empirically relevant epistasis informed by complete combinatorial experiments. The result is a clear demonstration that the net effect of real-world levels of epistasis on navigability of the multidimensional sequence landscape is to make the evolution of new functions easier, not harder.

      (4) Reviewer 3 asked for “an explanation of how much new biological results this paper delivers as compared with the paper in which the data were originally published.”

      Starr et al. (2017) did not use their data to characterize the underlying genetic architecture of function by estimating main and epistatic effects of amino acid states and combinations; nor did they evaluate the importance of epistasis in generating functional variants, determining the transcription factor’s specificity, or shaping evolutionary navigability on the landscape.

      (5) Reviewer 3 requested an explanation of how the results would have been (potentially) different if a reference-based approach were used, and how reference-based analysis compares with other reference-free approaches to estimating epistasis.

      This topic has been covered in detail in a recent manuscript from our group (Park et al. Biorxiv 2023.09.02.556057). Briefly, reference-free approaches provide the most efficient explanation of an entire genotype-phenotype map, explaining the maximum amount of genetic variance and reducing sensitivity to experimental noise and missing genotypes compared to reference-based approaches. Reference-based approaches tend to infer much more epistasis, especially higher-order epistasis, because measurement error and local idiosyncrasy near the wild-type sequence propagate into spurious high-order terms. Reference-based analyses are appropriate for characterizing only the immediate sequence neighborhood of a particular “wild-type” protein of interest. Reference-free approaches are therefore best suited to understanding genotype-phenotype landscapes as a whole. We have clarified these issues in the revised discussion.
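      The toy example below, in Python, illustrates how noise on the wild-type measurement propagates differently under the two frameworks. It assumes a hypothetical, purely additive landscape of three binary sites in which only the wild-type (all-zero) genotype carries measurement error. Reference-free terms are computed as the mean phenotype of all genotypes carrying a state combination minus all lower-order terms, which is our reading of the definition in Park et al.; this is an illustrative sketch, not the code used in the study.

```python
from itertools import combinations, product

L = 3           # number of binary sites in this toy landscape
WT_NOISE = 0.4  # hypothetical measurement error on the wild-type genotype only


def true_phenotype(g):
    """Hypothetical, purely additive landscape: it contains no epistasis at all."""
    return 1.0 + 2.0 * g[0] + 3.0 * g[1] - 1.0 * g[2]


genotypes = list(product((0, 1), repeat=L))
y = {g: true_phenotype(g) for g in genotypes}
y[(0,) * L] += WT_NOISE  # only the wild-type (all-zero) measurement is noisy


def reference_based(sites):
    """Effect of mutating `sites` relative to wild type: inclusion-exclusion sum over
    all genotypes in which some subset of `sites` is mutated and everything else is WT."""
    total = 0.0
    for k in range(len(sites) + 1):
        for sub in combinations(sites, k):
            g = tuple(1 if i in sub else 0 for i in range(L))
            total += (-1) ** (len(sites) - k) * y[g]
    return total


def reference_free(assignment):
    """Reference-free effect of a tuple of (site, state) pairs: mean phenotype of all
    genotypes carrying those states, minus every lower-order reference-free effect."""
    matching = [y[g] for g in genotypes if all(g[i] == s for i, s in assignment)]
    mean = sum(matching) / len(matching)
    lower = sum(reference_free(sub)
                for k in range(len(assignment))
                for sub in combinations(assignment, k))
    return mean - lower


print("pairwise  (0,1):   ref-based %+.3f  ref-free %+.3f"
      % (reference_based((0, 1)), reference_free(((0, 1), (1, 1)))))
print("3rd-order (0,1,2): ref-based %+.3f  ref-free %+.3f"
      % (reference_based((0, 1, 2)), reference_free(((0, 1), (1, 1), (2, 1)))))
```

      Because the true landscape is additive, every non-zero interaction term here is spurious. The wild-type error enters every reference-based term undiluted (as +/-0.4), whereas each reference-free term shown absorbs only 1/8 of it, because reference-free terms average over all 2^3 genotypes.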

      (6) Reviewer 3 suggested that the comparison between the full and main-effects-only model should involve a re-estimation of main effects in the latter case.

      This is indeed what we did in our analysis. We have clarified the description in the results and methods sections to make this clear.

      (7) Reviewer 3 asked about the applicability of the approach to data beyond those analyzed in the present study and requirements to use it.

      Our approach could be used for any combinatorial DMS dataset in which the phenotypic data are categorical (or can be converted to categorical form). Complete sampling is not required: because reference-free analysis averages the estimated effects of states and combinations over all variants that contain them, it is highly robust to missing data (except at the highest possible order of epistasis, where only a single variant represents a high-order effect), as long as variant sampling is unbiased with respect to phenotype. All the required code is publicly available at the GitHub link provided in this manuscript. We have also described a general form of reference-free analysis for continuous data and applied it to 20 protein datasets in a recent publication (Park et al. Biorxiv 2023.09.02.556057).

      (8) Reviewer 3 suggested that the text could be shortened and made less dense.

      We agree and have done a careful edit to streamline the narrative.

      Response to Reviewers’ Non-Public Recommendations

      (1) Reviewer 1 noted that specific epistatic effects might in some cases produce global nonlinearities in the genotype-phenotype relationship. They then asked how our results might change if we did not impose a nonlinear transformation as part of the genotype-phenotype model. The reviewer’s underlying concern was that the non-specific transformation might capture high-order specific epistatic effects and thus reduce their apparent importance.

      Because our data are categorical, we required a model that characterizes the effect of particular amino acid states and combinations on the probability that a variant is in a null, weak, or strong activation class. A logistic model is the classic approach to this kind of analysis. The model structure assumes that amino acid states and combinations have additive effects on the log-odds of being in one functional class versus the lower functional class(es); the only nonlinear transformation is the one that arises mathematically when log-odds are converted into probabilities through the logistic link function. Having considered the reviewer’s comment, we conclude that our model does not make any explicit transformation to account for nonlinearity in the relationship between the effects of specific sequence states/combinations and the measured phenotype (activation class). If additional global nonlinearities are present in the genotype-phenotype relationship – such as could be imposed by limited dynamic range in the production of the fluorescence phenotype or in the assay used to measure it – it is possible that the sigmoid shape of the logistic link function also accommodates these nonlinearities. We have noted this point in the revised manuscript.
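      To make the structure of such a model concrete, the sketch below assumes a cumulative-logit (proportional-odds) form, in which additive effects on the log-odds scale are passed through the logistic link to give class probabilities. The exact parameterization used in the study may differ; the function names, effect labels, effect values, and class thresholds here are all hypothetical and chosen only for illustration.

```python
import math


def logistic(x: float) -> float:
    """Logistic link: maps a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def class_probabilities(score: float, thresholds: list[float]) -> list[float]:
    """Class probabilities (e.g. null, weak, strong) under a cumulative-logit model.

    score: additive genetic score, i.e. the sum of state and combination effects
           on the log-odds scale (hypothetical values in the example below).
    thresholds: increasing cutpoints, one fewer than the number of classes.
    """
    # P(class >= k) for k = 1..K-1; the only nonlinearity is the logistic link
    at_or_above = [logistic(score - t) for t in thresholds]
    # Pad with P(class >= 0) = 1 and P(class >= K) = 0, then take differences
    cumulative = [1.0] + at_or_above + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical effects: two main effects and one pairwise term add on the log-odds scale
effects = {"A@site1": 1.2, "G@site2": -0.4, "A@site1+G@site2": 0.9}
score = sum(effects.values())
probs = class_probabilities(score, thresholds=[0.0, 2.0])
print([round(p, 3) for p in probs])
```

      Because the effects enter only through their sum, any curvature in the predicted class probabilities comes from the sigmoid link rather than from an explicit nonspecific transformation of the genetic score.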

      (2) Reviewer 1 observed that our model seems to prefer sets of several pairwise interactions among states across sites rather than fewer high-order interactions among those same states.

      This finding arises because the pattern of phenotypic variation across genotypes in our dataset is consistent with that which would be produced by pairwise interactions rather than by high-order interactions. In a reference-free framework, these patterns are distinct from each other: a group of second-order terms cannot fit the patterns produced by high-order epistasis, and high-order terms cannot fit the pattern produced by pairwise interactions. Similarly, main-effect terms cannot fit the pattern of phenotypes produced by a pairwise interaction, and a pairwise epistatic term cannot fit the pattern produced by main effects of states at two sites. For example, third-order terms are required when the genotypes possessing a particular triplet of states deviate from that expected given all the main and second-order effects of those states; this deviation cannot be explained by any combination of first- and second-order effects.

      We explain this point in detail in our recent manuscript (Park Y, Metzger BPH, Thornton JW. 2023. The simplicity of protein sequence-function relationships. BioRxiv, 2023.09.02.556057) and summarize it here. Consider the simple example of two sites with two possible states (genotypes 00, 01, 10, and 11). If there are no main effects and no pairwise effects, this architecture will generate the same phenotype for all four variants – the global average (or zero-order effect). If there are pairwise effects but no main effects, this architecture will generate a set of phenotypes in which the average phenotype of genotypes with a 0 at the first site (00 and 01) equals the global average – as does the average of those with a 0 at the second site (00 and 10). The epistatic effect causes the individual genotypes to deviate from the global average. This pattern can be fit only by a pairwise epistatic term, not by first-order terms. Conversely, if there are main effects but no pairwise effects, then the average phenotype of genotypes 00 and 01 will deviate from the global average (by an amount equal to the first-order effect), as will the average of 00 and 10: the phenotype of each genotype will be equal to the sum of the relevant first-order effects for the states it contains. This pattern cannot be fit by second-order model terms. The same logic extends to higher orders: a cluster of second-order terms cannot explain variation generated by third-order epistasis, because third-order variation is by definition the deviation from the best second-order model.
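      The two-site example can be checked numerically. The sketch below assumes the reference-free convention described in this response: the zero-order term is the global average, a first-order effect is the average phenotype of all genotypes carrying a state minus the global average, and the pairwise term is whatever remains. The function name and phenotype values are hypothetical and serve only to illustrate the argument.

```python
from itertools import product
from statistics import mean


def reference_free_decomposition(phenotype: dict[str, float]):
    """Reference-free decomposition of a complete two-site, two-state landscape.

    phenotype maps genotype strings such as "01" to a phenotype value.
    Returns (zero_order, first_order, pairwise), where first_order[(site, state)]
    and pairwise[genotype] are deviations from the lower-order expectation.
    """
    genotypes = ["".join(g) for g in product("01", repeat=2)]
    zero = mean(phenotype[g] for g in genotypes)  # global average
    first = {}
    for site in range(2):
        for state in "01":
            # Average over every genotype carrying this state, relative to the global mean
            first[(site, state)] = mean(
                phenotype[g] for g in genotypes if g[site] == state
            ) - zero
    # Pairwise term: the part of each phenotype not explained by zero- and first-order terms
    pairwise = {
        g: phenotype[g] - zero - first[(0, g[0])] - first[(1, g[1])] for g in genotypes
    }
    return zero, first, pairwise


# Pairwise-only architecture: all first-order effects come out as zero
print(reference_free_decomposition({"00": 1.0, "01": -1.0, "10": -1.0, "11": 1.0}))
# Additive architecture: all pairwise terms come out as zero
print(reference_free_decomposition({"00": 0.5, "01": 2.5, "10": 1.5, "11": 3.5}))
```

      In the first landscape every first-order term is zero and the variation is carried entirely by the pairwise terms; in the second, every pairwise term is zero. Neither pattern can be absorbed by terms of the other order.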

      (3) Reviewer 1 suggested several places in the text where citations to prior work would be appropriate.

      We appreciate these suggestions and have modified the manuscript to refer to most of these works.

      (4) Reviewer 1 pointed to the paper of Gong et al. (eLife 2013) and asked whether it is known how robust the proteins in our study are to changes in conformation/stability compared to other proteins, and whether this might affect the likelihood of observing higher-order epistasis in this system.

      The DBDs that we study here are very stable, and previous work shows that mutations affect DNA specificity primarily by modifying the DBD’s affinity rather than its stability (McKeown et al., Cell 2014). Additionally, Gong et al.’s findings pertain to a globally nonlinear relationship between stability and function, which arises from the Boltzmann relationship between the energy of folding and occupancy of the folded state. Because our data are categorical – based on the rank order of the measured phenotype rather than on fluorescence as a continuous phenotype – the kind of global nonlinearity observed in Gong’s study is not expected to produce spurious estimates of epistasis in our work. We have modified the discussion to address this point.

      (5) Reviewer 1 asked a) why the epistatic models produce landscapes on which variants have fewer neighbors on average than main-effects only models and b) why the average distance from all ERE-specific nodes to all SRE-specific nodes is greater with epistasis (but the average distance from ERE to nearest SRE is lower with epistasis).

      In the main effects-only landscape, the functional genotypes are relatively similar to each other, because each must contain several of the states that contribute the most to a positive genetic score. Moreover, ERE-specific nodes are similar to each other, and SRE-specific nodes are similar to each other, because each must contain one or more of a relatively small number of specificity-determining states. When epistasis is added to the genetic architecture, two things happen: 1) more genotypes become functional because there are more combinations that can exceed the threshold score to produce a functional activator and 2) these additional functional variants are more different from each other – in general, and within the classes of ERE- or SRE-specific variants – because there are now more diverse combinations of states that can yield either phenotype. As a result, a broader span of sequence space is occupied, but ERE- and SRE-specific variants are more interspersed with each other. This means that the average distance between all pairs of nodes is greater, and this applies to all ERE-SRE pairs, as well. However, the interspersing means that the closest single SRE to any particular ERE is closer than it was without epistasis. We have added this explanation to the main text.

      (6) Reviewer 2 asked us to explain why average path length increases with pairwise epistasis as the strength of selection for specificity increases.

      This behavior occurs because of the existence of a local peak in the pairwise model. Genotypes on this peak had few connections to other genotypes, all of which were less SRE-specific. Thus, with strong selection (i.e., large population size), the simulations became stuck on the local peak, cycling among its genotypes many times before leaving, which resulted in a large increase in the mean step number. As shown in the rest of the figure, when the longest set of paths is removed, there are still differences in the average number of steps with and without epistasis. This issue is described in the methods section.

      (7) Reviewers made several suggestions for clarity in the text and figures.

      We have modified the paper to address all of these comments.

      (8) Reviewer 3 stated that the code should be available.

      The code is available at https://github.com/JoeThorntonLab/DBD.GeneticArchitecture.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors were trying to understand the relationship between the development of large trunks and longirostrine mandibles in bunodont proboscideans of the Miocene, and how it reflects variation in diet patterns.

      Strengths:

      The study is very well supported, written, and illustrated, with plenty of supplementary material. The findings are highly significant for understanding the diversification of bunodont proboscideans in Asia during the Miocene, as well as for explaining the cranial/jaw disparity of fossil lineages. This work elucidates the diversification of paleobiological aspects of fossil proboscideans and their evolutionary response to open environments in the Neogene using several methods. The authors included all Asian bunodont proboscideans with long mandibles, and I suggest that they use the expression "bunodont proboscideans" instead of gomphotheres.

      Weaknesses:

      I believe that the only weakness is the lack of discussion comparing their results with the development of gigantism and long limbs in proboscideans from the same epoch.

      Thank you for your comprehensive review and positive feedback on our study regarding the co-evolution of feeding organs in bunodont proboscideans during the Miocene. We appreciate your suggestion and have decided to use the term "bunodont elephantiforms" instead of "gomphotheres" (for clarity, we use "elephantiforms" to exclude some early proboscideans, such as Moeritherium); we will make this change in our revised manuscript. We also appreciate the potential weakness you mentioned regarding the lack of discussion comparing our results with the development of gigantism and long limbs in proboscideans from the same epoch. We agree with the reviewer’s suggestion and are aware that gigantism and long limbs are potential factors in trunk development. Gigantism resulted in a loss of flexibility in elephantiforms, and long limbs made it more challenging for them to reach the ground. A long trunk compensates for these limitations. However, limb bones were rare in our material, especially those preserved in association with the skull.

      Reviewer #2 (Public Review):

      This study focuses on the eco-morphology, the feeding behaviors, and the co-evolution of feeding organs of longirostrine gomphotheres (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae), which are characterised by their distinctive mandible and mandibular tusk morphologies. They also show different evolutionary stages of food acquisition organs, which may have co-evolved with their extremely elongated mandibular symphyses and tusks. Although these three longirostrine gomphothere families were widely distributed in Northern China in the Early-Middle Miocene, their relative abundances and distributions changed through time as a result of climatic and ecosystem changes.

      These three groups have different feeding behaviors, indicated by different mandibular symphysis and tusk morphologies. Additionally, they show different evolutionary stages of the trunk, which are reflected in the morphology of the narial region. To reconstruct the feeding behavior and the relation between the mandible and the trunk of early elephantiforms, the authors examined the crania and mandibles of these three groups from the Early and Middle Miocene of northern China held in three different museums, and performed several analyses.

      The analyses made in the study are:

      (1) Finite Element (FE) analysis: They conducted two kinds of tests: the distal forces test and the twig-cutting test. With the distal forces test, the advantageous and disadvantageous mechanical performances of each group under distal vertical and horizontal external forces are established. With the twig-cutting test, a cylindrical twig model with orthotropic elastoplasticity was placed in three orientations at the distal end of the mandibular tusk to calculate the sum of the equivalent plastic strain (SEPS). The results indicate that all three groups have different mandibular specializations for cutting plants.

      (2) Phylogenetic reconstruction: These groups have different narial region morphologies and, correspondingly, different stages of trunk evolution. The phylogenetic tree shows the degree of specialization of the narial morphology, and the evolutionary level of the narial region is correlated with that of the character combination related to horizontal cutting. In the trilophodont longirostrine gomphotheres, co-evolution between the narial region and horizontal cutting behaviour is strongly suggested.

      (3) Enamel isotopes analysis: The results of stable isotope analysis indicate an open environment with a diverse range of habitats and that the niches of these groups overlapped without obvious differentiation.

      The analysis shows that different eco-adaptations have led to the diverse mandibular morphology, and that open-land grazing drove the development of trunk-specific functions and the loss of the long mandible. This conclusion is supported by evidence from the palaeoecological reconstruction, the reconstruction of feeding behaviors, and the detailed examination of mandibular and narial region morphology.

      All of the analyses are explained in detail in the supplementary files. The 3D models and movies in the supplementary files are detailed and understandable and explain the conclusion. The conclusions of the study are well supported by data.

      We appreciate your detailed and insightful review of our study. Your summary accurately captures the essence of our research, and we are pleased to note that multiple research methods were used to demonstrate our conclusions. Your recognition of the evidence-based conclusions from paleoecological, feeding behavior reconstruction, and morphological analyses reinforces the validity of our findings. Once again, we appreciate your time and thoughtful reviews.

      Reviewer #1 (Recommendations For The Authors):

      Thank you very much for the invitation to review this amazing manuscript. It is very well written and supported, and I have only minor suggestions to improve the text:

      (1) Some references are not in chronological sequence in the text, and this should be reviewed.

      We greatly appreciate the positive comments of the reviewer. We have revised the order of references in the manuscript as the reviewer suggested.

      (2) I suggest the use of the expression "bunodont proboscideans" instead of gomphotheres, because there is no agreement on whether Amebelodontidae and Choerolophodontidae are within Gomphotheriidae, as well as some brevirostrine bunodont proboscideans from South America. So I think it is fine to use "Gomphotheriidae", but not gomphotheres, to refer to all bunodont proboscideans included in the study.

      The reviewer is correct. Using “gomphotheres” to refer to these three groups is inappropriate. We have replaced “gomphotheres” with "bunodont elephantiforms" throughout the manuscript. Here, we use “elephantiforms” rather than “proboscideans” to avoid confusion with some early proboscidean members, such as Moeritherium.

      (3) I was expecting some discussion on the development of large trunks related to the gigantism in these bunodont proboscideans, regarding the huge skulls and the columnar limbs.

      We appreciate this suggestion, and we are aware that gigantism is a potential factor for trunk development. It is difficult to compare the three groups (Amebelodontidae, Choerolophodontidae, and Gomphotheriidae) in terms of their weight and limb bone length, because in our material, limb bones were rarely found, especially those associated with cranial material. Nevertheless, at this stage, all elephantiforms had significantly enlarged cranial sizes and limb bone lengths compared to early members like Phiomia. Gigantism caused the loss of flexibility in elephantiforms, and even the long limbs made it more difficult for an elephantiform to reach the ground. A long trunk compensates for this evolutionary change. Exploring these aspects further is a part of our future work.

      (4) The reference to Alejandro et al should be replaced by Kramarz et al (and the correct surname of the authors). The name and surname of this reference need to be corrected. The correct names are Kramarz, A., Garrido, A., Bond, M. 2019. Please correct this in the text too.

      We thank the reviewer for catching this error. This reference has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      I believe your paper will lead to other studies on other Proboscidean groups on the evolution of the mandible and trunk. There are some corrections in the text:

      • In line 199 in the text in pdf, "Tassy, 1994" should be "Tassy, 1996".

      • In line 241, "studied" should be "studies"

      • In line 313, "," after the word "tool" should be "."

      We appreciate the reviewer for pointing these errors out and have revised these based on the suggestions.

      • In the References, you write "et al." in some references. You should write the names of all of the authors.

      • In the References: "Lister AM. 2013" and "Shoshani&Tassy" are not referenced in the text.

      • In the References: "Tassy P. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn." should be "Tassy P. 1994. Gaps, parsimony, and early Miocene elephantoids (Mammalia), with a re-evaluation of Gomphotherium annectens (Matsumoto, 1925). Zool. J. Linn. 112, 1-2, 101-117" and replaced before "Tassy P. 1996".

      We appreciate the reviewer’s suggestions and have revised these references.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      The authors provided experimental data in response to my comments/suggestions in the revision. Overall, most points were appropriate and satisfactory, but some issues remain.

      (1) It is not fully addressed how atypical survivors are generated independently of Rad52-mediated homologous recombination.

      The newly provided data indicate that the formation of atypical telomeres is independent of the Rad52 homologous recombination pathway.

      "The atypical telomeres clones exhibit non-uniform telomere pattern", but the TG-hybridized signals after XhoI digestion are clear and uniform.

      "Atypical telomere" clones may carry circular chromosomes embedded with short TG repeats, rather than linear chromosomes. In other words, atypical telomeres may differ from telomeres, the ends of chromosomes. Is atypical telomere formation dependent on NHEJ? Given that "two chromosomes underwent intra-chromosomal fusions" (Line 248), are atypical telomere clones detected frequently in SY13 cells containing two chromosomes?

      We thank the reviewer for these questions. Frankly, we have not been able to determine the chromosome structures in these so-called "atypical survivors". As we mentioned in the manuscript, there could be mixed telomere structures, e.g. TG tract amplification, intra-chromosome telomere fusion, and inter-chromosome telomere fusion. Worse still, these "atypical survivors" may not maintain a stable genome, and their karyotype may undergo stochastic changes during passages. To avoid misunderstanding, we have changed the term "atypical" to "uncharacterized" in the revised manuscript.

      We have previously shown that deletion of YKU70 does not affect MMEJ-mediated intra-chromosome fusion in single-chromosome SY14 cdc13Δ cells (Wu et al., 2020). In SY12 cells, double knockout of TLC1 and YKU resulted in synthetic lethality, and we were unable to continue our investigation. The synthetic lethality of the TLC1 and YKU70 double deletion was shown in Figure 7B of reviewed preprint version 1; in accordance with the reviewer's instructions, this result was not included in reviewed preprint version 2.

      “Atypical” survivors could be detected in SY13 cells (Figure 1D), but their frequency of formation in the SY13 strain appeared to be lower than in SY12. This is expected: SY13 contains two chromosomes, so its survivors should have a higher frequency of intra-chromosome fusions.

      (2) From their data, it is possible that X and Y elements influence homologous recombination, type 1 and type 2 (type X), at telomeres. In particular, the presence of X and Y elements appears to be important for promoting type 1 recombination. In other words, although not essential, subtelomeres have some function in maintaining telomeres. I suggest that the authors include author response image 4 in the text. They could revise their conclusion and the paper title accordingly.

      According to this suggestion, we have included author response image 4 in the revised manuscript as Figure 2E, Figure 5D, Figure 6C and Figure 6E. Accordingly, we have changed the title as “Elimination of subtelomeric repeat sequences exerts little effect on telomere essential functions in Saccharomyces cerevisiae”.

      (3) Minor points: The newly added data indicate that X survivors are generated in a type 2-dependent manner. The authors could discuss how Y elements were eroded while retaining X elements (line 225, Figure 2A).

      We thank the reviewer for this suggestion and have discussed this point in the revised manuscript (p.13 lines 244-245). When telomeres are deprotected, chromosome-end resection takes place. Because SY12 has only one Y’-element, it is difficult to find homologous sequences to repair the Y’-element in XVI-L. Once the X-element in XVI-L is exposed by further resection, homologous sequences for repair are easier to find. Thus, in Type X survivors the Y’-element was eroded while the X-element was retained.

      Reviewer #2

      I would like to congratulate the authors for their work and the efforts they put in improving the manuscript. The major criticism I had previously, ie testing the genetic requirements for the survivor subtypes, has been met. Below are a few minor comments that don't necessarily require a response.

      (1) I think the Author response image 6 could have been included in the manuscript. I understand that the authors don't want to overinterpret survivor subtype frequencies, but this figure would have suggested some implication of Rad51 in the emergence of survivors even in the absence of Y' elements. At this stage, however, it is up to the authors, and leaving this figure out is also fine in my opinion.

      According to the suggestion, the author response image 6 has been presented as Figure 6—figure supplement 7.

      (2) Chromosome circularization seems to rely on microhomologies. Previously, the authors proposed that SY14 circularization depended on SSA (Wu et al. 2020), but here, since circularization appears to be Rad52-independent, it is likely to be based on MMEJ rather than SSA (although there are contradictory results on Rad52's role in MMEJ in the literature).

      Yes, we have noted this in the revised manuscript.

      (3) p. 28 lines 511-513: "The erosion sites and fusion sequences differed from those observed in SY12 tlc1Δ-C1 cells (Figure 2D), suggesting the stochastic nature of chromosomal circularization": I don't think they are necessarily stochastic, because the sequences beyond the telomeres are now modified, the available microhomologies have changed as well.

      We agree. For individual chromosomes, there tend to be hotspots for chromosome fusion. For example, in Figures 6C and 6F the resection sites in Chr1 and Chr2 were the same in SY12XYΔ+Y tlc1Δ-C1 and SY12XYΔ tlc1Δ-C1. We therefore speculate that hotspots for chromosome fusion exist, but which site the cell uses in a given round of chromosome fusion is stochastic.

      (4) Typos and other errors:

      • p. 3 line 52: "subtelomerice" and "varies" are misspelled.

      • p. 5 line 78: "processes" should be "process".

      • Supp files are mislabelled (the numbers do not correspond to file name).

      • Supp file 2: how come SY12 has only one Y' element and SY13 has two?

      • p. 10 line 175: "emerging" should be "emergence".

      • p.15 line 276: "counter-selected" should be "being counter-selected" or "counterselection".

      • p. 29 line 523: "the formation of them" should be "their formation".

      • p. 37 line 653: "could have been an ideal tool": the sentence is grammatically incorrect. Writing "AND could have been an ideal tool" is enough to make it structurally correct.

      Thank you for pointing out these errors. We have corrected them in the revised manuscript. Regarding the question “how come SY12 has only one Y' element and SY13 has two?”, we are not sure at this moment. We speculate that one of the Y’ elements might have been lost during genetic engineering of the chromosomes with the CRISPR–Cas9 system.

      Reviewer #3

      The authors included statistical analyses of the qPCR data (Fig 4B) as requested, but did not comment on the striking difference in expression of MPH3 and HSP32 in the SY12 strain compared to BY4742. An improvement of the manuscript is the inclusion of rad52 tlc1 strains in their analyses, demonstrating that the "atypical and circular survivors" arose independently of homologous recombination. In addition, by analyzing rad51 and rad50 mutant strain they could demonstrate that the "type X" survivors had similar molecular requirements to type II survivors. Overall, the revised submission improves the article.

      We thank the reviewer for the comments and suggestions. The SY12 strain (with three chromosomes) exhibited lower expression levels of both MPH3 and HSP32 than the parental strain BY4742 (with 16 chromosomes). We speculate that, with the reduced chromosome number, the silencing proteins are no longer titrated by the other telomeres that have been deleted. We have added these comments to the revised manuscript.

      Wu, Z.J., Liu, J.C., Man, X., Gu, X., Li, T.Y., Cai, C., He, M.H., Shao, Y., Lu, N., Xue, X., et al. (2020). Cdc13 is predominant over Stn1 and Ten1 in preventing chromosome end fusions. Elife 9.

    1. Reviewer #2 (Public Review):

      In the manuscript "Full-length direct RNA sequencing uncovers stress-granule dependent RNA decay upon cellular stress", Dar, Malla, and colleagues use direct RNA sequencing on nanopores to characterize the transcriptome after arsenite and oxidative stress. They observe a population of transcripts that are shortened during stress. The authors hypothesize that this shortening is mediated by the 5'-3' exonuclease XRN1, as XRN1 knockdown results in longer transcripts. Interestingly, the authors do not observe a polyA-tail shortening, which is typically thought to precede decapping and XRN1-mediated transcript decay. Finally, the authors use G3BP1 knockout cells to demonstrate that stress granule formation is required for the observed transcript shortening.

      The manuscript contains intriguing findings of interest to the mRNA decay community. That said, it appears that the authors at times overinterpret the data they get from a handful of direct RNA sequencing experiments. To bolster some of the statements additional experiments might be desirable.

      A selection of comments:

      (1) Considering that the authors compare the effects of stress, stress granule formation, and XRN1 loss on transcriptome profiles, it would be desirable to use a single cell line (and to validate the findings in a few more). Most of the direct RNAseq is performed in HeLa cells, but the experiments showing that stress granule formation is required come from U2OS cells, while the short-read RNAseq data showing loss of coverage at mRNA 5' ends are reanalyzed from HEK293 cells. It may be plausible that the same pathways operate in all those cells, but this is not rigorously demonstrated.

      (2) An interesting finding of the manuscript is that polyA tail shortening is not observed prior to transcript shortening. The authors would need to demonstrate that their approach is capable of detecting shortened polyA tails. Using polyA purified RNA to look at the status of polyA tail length may not be ideal (as avidity to oligodT beads may increase with polyA tail length and therefore the authors bias themselves to longer tails anyway). At the very least, the use of positive controls would be desirable; e.g. knockdown of CCR4/NOT.

      (3) The authors use a strategy of ligating an adapter to 5' phosphorylated RNA (presumably the breakdown fragments) to be able to distinguish true mRNA fragments from artifacts of abortive nanopore sequencing. This is a fantastic approach to curating a clean dataset. Unfortunately, the authors don't appear to go through with discarding fragments that are not adapter-ligated (presumably to increase the depth of analysis; they do offer Figure 1e that shows similar changes in transcript length for fragments with adapter, compared to Figure 1d). It would be good to know how many reads in total had the adapter. Furthermore, it would be good to know what percentage of reads without adapters are products of abortive sequencing. What percentage of reads had 5'OH ends (could be answered by ligating a different adapter to kinase-treated transcripts). More read curation would also be desirable when building the metagene analysis - why do the authors include every 3'end of sequenced reads (their RNA purification scheme requires a polyA tail, so non-polyadenylated fragments are recovered in a non-quantitative manner and should be discarded).

      (4) The authors should come to a clear conclusion about what "transcript shortening" means. Is it exonucleolytic shortening from the 5' end? They cannot say much about the 3' ends anyway (see above). Or are we talking about endonucleolytic cuts leaving 5'P ends that can then be attacked by XRN1 (again, what is the ratio of 5'P to 5'OH fragments; also, what is the ratio of shortened to full-length RNA)?

      (5) The authors should clearly explain how they think the transcript shortening comes about. They claim it does not need polyA shortening, but then do not explain where the XRN1 substrate comes from. Does their effect require decapping? Or endonucleolytic attacks?

      (6) XRN1 KD results in lengthened transcripts. That is not surprising as XRN1 is an exonuclease - and XRN1 does not merely rescue arsenite stress-mediated transcript shortening, but results in a dramatic transcript lengthening.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editors for their careful reading of our manuscript and for the detailed and constructive feedback on our work. Please find attached the revised version of the manuscript. We performed an extensive revision of the manuscript to address the issues raised by the referees. We provide new analyses (regarding the response consistency and the neural complexity), added supplementary figures and edits to figures and texts. Based on the reviewers’ comments, we introduced several major changes to the manuscript.

      Most notably, we

      • added a limitation statement to emphasize the speculative nature of our interpretation of the timing of word processing/associative binding

      • emphasized the limitations of the control condition

      • added analyses on the interaction between memory retrieval after 12h versus 36h

      • clarified our definition of episodic memory

      • added detailed analyses of the “Feeling of having heard” responses and the confidence ratings

      We hope that the revised manuscript addresses the reviewers' comments to their satisfaction. We believe that the revised manuscript has been significantly improved owing to the feedback provided. Below you can find a point-by-point response to each reviewer comment in blue. We look forward to the revised manuscript being published in eLife.

      Reviewer #1 (Public Review):

      The authors show that concurrently presenting foreign words and their translations during sleep leads to the ability to semantically categorize the foreign words above chance. Specifically, this procedure was successful when stimuli were delivered during slow oscillation troughs as opposed to peaks, which has been the focus of many recent investigations into the learning & memory functions of sleep. Finally, further analyses showed that larger and more prototypical slow oscillation troughs led to better categorization performance, which offers hints to others on how to improve or predict the efficacy of this intervention. The strength here is the novel behavioral finding and supporting physiological analyses, whereas the biggest weakness is the interpretation of the peak vs. trough effect.

      R1.1. Major importance:

      I believe the authors could attempt to address this question: What do the authors believe is the largest implication of this study? How far can this technique be pushed, and how can it practically augment real-world learning?

      We revised the discussion to put more emphasis on possible practical applications of this study (lines 645-656).

      In our opinion, the strength of this paper is its contribution to the basic understanding of information processing during deep sleep, rather than its insights on how to augment real-world learning. Given the currently limited data on learning during sleep, we believe it would be premature to make strong claims about potential practical applications of sleep-learning. In addition, as pointed out in the discussion section, we do not know what adverse effects sleep-learning has on other sleep-related mechanisms such as memory consolidation.

      R1.2. Lines 155-7: How do the authors argue that the words fit well within the half-waves when the sounds lasted 540 ms and didn't necessarily start right at the beginning of each half-wave? This is a major point that should be discussed, as part of the down-state sound continues into the up-state. Looking at Figure 3A, it is clear that stimulus presented in the slow oscillation trough ends at a time that is solidly into the upstate, and would not neurolinguists argue that a lot of sound processing occurs after the end of the sound? It's not a problem for their findings, which is about when is the best time to start such a stimulus, but it's a problem for the interpretation. Additionally, the authors could include some discussion on whether possibly presenting shorter sounds would help to resolve the ambiguities here.

      The word pairs’ presentations lasted on average ~540 ms. Importantly, the word pairs’ onset was timed to occur 100 ms before the maximal amplitude of the targeted peaks/troughs.

      Therefore, most of a word’s sound pattern appeared during the negative-going half-wave (about 350 ms of 540 ms). Importantly, Brodbeck and colleagues (2022) have shown that phonemes are continuously analyzed and interpreted with delays of about 50-200 ms, peaking at a delay of 100 ms. These results suggest that word processing started just after the negative maximum of a trough and finished during the next peak. Our interpretation (e.g. line 520+) suggests that low-level auditory processing reaches the auditory cortex before the positive-going half-wave. During the positive-going half-wave, the higher-level semantic networks appear to extract the presented word's meaning and associate the two simultaneously presented words. We clarified the time course regarding slow-wave phases and sound presentation in the manuscript (lines 158-164). Moreover, we added the limitation that we cannot know for sure when and in which slow-wave phase words were processed (lines 645-656). Future studies might want to use shorter stimuli to narrow down the timing of the word processing steps in relation to the sleep slow waves.

      R1.3. Medium importance:

      Throughout the paper, another concern relates to the term 'closed-loop'. It appears this term has been largely misused in the literature, and I believe the more appropriate term here is 'real-time' (Bergmann, 2018, Frontiers in Psychology; Antony et al., 2022, Journal of Sleep Research). For instance, if there were some sort of algorithm that assessed whether each individual word was successfully processed by the brain during sleep and then the delivery of words was subsequently changed, that could be more accurately labelled as 'closed-loop'.

      We acknowledge that the meaning of “closed-loop” in its narrowest sense is not fulfilled here. We believe that “slow oscillation phase-targeted, brain-state-dependent stimulation” is the most appropriate term to describe the applied procedure (BSDBS, Bergmann, 2018). We changed the wording in the manuscript to brain-state-dependent stimulation algorithm. Nevertheless, we would like to point out that the algorithm we developed and used (TOPOSO) is very similar to the algorithms often termed closed-loop algorithms in memory and sleep research (e.g. Esfahani et al., 2023; Garcia-Molina et al., 2018; Ngo et al., 2013; for a comparison of TOPOSO to these techniques see Wunderlin et al., 2022, and for more information about TOPOSO see Ruch et al., 2022).

      R1.4. Figure 5 and corresponding analyses: Note that the two conditions end up with different sounds with likely different auditory complexities. That is, one word vs. two words simultaneously likely differ on some low-level acoustic characteristics, which could explain the physiological differences. Either the authors should address this via auditory analyses or it should be added as a limitation.

      This is correct: the two conditions differ in auditory complexity. Accordingly, we added this issue as another limitation of the study (lines 651-653). We opted for a single-word control condition to ensure that no associative learning (between pseudowords) could take place in the control condition, because associative learning was the critical learning process in the experimental condition. We would like to point out that we observed significant differences in brain responses to the presentation of word pairs (experimental condition) vs single pseudowords (control condition) in the Trough condition, but not in the Peak condition. If low-level acoustic characteristics explained the EEG differences between the two conditions, one would expect these differences to occur in both the trough and the peak condition, because earlier studies showed that low-level acoustic processing proceeds in both phases of slow waves (Andrillon et al., 2016; Batterink et al., 2016; Daltrozzo et al., 2012).

      R1.5. Line 562-7 (and elsewhere in the paper): "episodic" learning is referenced here and many times throughout the paper. But episodic learning is not what was enhanced here. Please be mindful of this wording, as it can be confusing otherwise.

      The reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013).

      We stand by our claim that sleep-learning was of episodic nature. Here we use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000) and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). We revised the manuscript to clarify that and how our definition differs from traditional definitions. Please see reviewer comment R3.1 for a more extensive answer.

      Reviewer #2 (Public Review):

      In this project, Schmidig, Ruch and Henke examined whether word pairs that were presented during slow-wave sleep would leave a detectable memory trace 12 and 36 hours later. Such an effect was found, as participants showed a bias to categorize pseudowords according to a familiar word that they were paired with during slow-wave sleep. This behavior was not accompanied by any sign of conscious understanding of why the judgment was made, and so demonstrates that long-term memory can be formed even without conscious access to the presented content. Unconscious learning occurred when pairs were presented during troughs but not during peaks of slow-wave oscillations. Differences in brain responses to the two types of presentation schemes, and between word pairs that were later correctly- vs. incorrectly-judged, suggest a potential mechanism for how such deep-sleep learning can occur.

      The results are very interesting, and they are based on solid methods and analyses. Results largely support the authors' conclusions, but I felt that there were a few points in which conclusions were not entirely convincing:

      R2.1. As a control for the critical stimuli in this study, authors used a single pseudoword simultaneously played to both ears. This control condition (CC) differs from the experimental condition (EC) in a few dimensions, among them: amount of information provided, binaural coherence and word familiarity. These differences make it hard to conclude that the higher theta and spindle power observed for EC over CC trials indicate associative binding, as claimed in the paper. Alternative explanations can be made, for instance, that they reflect word recognition, as only EC contains familiar words.

      We agree. In the revised version of the manuscript, we emphasise this as a limitation of our study (lines 653-656). Moreover, we understand that the differences between stimuli of the control and the experimental condition need not reflect only the associative binding of two words. We have tempered our interpretation of the findings accordingly.

      Interestingly, EC vs CC exhibits differences following trough- but not peak-targeting (see R1.4). If all the EC vs CC differences were unrelated to associative binding, we would expect the same EC vs CC differences when peaks were targeted. Hence, the selective EC vs CC differences in the trough condition suggest that the brain is more responsive to sound, information, word familiarity and word semantics during troughs, where we found successful learning, than during peaks, where no learning occurred. Trough-targeted word pairs (EC) versus foreign words (CC) enhanced theta power at 500 ms following word onset, and this theta enhancement correlated significantly with interindividual retrieval performance, indicating that theta probably promoted associative learning during sleep. This correlation was not significant for spindle power.

      R2.2. The entire set of EC pairs were tested both following 12 hours and following 36 hours. Exposure to the pairs during test #1 can be expected to have an effect over memory one day later, during test #2, and so differences between the tests could be at least partially driven by the additional activation and rehearsal of the material during test #1. Therefore, it is hard to draw conclusions regarding automatic memory reorganization between 12 and 36 hours after unconscious learning. Specifically, a claim is made regarding a third wave of plasticity, but we cannot be certain that the improvement found in the 36 hour test would have happened without test #1.

      We understand that the retrieval test at 12h may have had an impact on performance on the retrieval test at 36h. Practicing retrieval of newly formed memories is known to facilitate future retrieval of the same memories (e.g. Karpicke & Roediger, 2008). Hence, practicing the retrieval of sleep-formed memories during the retrieval test at 12h may have boosted performance at 36h.

      However, recent literature suggests that retrieval practice is only beneficial when corrective feedback is provided (Belardi et al., 2021; Metcalfe, 2017). In our study, we only presented the sleep-played pseudowords at test and participants received no feedback regarding the accuracy of their responses. Thus, a proper conscious re-encoding could not take place. Nevertheless, the retrieval at 12h may have altered performance at 36h in other ways. For example, it could have tagged the reactivated sleep-formed memories for enhanced consolidation during the next night (Rabinovich Orlandi et al., 2020; Wilhelm et al., 2011).

      We included a paragraph on the potential carry-over effects from retrieval at 12h on retrieval at 36h in the discussion section (line 489-496; line 657-659). Furthermore, we removed the arguments about the “third wave of plasticity”.

      R2.3. Authors claim that perceptual and conceptual processing during sleep led to increased neural complexity in troughs. However, neural complexity was not found to differ between EC and CC, nor between remembered and forgotten pairs. It is therefore not clear to me why the increased complexity that was found in troughs should be attributed to perceptual and conceptual word processing, as CC contains meaningless vowels. Moreover, from the evidence presented in this work at least, I am not sure there is room to infer causation - that the increase in HFD is driven by the stimuli - as there is no control analysis looking at HFD during troughs that did not contain stimulation.

      With the analysis of the HFD we would like to provide an additional perspective to the oscillation-based analysis. We checked whether the boundary condition of Peak and Trough targeting changes the overall complexity or information content in the EEG. Our goal was to assess the change in neural complexity (relative to a pre-stimulus baseline) following the successful vs unsuccessful encoding of word pairs during sleep.

      We acknowledge that a causal interpretation of HFD is not warranted, and we revised the manuscript accordingly. It was unexpected that we could not find the same results in the contrast of EC vs CC or of correct vs incorrect word pairs. We suggest that our signal-to-noise ratio might have been too low.

      One could argue that the phase targeting alone (without stimulation) induces peak/trough differences in complexity. We cannot completely rule out this concern. However, we tried to use EEG that was not influenced by the ongoing slow wave: the EEG from 2000 to 500 ms before stimulus onset and from 500 to 2000 ms after stimulus onset. We thereby excluded the 1 s of the targeted slow wave, expecting most of the phase-inherent complexity to have faded out (see Figure 2). We could not further extend the analysis window because of the minimal stimulus onset interval of 2 s. Of course, we cannot exclude that the targeted trough affected the subsequent HFD. We clarified this in the manuscript (lines 384-425).
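      For readers unfamiliar with the measure, the window-based complexity comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that HFD denotes Higuchi's fractal dimension, that each epoch is a single EEG channel sampled at a hypothetical rate `fs`, that `kmax = 10`, and that the "change" in complexity is the post-stimulus minus the pre-stimulus value. The helper `hfd_change` and all parameter values are hypothetical.

```python
import numpy as np


def higuchi_fd(x, kmax=10):
    """Higuchi's fractal dimension of a 1-D signal (kmax is an assumed default)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ks = np.arange(1, kmax + 1)
    mean_lengths = []
    for k in ks:
        lengths = []
        for m in range(k):  # 0-based start offset of each sub-series
            idx = np.arange(m, n, k)
            n_intervals = idx.size - 1
            if n_intervals < 1:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            # Normalise for the number of intervals actually summed, then divide by k
            lengths.append(dist * (n - 1) / (n_intervals * k) / k)
        mean_lengths.append(np.mean(lengths))
    # The fractal dimension is the slope of log L(k) against log(1/k)
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(mean_lengths), 1)
    return slope


def hfd_change(epoch, fs, onset_idx, kmax=10):
    """Hypothetical helper: post-minus-pre HFD for one single-channel epoch.

    Uses -2000 to -500 ms and +500 to +2000 ms relative to stimulus onset,
    skipping the targeted slow wave around the onset.
    """
    def window(start_ms, end_ms):
        a = onset_idx + int(round(start_ms * fs / 1000))
        b = onset_idx + int(round(end_ms * fs / 1000))
        return epoch[a:b]

    pre = window(-2000, -500)
    post = window(500, 2000)
    return higuchi_fd(post, kmax) - higuchi_fd(pre, kmax)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 250                           # hypothetical sampling rate in Hz
    epoch = rng.standard_normal(fs * 5)  # 5 s of simulated single-channel data
    onset = fs * 2 + fs // 2           # stimulus onset 2.5 s into the epoch
    print(higuchi_fd(np.arange(500.0)))  # a straight line has an FD close to 1
    print(hfd_change(epoch, fs, onset))
```

      As sanity checks, a straight line yields an FD of about 1 and white noise an FD of about 2; real EEG falls in between, and the sign of `hfd_change` indicates whether complexity rose or fell relative to the pre-stimulus baseline.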

      Furthermore, we did find a difference in neural complexity between the pre-stimulus baseline and the post-stimulus period in the Peak condition but not in the Trough condition (we now added this contrast to the manuscript, lines 416-419). Hence, the change in neural complexity is a reaction to the interaction of the specific slow-wave phase with the processing of the word pairs. Even though these results cannot provide unambiguous, causal links, we think they can serve as an important starting point for other studies aiming to decipher neural complexity during slow-wave sleep.

      Reviewer #3 (Public Review):

      The study aims at creating novel episodic memories during slow wave sleep that can be transferred to the awake state. To do so, participants were simultaneously presented during sleep with both foreign words and their arbitrary translations in their own language (one word in each ear), or, as a control condition, with the foreign word alone, binaurally. Stimuli were presented either at the trough or the peak of the slow oscillation using a closed-loop stimulation algorithm. To test for the creation of a flexible association during sleep, participants were then presented during wakefulness with the foreign words alone and had (1) to decide whether they had the feeling of having heard that word before, (2) to attribute this word to one of three possible conceptual categories (to which the translation words actually belong), and (3) to rate their confidence in their decision.

      R3.1. The paper is well written, the protocol ingenious and the methods are robust. However, the results do not really add conceptually to a prior publication of this group showing the possibility of associating, in slow wave sleep, pairs of words denoting large or small objects and non-words, and then asking participants during ensuing wakefulness to categorise these non-words into a "large" or "small" category. In both cases, the main finding is that this type of association can be formed during slow wave sleep if presented at the trough (versus the peak) of the slow oscillation. Crucially, whether these associations truly represent episodic memory formation during sleep, as claimed by the authors, is highly disputable, as there is no control condition allowing the authors to exclude the alternative, simpler hypothesis that mere perceptual associations between two elements (foreign word and translation) have been created and stored during sleep (which is already in itself an interesting finding). In this latter case, it would be only during the awake state, when the foreign word is presented, that its presentation would implicitly recall the associated translation, which in turn would "ignite" the associative/semantic association process, eventually leading to the observed categorisation bias (i.e., foreign words tending to be put in the same conceptual category as their associated translation). In the absence of a disconfirmation of this alternative and more economical hypothesis, and if we follow Occam's razor, the claim that there is episodic memory formation during sleep is speculative and unsupported, which is a serious limitation irrespective of the merits of the study. The title and interpretations should be toned down in this respect.

      Our study conceptually adds to and extends the findings by Züst et al. (a) by highlighting the precise time-window or brain state during which sleep-learning is possible (e.g. slow-wave trough targeting), (b) by demonstrating the feasibility of associative learning during night sleep, and (c) by uncovering the longevity of sleep-formed memories.

      We acknowledge that the reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticised (e.g., Dew & Cabeza, 2011; Hannula et al., 2023; Henke, 2010; Reder et al., 2009; Shohamy & Turk-Browne, 2013). We stand by our claim that sleep-learning was of episodic nature. We use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000), and not the traditional definition of episodic memory that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). The core computational features of episodic memory are 1) rapid learning, 2) association formation, and 3) a compositional and flexible representation of the associations in long-term memory.

      Therefore, we revised the manuscript to emphasize how our definition differs from traditional definitions (line 64).

      For the current study, we designed a retrieval task that calls on the core computational features of episodic memory by assessing flexible retrieval of sleep-formed compositional word-word associations. Reviewer 3 suggests an alternative interpretation for the learning observed here: mere perceptual associations between foreign words and translation words are stored during sleep, and semantic associations are only inferred at retrieval testing during ensuing wakefulness. First, these processing steps would require rapid sound-sound associative encoding, long-term storage, and flexible sound retrieval, which would still require hippocampal processing and computations in the episodic memory system. Second, this mechanism seems highly laborious and inefficient: the sound pattern of a word at 12 hours after learning would have to trigger the reactivation of the associated sound pattern of another word, and this sound pattern would then have to elicit the activation of the translation word’s semantics, leading to the selection of the correct superordinate semantic category at test.

      Overall, we believe that our pairwise-associative learning paradigm triggered a rapid conceptual-associative encoding process mediated by the hippocampus that provided for flexible representations of foreign and translation words in episodic memory. This study adds to the existing literature by examining specific boundary conditions of sleep-learning and demonstrates the longevity (at least 36 hours) of sleep-learned associations.

      Other remarks:

      R3.2. Lines 43-45: the assumption that the sleeping brain decides whether external events can be disregarded, require awakening or should be stored for further consideration in the waking state is dubious, and the supporting references date from a time (the 1960s) during which hypnopedia was investigated under badly controlled sleep conditions (leaving open the possibility that it occurred during micro-awakenings).

      We revised the manuscript to add more recent and better controlled studies that bolster this claim from the 1960s (lines 40-51). Recently, it has been shown that the sleeping brain preferentially processes relevant information, for example information conveyed by unfamiliar voices (Ameen et al., 2022), emotional content (Holeckova et al., 2006; Moyne et al., 2022), and one's own name compared to others’ names (Blume et al., 2018).

      R3.3. 1st paragraph, lines 48-53: the authors should be more specific about what kind of new associations, and at which level, can be stored during sleep according to recent reports, as a wide variety of associations (mostly at elementary levels) are shown in the cited references. Limitations in information processing during sleep should also be acknowledged.

      In the lines to which R3 refers, we cite an article (Ruch & Henke, 2020) in which two of the three authors of the current manuscript elaborate in detail what kind of associations can be stored during sleep. We revised these lines to more clearly present the current understanding of the potential and the limitations of sleep-learning (line 40-51). Although information processing during sleep is generally reduced (Andrillon et al., 2016), a variety of different kinds of associations can be stored, ranging from tone-odour to word-word association (Arzi et al., 2012, 2014; Koroma et al., 2022; Züst et al., 2019).

      R3.4. The authors ran their main behavioural analyses on delayed retrieval at 36h rather than 12h, with the argument that retrieval performance was numerically larger at 36h than at 12h but the difference was non-significant (lines 181-183), and that effects were essentially similar. Looking at Figure 2, is the trough effect really significant at 12h? In any case, the fact that it is (numerically) higher at 36h than at 12h might suggest that the association created at the first 12h retrieval (considering the alternative hypothesis proposed above) has been reinforced by subsequent sleep.

      The Trough effect at 12h is not significant, as stated on line 185 (“Planned contrasts against chance level revealed that retrieval performance significantly exceeded chance at 36 hours only (P36hours = 0.036, P12hours = 0.094).”). It seems that our wording was not clear. Therefore, we refined the description of the behavioural analysis in the manuscript (lines 188-193).

      In brief, we report an omnibus ANOVA with a significant main effect of targeting type (Trough vs Peak, main effect Peak versus Trough: F(1,28) = 5.237, p = 0.030, d = 0.865). Because Trough-targeting led to significantly better memory retention than Peak-targeting, we computed a second ANOVA, solely including participants with trough-targeted word-pair encoding. The memory retention in the Trough condition is above chance (MTrough = 39.11%, SD = 10.76; FIntercept (1,14) = 5.660, p = 0.032) and does not significantly differ between the 12h and 36h retrieval (FEncoding-Test Delay (1,14) = 1.308, p = 0.272). However, the retrieval performance at 36h numerically exceeds the performance at 12h, and the direct comparison against chance reveals that the 36h but not the 12h retrieval was significant (P36hours = 0.036, P12hours = 0.094). Hence, we found no evidence for above-chance performance at the 12h retrieval and focused on the retrieval after 36h in the EEG analysis.

      We agree with the reviewer that the subsequent sleep seems to have improved consolidation and subsequent retrieval. We assume that the reviewer suggests that participants merely formed perceptual associations during sleep and encoded episodic-like associations during testing at 12h (as pointed out in R 3.1). However, we believe that it is unlikely that the awake encoding of semantic associations during the 12h retrieval led to improved performance after 36h. We changed the discussion regarding the interaction between retrieval at 12h and 36h (line 505-512, also see R 2.2)

      R3.5. In the discussion section, lines 419-427, the argument is somewhat circular in claiming episodic memory mechanisms based on functional neuroanatomical elements that are not tested here, and the supporting studies conducted during sleep were in a different setting (e.g. TMR).

      Indeed, the TMR and animal studies are a different setting compared to the present study. We re-wrote this part and only focused on the findings of Züst and colleagues (2019), who examined hippocampal activity during the awake retrieval of sleep-formed memories (lines 472-482). Additionally, we would like to emphasise that our main reasoning is that the task requirements called upon the episodic memory system.

      R3.6. Supplementary Material: in the EEG data the differentiation between correct and incorrect ulterior classifications when presented at the peak of the slow oscillation is only significant in association with 36h delayed retrieval but not at 12h, how do the authors explain this lack of effect at 12 hour ?

      We assume that the reviewer refers to the TROUGH condition (word-pairs targeted at a slow-wave trough) and not as written to the peak condition. We argue that the retention performance at 12h is not significantly above chance (M12hours = 37.4%, P12hours = 0.094).

      Hence, the distinction between “correctly” and “incorrectly” categorised word pairs was not informative for the EEG analysis during sleep. Whatever the reason the 12h retrieval was not significantly above chance, the less successful memory recall, and thus the less balanced trial count, makes recall accuracy a poorer criterion for separating EEG trials than recall performance after 36 hours.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor importance:

      Abstract: The opening framing is confusing here and in the introduction. Why frame the paper in the broadest terms about awakenings and threats from the environment when this is a paper about intersections between learning & memory and sleep? I do understand that there is an interesting point to be made about the counterintuitive behavioral findings, given that sleep is generally perceived as a time when stimuli are blocked out, but this does not seem to me to be the broadest point or the way to start the paper. The authors should consider this but of course push back if they disagree.

      We understand the reviewer’s criticism but believe that this has more to do with personal preferences than with the scientific value or validity of our work. We believe that it is our duty as researchers to present our study in a broader context because this may help readers from various fields to understand why the work is relevant. To some readers, evidence for learning during sleep may seem trivial, to others, it may seem impossible or a weird but useless conundrum. By pointing out potential evolutionary benefits of the ability to acquire new information during sleep, we help the broad readership of eLife understand the relevance of this work.

      Lines 31-32: "Neural complexity" -> "neural measures of complexity" because it isn't clear what "neural complexity" means at this point in the abstract. Though, note my other point that I believe this analysis should be removed.

      To our understanding, “neural complexity” is a frequently used term in the field and yields more than 4000 entries on Google Scholar, whereas ‘neural measures of complexity’ finds only 3 hits on Google Scholar [September 2023]. In order to link our study with other studies on neural complexity, we would like to keep this terminology. As an example, two recent publications using “neural complexity” are Lee et al. (2020) and Frohlich et al. (2022).

      Lines 42-43: The line of work on 'sentinel' modes would be good to cite here (e.g., Blume et al., 2017, Brain & Language).

      We added the suggested citation to the manuscript (line 52).

      Lines 84-90: While I appreciate the authors desire to dig deep and try to piece this all together, this is far too speculative in my opinion. Please see my other points on the same topic.

      In this paragraph, we point out why both peaks and troughs are worth exploring for their contributions to sensory processing and learning during sleep. Peaks and troughs both contribute to sleep-learning. Our speculations should inspire further work aimed at pinning down the benefits of peaks and troughs for sleep-learning. We clarified the purpose and speculative nature of our arguments in the revised version of the manuscript.

      Line 109: "outlasting" -> "lasting over" or "lasting >"

      We changed the wording accordingly.

      Line 111: I believe 'nonsense' is not the correct term here, and 'foreign' (again) would be preferred. Some may be offended to hear their foreign word regarded as 'nonsense'. However, please let me know if I have misunderstood.

      We would like to use the linguistic term “pseudoword” (aligned with reviewer 2’s comment) and we revised the manuscript accordingly.

      Figure 1A: "Enconding" -> "Encoding"

      Thank you for pointing this out.

      Lines 201-2: Were there interactions between confidence and correctness on the semantic categorization task? Were correct responses given with more confidence than incorrect ones? This would not necessarily be a problem for the authors' account, as there can of course be implicit influences on confidence (i.e., fluency).

      As stated in the results section, confidence ratings did not differ significantly between correct and incorrect assignments (Trough condition: F(1,14) = 2.36, p = 0.15; Peak condition: F(1,14) = 0.48, p = 0.50).

      Line 236: "Nicknazar" -> "Niknazar"

      Thank you for pointing this out.

      Line 266: "profited" -> "benefited"

      We changed the wording accordingly.

      Lines 280-4: There seems some relevance here with Malerba et al. (2018) and her other papers to categorize slow oscillations.

      Diving into the details of how best to categorise slow oscillations is beyond the scope of this manuscript. Here, we build on work from the field of microstate analysis and use two measures to describe and quantify the targeted brain states: the topography of the electric field (i.e., the correlation of the electric field with an established template or “microstate”) and the field strength (global field power, GFP). While the topography of a quasi-stable electric field reflects activity in a specific neural network, the strength (GFP) of a field most likely mirrors the degree of activation (or inactivity) of that network. Here, we find that consistently targeting a specific network state yielding a strong frontal negativity benefitted learning during sleep. For a more detailed explanation of the slow-wave phase targeting, see Ruch et al. (2022).
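      For readers less familiar with microstate measures, the two quantities named above can be sketched using their standard definitions: GFP as the spatial standard deviation of the average-referenced field, and topographic similarity as the spatial Pearson correlation between an observed map and a template map. This is a minimal illustration, not the authors' analysis pipeline; the function names and the toy five-channel template are hypothetical.

```python
import numpy as np


def average_reference(v):
    """Re-reference a map (n_channels,) or an epoch (n_channels, n_times) to the common average."""
    return v - v.mean(axis=0, keepdims=True)


def global_field_power(v):
    """GFP: spatial standard deviation across channels (per time point for 2-D input)."""
    u = average_reference(v)
    return np.sqrt((u ** 2).mean(axis=0))


def spatial_correlation(v, template):
    """Pearson correlation across channels between an observed map and a template map."""
    a = average_reference(v)
    b = average_reference(template)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


template = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # hypothetical 5-channel template map
observed = 3.0 * template + 1.0                   # same topography, stronger field, DC offset
print(spatial_correlation(observed, template))    # ≈ 1.0
print(global_field_power(observed))               # ≈ 4.243, i.e. sqrt(18)
```

      Because both maps are average-referenced and normalised, the correlation ignores overall offset and amplitude; this is why field strength (GFP) has to be reported as a separate measure.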

      Lines 343-6: Was it intentional to have 0.5 s (0.2-0.7 s) surrounding the analysis around 500 ms but only 0.4 s (0.8-1.2 s) surrounding the analysis around 1 s? Could the authors use the same size interval or justify having them be different?

      We apologise for the misleading phrasing and have clarified this in the revised manuscript. We applied the same procedure to the comparison of later correctly vs. incorrectly classified pseudowords as to the comparison between EC and CC: we analysed the entire window from 0 s to 2.5 s with a cluster-based permutation approach. Contrary to the EC vs. CC contrast, no cluster remained significant for the subsequent memory effect. We had mistakenly reported the wrong time window; the paragraph is corrected in the revised manuscript (lines 364-369).
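      The logic of a cluster-based permutation test over a time window can be illustrated with a simplified, single-channel sketch for paired conditions (e.g., EC vs. CC per participant). It is not the authors' implementation: the clusters reported here (e.g., a frontal cluster) indicate that the actual analysis also clustered over electrodes, which requires a channel-neighbourhood definition omitted below. The function names, cluster-forming threshold, number of permutations and simulated data are all assumptions for illustration.

```python
import numpy as np


def t_values(diff):
    """One-sample t statistic across subjects at each time point.

    diff: array (n_subjects, n_times) holding condition A minus condition B.
    """
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def find_clusters(t, threshold):
    """Return (start, stop, mass) for runs of adjacent |t| > threshold with the same sign."""
    sign = np.sign(t) * (np.abs(t) > threshold)
    clusters, start = [], None
    for i in range(len(t) + 1):
        s = sign[i] if i < len(t) else 0.0
        if start is not None and s != sign[start]:  # the current run ends here
            clusters.append((start, i, float(t[start:i].sum())))
            start = None
        if start is None and s != 0:                # a new run starts here
            start = i
    return clusters


def cluster_permutation_test(diff, threshold, n_permutations=1000, seed=0):
    """Paired cluster-based permutation test (time dimension only).

    Under the null hypothesis the sign of each subject's difference wave is
    exchangeable, so the null distribution is built from random sign flips,
    keeping the largest absolute cluster mass of each permutation.
    """
    rng = np.random.default_rng(seed)
    observed = find_clusters(t_values(diff), threshold)
    null_max = np.empty(n_permutations)
    for k in range(n_permutations):
        flips = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        perm_clusters = find_clusters(t_values(diff * flips), threshold)
        null_max[k] = max((abs(m) for _, _, m in perm_clusters), default=0.0)
    # Monte Carlo p-value per observed cluster (+1 counts the observed labelling)
    return [
        (start, stop, mass, float((np.sum(null_max >= abs(mass)) + 1) / (n_permutations + 1)))
        for start, stop, mass in observed
    ]


rng = np.random.default_rng(1)
diff = rng.normal(0.0, 1.0, size=(15, 100))  # 15 hypothetical participants, 100 samples
diff[:, 40:60] += 1.5                        # simulated effect
threshold = 2.145                            # two-sided t critical value, alpha = .05, df = 14
for start, stop, mass, p in cluster_permutation_test(diff, threshold, n_permutations=500):
    print(start, stop, round(mass, 1), round(p, 3))
```

      Testing only the maximal cluster mass per permutation is what controls the family-wise error rate across the whole 0-2.5 s window, so no separate correction for the number of time points is needed.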

      Line 356-entire HFD section: it is unclear what's gained by this analysis, as it could simply be another reflection of the state of the brain at the time of word presentation. In my opinion, the authors should remove this analysis and section, as it does not add clarity to other aspects of the paper.

      (If the authors keep the section) Line 361-2 - "Moreover, high HFD values have been associated with cognitive processing (Lau et al., 2021; Parbat & Chakraborty, 2021)." This statement is vague. Could the authors elaborate?

      Please see our answer to Reviewer 2 (2.3) for a more detailed explanation. In brief, we would like to keep the analysis with the broad time windows from -2 to -0.5 s and from 0.5 to 2 s.

      Lines 403-4: How was it determined that these neural networks mediated both conscious/unconscious processes? Perhaps the authors meant to make a different point, but the way it reads to me is that there is evidence that some neural networks are conscious and others are not and both forms engage in similar functions.

      We revised the manuscript to be more precise and clear: “The conscious and unconscious rapid encoding and flexible retrieval of novel relational memories was found to recruit the same or similar networks including the hippocampus (Henke et al., 2003; Schneider et al., 2021). This suggests that conscious and unconscious relational memories are processed by the same memory system.” (p. 22, top).

      Lines 433-41: Performance didn't actually significantly increase from 12 to 36 hours, so this is all too speculative in my opinion.

      We removed the speculative claim that performance may have increased from the retrieval at 12 hours to the retrieval at 36 hours.

      Line 534: "assisted by enhanced" -> "coincident with". It's unclear whether theta reflects successful processing as having occurred or whether it directly affects or assists with it.

      We have adjusted the wording to be more cautious, as suggested (line 588).

      Line 572-4: Rothschild et al. (2016) is relevant here.

      Unfortunately, we do not see the relevance of this article within the context of our work.

      Line 577 paragraph: The authors may consider adding a note on the importance of ethical considerations surrounding this form of 'inception'.

      We extended this part by adding ethical considerations to the discussion section (Stickgold et al., 2021, line 657).

      Line 1366: It would be better if the authors could eventually make their data publicly available. This is obviously not required, but I encourage the authors to consider it if they have not considered it already.

      In my opinion, the discussion is too long. I really appreciate the authors trying to figure out the set of precise times in which each level of neural processing might occur and how this intersects with their slow oscillation phase results. However, I found a lot of this too speculative, especially given that the sounds may bleed into parts of other phases of the slow oscillation. I do not believe this is a problem unique to these authors, as many investigators attempting to target certain phases in the targeted memory reactivation literature have faced the same problem, but I do believe the authors get ahead of the data here. In particular, there seems to be one paragraph in the discussion that is multiple pages long (p. 22-24). I believe this paragraph has too much detail and should be broken up regardless, as it is difficult for the reader to follow.

      Considering the recent literature, we believe this interpretation best explains the data. As argued earlier, we believe that a speculative interpretation of the reported phenomena can provide substantial added value because it inspires future experimental work. We have improved the manuscript by clearly distinguishing between data and interpretation, and we explicitly state the speculative nature of some of the interpretations offered. We hope that these speculations, which are testable hypotheses (!), will eventually be confirmed or refuted experimentally.

      Reviewer #2 (Recommendations For The Authors):

      I very much enjoyed the paper and think it describes important findings. I have a few suggestions for improvement, and minor comments that caught my eye during reading:

      (1) I was missing an analysis of CC ERP, and its comparison to EC ERP.

      We added this analysis to the manuscript (lines 299-301). The comparison of CC ERP with EC ERP did not yield any significant cluster for either the peak (cluster-level Monte Carlo p = 0.54) or the trough (cluster-level Monte Carlo p > 0.37). We assume that the noise level was too high to identify differences between CC and EC ERPs.

      (2) Regarding my public review comment #2, some light can be shed on between-test effects, I believe, using an item-based analysis, looking at correlations between items' classifications in test #1 and test #2. The assumption seems to be that items that were correct in test #1 remained correct in test #2 while other new correct classifications were added, owing to the additional consolidation happening between the two tests. But that is an empirical question that can be easily tested. If no consistency in item classification is found, on the other hand, or if only consistency in correct classification is found, that would be interesting in itself. This item-based analysis can help tease apart real memory from random correct classification. For instance, the subset of items that are consistently classified correctly could be regarded as non-fluke with higher confidence and used as the focus of the subsequent-memory analysis instead of the ones that were correct only in test #2.

      Thank you; we re-analysed the data accordingly. Participants were consistent in choosing a specific object category for an item at 12 hours and at 36 hours (consistency rate = 47% same category; chance level = 1/3). Moreover, the consistency rate did not differ between the Trough and the Peak condition (M(Trough) = 47.2%, M(Peak) = 47.0%, p = 0.98). The better retrieval performance in the Trough compared with the Peak condition after 36 hours is due to the following: A) If participants were correct at 12 h, they chose the correct answer again at 36 h (Trough: 20%, Peak: 14%). B) Following an incorrect answer at 12 h, participants switched to another object category at 36 h (Trough: 72%, Peak: 67%). C) If participants switched the object category following an incorrect answer at 12 h, they switched more often to the correct category at 36 h in the Trough than in the Peak condition (Trough: 56%, Peak: 53%). Hence, the data support the reviewer's assumption: items that were correct after 12 hours remained correct after 36 hours, while further correct classifications emerged at 36 h owing to the additional consolidation between the two tests. We added this finding to the manuscript (lines 191-200, Figure S6):

      Author response image 1.
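      The item-wise consistency analysis described above can be sketched as follows. It is a minimal illustration under stated assumptions, not the analysis script: the category labels (the animal/tool/place categories mentioned by Reviewer 2), the data layout and the function names are hypothetical, and the denominators chosen for each proportion may differ from those behind the percentages reported above.

```python
from dataclasses import dataclass

CATEGORIES = ("animal", "tool", "place")  # assumed: three object categories


@dataclass
class ItemResponse:
    correct: str   # category paired with the pseudoword during sleep
    test_12h: str  # category chosen at the 12-h retrieval
    test_36h: str  # category chosen at the 36-h retrieval


def consistency_summary(items):
    """Item-wise comparison of the two retrieval tests for one participant or condition."""
    n = len(items)
    same = sum(r.test_12h == r.test_36h for r in items)
    both_correct = sum(r.test_12h == r.correct and r.test_36h == r.correct for r in items)
    wrong_12h = [r for r in items if r.test_12h != r.correct]
    switched = [r for r in wrong_12h if r.test_36h != r.test_12h]
    switched_to_correct = sum(r.test_36h == r.correct for r in switched)
    return {
        # if the 36-h choice were independent and uniform over the categories,
        # it would match the 12-h choice with probability 1/3
        "consistency_rate": same / n,
        "chance_consistency": 1 / len(CATEGORIES),
        "correct_both_tests": both_correct / n,
        "switch_after_wrong": len(switched) / len(wrong_12h) if wrong_12h else float("nan"),
        "switch_to_correct": switched_to_correct / len(switched) if switched else float("nan"),
    }


items = [
    ItemResponse("animal", "animal", "animal"),  # correct at both tests
    ItemResponse("tool", "place", "tool"),       # wrong at 12 h, switched to correct
    ItemResponse("place", "animal", "tool"),     # wrong at 12 h, switched to another wrong one
    ItemResponse("tool", "animal", "animal"),    # wrong at 12 h, kept the same answer
]
print(consistency_summary(items))
# consistency_rate 0.5, correct_both_tests 0.25, switch_after_wrong 2/3, switch_to_correct 0.5
```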

      As suggested, we re-analysed the ERP with respect to the subsequent memory effect. This time we computed four conditions according to the reviewer's argument about consistently correctly classified pseudowords, presented in the figure below: ERP of trials correctly classified at 36 h (blue), ERP of trials incorrectly classified at 36 h (light blue), ERP of trials correctly classified twice (brown), and ERP of trials not correctly classified twice (orange; all trials not in brown). Please note that the two blue lines are reported in the manuscript and include all trials. The brown and orange lines take consistency into account and together also include all trials.

      Author response image 2.

      By excluding even more trials from the group of correct retrieval responses, the noise level increases. Therefore, the difference between the twice-correct and the not-twice-correct trials is not significant (cluster-level Monte Carlo p > 0.27). Because the ERP of twice-correct trials appears very similar to the ERP of trials correctly classified at 36 h at frontal electrodes, we assume that our ERP effect is not driven by a few extreme subjects. Similarly, not-twice-correct trials (orange) have a stronger frontal trough than trials incorrectly classified at 36 h (light blue).

      (3) In a similar vein, a subject-based analysis would be highly interesting. First and foremost, readers would benefit from seeing the lines that connect individual dots across the two tests in figures 2B and 2C. It is reasonable to expect that only a subset of participants were successful learners in this experiment. Finding them and analyzing their results separately could be revealing.

      We added Figure S1 to the supplementary material, showing the pairing between each participant's performance at the 12-h and the 36-h retrieval.

      It is an interesting idea to look at successful learners alone. We computed the ERP of the subsequent memory effect for those participants who had above-chance retrieval accuracy at 36 h. The result shows an effect similar to that reported for all participants (frontal cluster ~0-0.3 s). The p-value is only 0.08 because only 9 of the 15 participants exhibited above-chance retrieval performance at 36 hours.

      Author response image 3.

      ERP effect of correct (blue) vs. incorrect (light blue) pseudoword category assignment for participants with above-chance retrieval performance at 36 h (shading: SD).

      We prefer to not include this data in the manuscript, but are happy to provide it here.

      (4) I wondered why the authors informed subjects of the task in advance (that they will be presented associations when they slept)? I imagine this may boost learning as compared to completely naïve subjects. Whether this is the reason or not, I think an explanation of why this was done is warranted, and a statement whether authors believe the manipulation would work otherwise. Also, the reader is left wondering why subjects were informed only about test #1 and not about test #2 (and when were they told about test #2).

      Subjects were informed of all the tests upfront. We apologize for the inconsistency in the manuscript and have revised the Methods section. Participants were informed for two reasons: a) Participants had to sleep with in-ear headphones, and we wanted to explain to them why these were necessary and why they should not remove them. b) We hoped that participants would unconsciously expect sounds to be played during sleep, would process these sounds efficiently, and would remain deeply asleep (no arousals).

      (5) FoHH is a binary yes/no question, and so may not have been sensitive enough to demonstrate small differences in familiarity. For comparison, the Perceptual Awareness Scale (Ramsøy & Overgaard, 2004) that is typically used in studies of unconscious processing is a 4-point scale, and this allows one to capture more nuanced effects such as partial consciousness and larger response biases. Regardless, it would be informative to have the FoHH numbers obtained in this study, and not just their comparison between conditions. Also, was familiarity of EC and CC pseudowords compared? One may wonder whether hearing the pseudowords clearly vs. in one ear alongside a familiar word would make the word slightly more familiar.

      We apologize for having oversimplified this part of the manuscript. Indeed, the FoHH is comparable to the PAS: we used a 4-point scale on which participants rated their feeling of having heard the pseudoword during the preceding sleep. In the revised manuscript, we report the complete results (lines 203-223). The FoHH did not differ in any of the suggested contrasts: for both the peak and the trough condition, it did not differ between sleep-played vs. new, correct EC trials vs. new, correct vs. incorrect EC trials, or EC vs. CC trials. To illustrate the results, a figure of the FoHH has been added to the supplement (Figure S4).

      (6) Similarly, it would be good to report the numbers of the confidence ratings in the paper as well.

      In the revised manuscript, we extended the description of the confidence rating results. We added the descriptive statistics (lines 224-236) and included a corresponding figure in the supplement (Figure S5).

      Minor/aesthetic comments:

      We implemented all the following suggestions.

      (1) I suggest using "pseudoword" or "nonsense word" instead of "foreign word", because "foreign word" typically means a real word from a different language. It is quite confusing when starting to read the paper.

      After reconsidering, we think that pseudoword is the appropriate linguistic term and have revised the manuscript accordingly.

      (2) Lines 1000-1001: "The required sample size of N = 30 was determined based on a previous sleep-learning study". I was missing a description of what study you are referring to.

      (3) I am not sure I understood the claim nor the rationale made in lines 414-417. Is the claim that pairs did not form one integrated engram? How do we know that? And why would having one engram not enable extracting the meaning from a visual-auditory presentation of the cue? The sentence needs some rewording and/or unpacking.

      (4) Were categories counterbalanced (i.e., did each subjects' EC contain 9 animal words, 9 tool words and 9 place words)?

      (5) Asterisks indicating significant effects are missing from Figure 4 and S2.

      (6) Fig1 legend: "Participants were played with pairs" is ungrammatical.

      (7) Line 1093: no need for a comma.

      (8) Line 1336: missing opening parenthesis

      (9) Line 430: "observe" instead of "observed".

      (10) Line 466: two dots instead of one..

      Reviewer #3 (Recommendations For The Authors):

      Methods: 2 separate ANOVAs are performed (lines 160-185), but would it not make more sense to combine both into one? If they are kept separate, a correction for multiple comparisons might be needed (p/2 = 0.025).

      We computed an omnibus ANOVA. In a second step, we examined the effect in the significant targeting condition by computing another ANOVA. For further explanations, see our response to reviewer comment 3.4.

      References

      Ameen, M. S., Heib, D. P. J., Blume, C., & Schabus, M. (2022). The Brain Selectively Tunes to Unfamiliar Voices during Sleep. Journal of Neuroscience, 42(9), 1791–1803. https://doi.org/10.1523/JNEUROSCI.2524-20.2021

      Andrillon, T., Poulsen, A. T., Hansen, L. K., Léger, D., & Kouider, S. (2016). Neural Markers of Responsiveness to the Environment in Human Sleep. The Journal of Neuroscience, 36(24). https://doi.org/10.1523/JNEUROSCI.0902-16.2016

      Arzi, A., Holtzman, Y., Samnon, P., Eshel, N., Harel, E., & Sobel, N. (2014). Olfactory Aversive Conditioning during Sleep Reduces Cigarette-Smoking Behavior. Journal of Neuroscience, 34(46). https://doi.org/10.1523/JNEUROSCI.2291-14.2014

      Arzi, A., Shedlesky, L., Ben-Shaul, M., Nasser, K., Oksenberg, A., Hairston, I. S., & Sobel, N. (2012). Humans can learn new information during sleep. Nature Neuroscience, 15(10). https://doi.org/10.1038/nn.3193

      Batterink, L. J., Creery, J. D., & Paller, K. A. (2016). Phase of Spontaneous Slow Oscillations during Sleep Influences Memory-Related Processing of Auditory Cues. Journal of Neuroscience, 36(4), 1401–1409. https://doi.org/10.1523/JNEUROSCI.3175-15.2016

      Belardi, A., Pedrett, S., Rothen, N., & Reber, T. P. (2021). Spacing, Feedback, and Testing Boost Vocabulary Learning in a Web Application. Frontiers in Psychology, 12. https://www.frontiersin.org/articles/10.3389/fpsyg.2021.757262

      Bergmann, T. O. (2018). Brain State-Dependent Brain Stimulation. Frontiers in Psychology, 9, 2108. https://doi.org/10.3389/fpsyg.2018.02108

      Blume, C., del Giudice, R., Wislowska, M., Heib, D. P. J., & Schabus, M. (2018). Standing sentinel during human sleep: Continued evaluation of environmental stimuli in the absence of consciousness. NeuroImage, 178, 638–648. https://doi.org/10.1016/j.neuroimage.2018.05.056

      Brodbeck, C., & Simon, J. Z. (2022). Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Frontiers in Neuroscience, 16. https://www.frontiersin.org/articles/10.3389/fnins.2022.828546

      Cohen, N. J., & Eichenbaum, H. (1993). Memory, Amnesia, and the Hippocampal System. A Bradford Book.

      Daltrozzo, J., Claude, L., Tillmann, B., Bastuji, H., & Perrin, F. (2012). Working memory is partially preserved during sleep. PLoS One, 7(12).

      Dew, I. T. Z., & Cabeza, R. (2011). The porous boundaries between explicit and implicit memory: Behavioral and neural evidence. Annals of the New York Academy of Sciences, 1224(1), 174–190. https://doi.org/10.1111/j.1749-6632.2010.05946.x

      Esfahani, M. J., Farboud, S., Ngo, H.-V. V., Schneider, J., Weber, F. D., Talamini, L. M., & Dresler, M. (2023). Closed-loop auditory stimulation of sleep slow oscillations: Basic principles and best practices. Neuroscience & Biobehavioral Reviews, 153, 105379. https://doi.org/10.1016/j.neubiorev.2023.105379

      Frohlich, J., Chiang, J. N., Mediano, P. A. M., Nespeca, M., Saravanapandian, V., Toker, D., Dell’Italia, J., Hipp, J. F., Jeste, S. S., Chu, C. J., Bird, L. M., & Monti, M. M. (2022). Neural complexity is a common denominator of human consciousness across diverse regimes of cortical dynamics. Communications Biology, 5(1). https://doi.org/10.1038/s42003-022-04331-7

      Gabrieli, J. D. E. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 49, 87–115.

      Garcia-Molina, G., Tsoneva, T., Jasko, J., Steele, B., Aquino, A., Baher, K., Pastoor, S., Pfundtner, S., Ostrowski, L., Miller, B., Papas, N., Riedner, B., Tononi, G., & White, D. P. (2018). Closed-loop system to enhance slow-wave activity. Journal of Neural Engineering, 15(6), 066018. https://doi.org/10.1088/1741-2552/aae18f

      Hannula, D. E., Minor, G. N., & Slabbekoorn, D. (2023). Conscious awareness and memory systems in the brain. WIREs Cognitive Science, 14(5), e1648. https://doi.org/10.1002/wcs.1648

      Henke, K. (2010). A model for memory systems based on processing modes rather than consciousness. Nature Reviews Neuroscience, 11(7). https://doi.org/10.1038/nrn2850

      Henke, K., Mondadori, C. R. A., Treyer, V., Nitsch, R. M., Buck, A., & Hock, C. (2003). Nonconscious formation and reactivation of semantic associations by way of the medial temporal lobe. Neuropsychologia, 41(8). https://doi.org/10.1016/S0028-3932(03)00035-6

      Holeckova, I., Fischer, C., Giard, M.-H., Delpuech, C., & Morlet, D. (2006). Brain responses to a subject’s own name uttered by a familiar voice. Brain Research, 1082(1), 142–152. https://doi.org/10.1016/j.brainres.2006.01.089

      Karpicke, J. D., & Roediger, H. L. (2008). The Critical Importance of Retrieval for Learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408

      Koroma, M., Elbaz, M., Léger, D., & Kouider, S. (2022). Learning New Vocabulary Implicitly During Sleep Transfers With Cross-Modal Generalization Into Wakefulness. Frontiers in Neuroscience, 16, 801666. https://doi.org/10.3389/fnins.2022.801666

      Lee, Y., Lee, J., Hwang, S. J., Yang, E., & Choi, S. (2020). Neural Complexity Measures. Advances in Neural Information Processing Systems, 33, 9713–9724. https://proceedings.neurips.cc/paper/2020/hash/6e17a5fd135fcaf4b49f2860c2474c7c-Abstract.html

      Metcalfe, J. (2017). Learning from Errors. Annual Review of Psychology, 68(1), 465–489. https://doi.org/10.1146/annurev-psych-010416-044022

      Moscovitch, M. (2008). The hippocampus as a “stupid,” domain-specific module: Implications for theories of recent and remote memory, and of imagination. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 62, 62–79. https://doi.org/10.1037/1196-1961.62.1.62

      Moyne, M., Legendre, G., Arnal, L., Kumar, S., Sterpenich, V., Seeck, M., Grandjean, D., Schwartz, S., Vuilleumier, P., & Domínguez-Borràs, J. (2022). Brain reactivity to emotion persists in NREM sleep and is associated with individual dream recall. Cerebral Cortex Communications, 3(1), tgac003. https://doi.org/10.1093/texcom/tgac003

      Ngo, H.-V. V., Martinetz, T., Born, J., & Mölle, M. (2013). Auditory Closed-Loop Stimulation of the Sleep Slow Oscillation Enhances Memory. Neuron, 78(3). https://doi.org/10.1016/j.neuron.2013.03.006

      O’Reilly, R. C., Bhattacharyya, R., Howard, M. D., & Ketz, N. (2014). Complementary Learning Systems. Cognitive Science, 38(6), 1229–1248. https://doi.org/10.1111/j.1551-6709.2011.01214.x

      O’Reilly, R. C., & Rudy, J. W. (2000). Computational principles of learning in the neocortex and hippocampus. Hippocampus, 10(4), 389–397. https://doi.org/10.1002/1098-1063(2000)10:4<389::AID-HIPO5>3.0.CO;2-P

      Rabinovich Orlandi, I., Fullio, C. L., Schroeder, M. N., Giurfa, M., Ballarini, F., & Moncada, D. (2020). Behavioral tagging underlies memory reconsolidation. Proceedings of the National Academy of Sciences, 117(30), 18029–18036. https://doi.org/10.1073/pnas.2009517117

      Reder, L. M., Park, H., & Kieffaber, P. D. (2009). Memory systems do not divide on consciousness: Reinterpreting memory in terms of activation and binding. Psychological Bulletin, 135(1). https://doi.org/10.1037/a0013974

      Ruch, S., & Henke, K. (2020). Learning During Sleep: A Dream Comes True? Trends in Cognitive Sciences, 24(3), 170–172. https://doi.org/10.1016/j.tics.2019.12.007

      Ruch, S., Schmidig, F. J., Knüsel, L., & Henke, K. (2022). Closed-loop modulation of local slow oscillations in human NREM sleep. NeuroImage, 264, 119682. https://doi.org/10.1016/j.neuroimage.2022.119682

      Schacter, D. L. (1998). Memory and Awareness. Science, 280(5360), 59–60. https://doi.org/10.1126/science.280.5360.59

      Schneider, E., Züst, M. A., Wuethrich, S., Schmidig, F., Klöppel, S., Wiest, R., Ruch, S., & Henke, K. (2021). Larger capacity for unconscious versus conscious episodic memory. Current Biology, 31(16), 3551-3563.e9. https://doi.org/10.1016/j.cub.2021.06.012

      Shohamy, D., & Turk-Browne, N. B. (2013). Mechanisms for widespread hippocampal involvement in cognition. Journal of Experimental Psychology: General, 142(4), 1159–1170. https://doi.org/10.1037/a0034461

      Squire, L. R., & Dede, A. J. O. (2015). Conscious and Unconscious Memory Systems. Cold Spring Harbor Perspectives in Biology, 7(3), a021667. https://doi.org/10.1101/cshperspect.a021667

      Stickgold, R., Zadra, A., & Haar, A. J. H. (2021). Advertising in Dreams is Coming: Now What? Dream Engineering. https://dxe.pubpub.org/pub/dreamadvertising/release/1

      Tulving, E. (2002). Episodic Memory: From Mind to Brain. Annual Review of Psychology, 53(1), 1–25. https://doi.org/10.1146/annurev.psych.53.100901.135114

      Wilhelm, I., Diekelmann, S., Molzow, I., Ayoub, A., Mölle, M., & Born, J. (2011). Sleep Selectively Enhances Memory Expected to Be of Future Relevance. Journal of Neuroscience, 31(5), 1563–1569. https://doi.org/10.1523/JNEUROSCI.3575-10.2011

      Wunderlin, M., Koenig, T., Zeller, C., Nissen, C., & Züst, M. A. (2022). Automatized online prediction of slow-wave peaks during non-rapid eye movement sleep in young and old individuals: Why we should not always rely on amplitude thresholds. Journal of Sleep Research, 31(6), e13584. https://doi.org/10.1111/jsr.13584

      Züst, M. A., Ruch, S., Wiest, R., & Henke, K. (2019). Implicit Vocabulary Learning during Sleep Is Bound to Slow-Wave Peaks. Current Biology, 29(4), 541-553.e7. https://doi.org/10.1016/j.cub.2018.12.038

    1. The district and state and federal governments have established our standards and handed our curriculum down to us. These standards make up the goals established for all of our students. How we reach these goals may require different paths. The core of differentiated instruction is flexibility in content, process, and product based on student strengths, needs, and learning styles.

      Reaching these goals does require an enormous amount of flexibility from educators. It is easy to make a to-do list; it is much more difficult to complete it. I think it would be helpful to have direct student incentives so that students can see the immediate reward for their efforts, instead of the abstract idea that being more educated may make their lives more successful as adults.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review

      [...] A particular strength of the present study is the structural characterization of human PURA, which is a challenging target for structural biology approaches. The molecular dynamics simulations are state-of-the-art, allowing a statistically meaningful assessment of the differences between wild-type and mutant proteins. The functional consequences of PURA mutations at the cellular level are fascinating, particularly the differential compartmentalization of wild-type and mutant PURA variants into certain subcellular condensates.

      Weaknesses that warrant rectification relate to (i) the interpretation of statistically non-significant effects seen in the molecular dynamics simulations,

      We removed from the manuscript the sentence that discussed statistically non-significant differences (see our response to Reviewer #1, point a), which resolves this concern.

      (ii) the statistical analysis of the differential compartmentalization of PURA variants into processing bodies vs. stress granules, and

      We re-analyzed all cell-biological data and adjusted the statistical analysis of P-body and stress-granule intensities. The new, improved statistics have replaced the original analyses in the corresponding figures (Figs. 1C and 2B).

      (iii) insufficient documentation of protein expression levels and knock-down efficiencies.

      Quantification of protein expression levels by Western blotting is shown in Appendix Figure S1, and quantification of knock-down efficiencies by Western blotting in Appendix Figure S3.

      Recommendations for the authors: Reviewer #1

      Concerns and Suggested Changes

      (a) I have only one concern about the computational part and that is about statements such as "There are also large differences in the residue surrounding the mutation spot (residues 90 to 100), where the K97E mutant also shows much greater fluctuation. However, these differences are not significant due to the large standard deviations." If the differences are not statistically significant, then I would suggest either removing such a statement or increasing the statistics.

      We agree with the Reviewer’s comment. We removed this sentence from the text.

      Recommendations for the authors: Reviewer #2

      General Comments

      This is a challenging structural target and the authors have made considerable efforts to determine the effect of several mutations on the structure and function. Many of the constructs, however, could not be expressed and/or purified in bacteria, and it is not clear to what extent other expression systems (e.g. Drosophila or human) were considered and whether this would have been beneficial.

      We did not use other expression systems because the wild-type protein is well-behaved when expressed in E. coli. If a mutant variant cannot be expressed or does not behave well in E. coli, this is a clear indication that the respective mutation impairs the protein’s integrity. Thus, by using E. coli as a reference system for all PURA variants, we could assess the influence of the mutations on structural integrity and solubility. Only for the variants whose expression in E. coli was not impaired did we go on to assess in more detail why they are nevertheless functionally impaired and cause PURA Syndrome.

      Concerns and Suggested Changes

      (a) The schematic in Figure 3A would have been helpful for interpreting the mutations discussed in Figures 1 and 2. I would suggest moving it earlier in the text.

      We changed the figure according to the Reviewer’s suggestion.

      (b) I believe the RNA used for binding studies in Figures 3C and D was (CGG)8. Are the two "free" RNA bands a monomer and a dimer (duplex?)?

      Although we do not know for certain, it is indeed likely that the two free RNA bands represent either different secondary structures of the free RNA or a duplex of two molecules. Of note, PURA binds to both “free” RNA bands, indicating that it either does not discriminate between them or melts double-stranded RNA in these EMSAs.

      There also seems to be considerable cooperativity in the binding, so I wonder if a shorter RNA oligonucleotide might facilitate the measurement of Kds.

      The length of the RNA used was selected based on the estimated elongated size of full-length PURA and the presence of 3 PUR repeats. Assuming that one PUR repeat interacts with about 6-7 bases (data from the co-structure of Drosophila PURA with DNA; PDB-ID: 5FGP) and that full-length PURA forms a dimer consisting of three PUR repeats, the full-length protein in its extended form should cover a nucleic-acid stretch of about 24 bases.

      Also, it is not clear how the affinities were measured, particularly for hsPURA III, since the free band is never fully bound at the highest protein concentration.

      It was not our goal to measure Kds for the interaction of PURA variants with RNA. The EMSA experiments were conducted to detect relative differences in the interaction between PURA variants and RNA. To estimate these differences, we measured the total intensity of the bound (shifted) and unbound RNA. The intensities of the bands on the scanned EMSA gels were quantified with Fiji (ImageJ). We calculated the percentage of shifted RNA and normalized it. The hsPURA III fragment shows much lower affinity and therefore does not fully shift the RNA even at the highest protein concentration, in contrast to full-length PURA and PURA I-II.
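      A minimal sketch of the quantification described above, assuming that background-corrected band intensities have already been measured for each lane. The response does not specify the reference used for normalization; as an assumption, lanes here are normalized to the maximal shift obtained with the full-length protein, so that a weaker binder that never fully shifts the RNA stays comparable. Function names and intensities are hypothetical.

```python
def percent_shifted(bound, free):
    """Percentage of RNA in the shifted (protein-bound) band of one EMSA lane."""
    total = bound + free
    if total <= 0:
        raise ValueError("lane has no measurable signal")
    return 100.0 * bound / total


def normalize(percentages, reference):
    """Express each lane relative to a shared reference percentage (assumed normalization)."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return [p / reference for p in percentages]


# Hypothetical background-corrected intensities (bound, free), one tuple per lane,
# ordered by increasing protein concentration
full_length = [(0, 1000), (250, 750), (600, 400), (900, 100)]
weak_binder = [(0, 1000), (100, 900), (200, 800), (300, 700)]

fl_pct = [percent_shifted(b, f) for b, f in full_length]  # [0.0, 25.0, 60.0, 90.0]
wb_pct = [percent_shifted(b, f) for b, f in weak_binder]  # [0.0, 10.0, 20.0, 30.0]
reference = max(fl_pct)                                   # 90.0
print(normalize(fl_pct, reference))
print(normalize(wb_pct, reference))  # stays well below 1: the weaker binding remains visible
```

      Normalizing each variant to its own maximum would instead rescale every titration to reach 1.0 and hide differences in the extent of binding.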

      (c) Do the human PURA I+II and dmPURA I+ II crystallize in the same space group and have similar packing? Can the observed structural flexibility be due to crystal contacts?

      hsPURA I+II and dmPURA I+II crystallize in different space groups with different crystal packing. In both cases, the asymmetric unit contains four independent molecules, with the flexible part of the structure, composed of strands β4 and β8 (the β ridge), exposed to solvent. In the Drosophila structure, we do not observe flexibility in either β-strand. In contrast, in the human PURA structure the β ridge exhibits considerable flexibility and adopts a different conformation in each of the four molecules of the asymmetric unit. We observe similar flexibility of β4 and β8 (the β ridge) in the structure of the K97E mutant, which contains two molecules in the asymmetric unit. We would also note that crystal contacts would be expected to stabilize rather than destabilize domains.

      Similarly, can the conformations observed for the K97E mutant be partially explained by packing?

      Regarding the sequence shift observed for the β5 and β6 strands in the hsPURA I+II K97E variant: although the β5 strand with the shifted amino acid sequence contacts the β5 strand of a symmetry-related molecule, we do not consider this interaction to be the source of the shift. To verify that the shift is not forced by crystallization, we performed NMR measurements, which confirmed that in solution there is a strong change in the β-strands between WT and the K97E mutant. This is an unambiguous indication that the structural changes observed in the crystal structure also occur in solution. In addition, the MD simulations further support our interpretation that K97E destabilizes the corresponding PUR domain. Taken together, we provide evidence from three different angles that the observed differences indeed affect the integrity and hence the function of the protein.

      (d) Perhaps it is my misunderstanding, but I find the NMR data on the Arg side chains for K97E confusing. If they are visible for K97E and not WT, doesn't this indicate that there is an exchange between two conformations, or more dynamics, in the WT structure? Doesn't this seem to be the opposite of the expectation if K97E is thought to have more conformational flexibility?

      Due to a technical issue (peak contour level), arginine side-chain resonances were not clearly visible in the WT spectrum. Figure 5F has been updated, and these resonances now correspond to those seen in the mutant spectrum. However, to prevent any confusion, misinterpretation or overinterpretation, we removed the sentence regarding the arginine side chains: "Intriguingly, arginine side chain resonances Nε-Hε were only visible in the K97E variant, while they were broadened out in the wild-type spectrum."

      (e) The most speculative part of the paper is the interpretation of SG and PB localization of PURA in Fig 1 and 2. There is an important issue with the statistics that must be clarified because it would appear that statistical significance was determined using each SG or PB as an independent measurement. This is incorrect and significance should be measured by only using the means of three biological replicates. This is well described here. It is not clear at this time if the reported P values will be confirmed upon reanalysis, and this may require reinterpretation of the data.

      We are grateful for this clarifying comment and agree that the presentation of the statistical analysis of P-body and stress granule association was misleading. Of note, while the figures depicted all values irrespective of biological replicate, the statistical analyses were performed on the mean value of each replicate for each cell line, not on all raw data points.

      We prepared new plots showing only the mean value of each replicate and recalculated the P-values. The values changed only slightly in this new analysis, because we now also included the previously labeled outliers (red points) to demonstrate that significance persists even when they are considered.
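      To illustrate why replicate means, rather than individual granules, are the unit of analysis, here is a minimal sketch using only the Python standard library. The data, the helper names and the choice of Welch's t statistic are assumptions for illustration; the response does not state which test was used. However many granules are scored per replicate, each condition contributes only three values (one per biological replicate) to the test.

```python
from math import sqrt
from statistics import mean, stdev


def replicate_means(per_replicate: dict[str, list[float]]) -> list[float]:
    """Collapse per-granule values to one mean per biological replicate."""
    return [mean(values) for _, values in sorted(per_replicate.items())]


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical per-granule association scores from three biological replicates.
wild_type = {"rep1": [2.0, 4.0], "rep2": [3.0, 5.0], "rep3": [4.0, 6.0]}
mutant = {"rep1": [1.0, 1.0, 1.0], "rep2": [2.0, 2.0], "rep3": [3.0]}

wt_means = replicate_means(wild_type)   # [3.0, 4.0, 5.0]
mut_means = replicate_means(mutant)     # [1.0, 2.0, 3.0]
t = welch_t(wt_means, mut_means)        # n = 3 per group, not the granule count
print(round(t, 3))
```

      The p-value would then come from a t distribution, for example via `scipy.stats.ttest_ind(wt_means, mut_means, equal_var=False)`. Pooling all granules instead would inflate the sample size and understate the variability between replicates.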

      In the new analysis of stress granule association, only the K97E mutant lost its significance, indicating that its association with stress granules is not significantly reduced. We therefore adjusted the following sentences in the manuscript.

      Results:

      Original: "While quantification showed a reduced association of hsPURA K97E mutant with G3BP1-positive granules (Fig 1B), the two other mutants, I206F and F233del, showed the same co-localization to stress granules as the wild type control."

      Corrected: "In all the patient-related mutations, no significant reduction in stress granule association was seen when compared to the wild type control (Fig 1C)."

      Original: "The observation that only one of the patient-related mutations of hsPURA, K97E, showed reduced stress granule association indicates that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      Corrected: "As we did not observe significant changes in the association of patient-related mutations of hsPURA with stress granules, it is suggested that this feature may not constitute a major hallmark of the PURA syndrome. It should be noted however that this interpretation must be considered with some caution as the experiments were performed in a PURA wild-type background."

      (f) A western blot showing the level of overexpression of the PURA proteins should be shown in Figure 1 as well as the KD of endogenous PURA for Figure S2?

      As requested, a Western blot showing the level of overexpression of the different PURA proteins has been added as Appendix Figure S1.

      A Western blot of the siRNA-mediated knock-down experiments of PURA and their corresponding control has been added to Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (g) While I appreciate that rewriting is time-consuming, I would recommend considering restructuring the manuscript because I think that it would aid the overall clarity. I think the foundation of the work is the structural characterization and would suggest beginning the paper with this data and the biochemical characterization. The co-localization with SGs and PBs and how this may be relevant to disease is much more speculative and is therefore better to present later. While I appreciate that the structural interpretation of why some mutants localize to PBs differently is not entirely clear, I do think that this would provide some context for the discussion.

      In the initial version of the manuscript, we first presented the structural characterization of PURA and afterwards the co-localization with SGs and PBs. As the reviewer noted in (e), we also consider the SG and PB interpretation to be the most speculative part of this manuscript, and we felt that placing it at the end of the Results section would weaken the manuscript. By contrast, we consider the structural interpretation of the mutations to be much stronger and of greater impact for future research. After long discussion, we therefore decided to swap the order so that the most important results come at the end of the manuscript.

      Recommendations for the authors: Reviewer #3

      Concerns and Suggested Changes:

      (a) For the characterization of G3BP1-positive stress granules in HeLa cells upon depletion of PURA, it remains unclear what is the efficiency of siRNA? The authors should provide a western blot to indicate how much the endogenous levels were reduced.

      We fully agree with this concern and have addressed it. We had performed this experiment prior to submission, but it was inadvertently omitted from the manuscript.

      The Western blot of siRNA-mediated knock-down experiments of PURA and their corresponding control is now shown in Appendix Figure S3. Quantification of three biological repeats showed a significant reduction of PURA protein levels upon knock down.

      (b) How does knocking down PURA affect DCP1A-positive structures in HeLa cells? Would P bodies be formed even in the absence (or reduction) of total PURA?

      Indeed, the stated question is very interesting. In fact, we have already shown in our recent publication (Molitor et al., 2023) that a knock down of PURA in HeLa and NHDF cells leads to a significant reduction of P-bodies. We actually referred to this finding on page 6:

      "Since hsPURA was recently shown to be required for P-body formation in HeLa cells and fibroblasts (Molitor et al. 2023), PURA-dependent liquid phase separation could potentially also directly contribute to the formation of these granules."

      On the same page, we also refer to the underlying molecular mechanism:

      "However, when putting this observation in perspective with previous reports, it seems unlikely that P-body formation directly depends on phase separation by hsPURA, but rather on its recently reported function as gene regulator of the essential P-body core factors LSM14a and DDX6 (Molitor et al., 2023)."

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      These ingenious and thoughtful studies present important findings concerning how people represent and generalise abstract patterns of sensory data. The issue of generalisation is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception, learning, and cognitive science. The findings have the potential to provide compelling support for the outlined account, but there appear to be other possible explanations, too, that may affect the scope of the findings but could be considered in a revision.

      Thank you for sending the feedback from the three peer reviewers regarding our paper. Please find below our detailed responses addressing the reviewers' comments. We have incorporated these suggestions into the paper and provided explanations for the modifications made.

      We have specifically addressed the point of uncertainty highlighted in eLife's editorial assessment, which concerned alternative explanations for the reported effect. In response to Reviewer #1, we have clarified how Exp. 2c and Exp. 3c address the potential alternative explanation related to "attention to dimensions." Further, we present a supplementary analysis to account for differences in asymptotic learning, as noted by Reviewer #2. We have also clarified how our control experiments address effects associated with general cognitive engagement in the task. Lastly, we have further clarified the conceptual foundation of our paper, addressing concerns raised by Reviewers #2 and #3.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports a series of experiments examining category learning and subsequent generalization of stimulus representations across spatial and nonspatial domains. In Experiment 1, participants were first trained to make category judgments about sequences of stimuli presented either in nonspatial auditory or visual modalities (with feature values drawn from a two-dimensional feature manifold, e.g., pitch vs timbre), or in a spatial modality (with feature values defined by positions in physical space, e.g., Cartesian x and y coordinates). A subsequent test phase assessed category judgments for 'rotated' exemplars of these stimuli: i.e., versions in which the transition vectors are rotated in the same feature space used during training (near transfer) or in a different feature space belonging to the same domain (far transfer). Findings demonstrate clearly that representations developed for the spatial domain allow for representational generalization, whereas this pattern is not observed for the nonspatial domains that are tested. Subsequent experiments demonstrate that if participants are first pre-trained to map nonspatial auditory/visual features to spatial locations, then rotational generalization is facilitated even for these nonspatial domains. It is argued that these findings are consistent with the idea that spatial representations form a generalized substrate for cognition: that space can act as a scaffold for learning abstract nonspatial concepts.

      Strengths:

      I enjoyed reading this manuscript, which is extremely well-written and well-presented. The writing is clear and concise throughout, and the figures do a great job of highlighting the key concepts. The issue of generalization is a core topic in neuroscience and psychology, relevant across a wide range of areas, and the findings will be of interest to researchers across areas in perception and cognitive science. It's also excellent to see that the hypotheses, methods, and analyses were pre-registered.

      The experiments that have been run are ingenious and thoughtful; I particularly liked the use of stimulus structures that allow for disentangling of one-dimensional and two-dimensional response patterns. The studies are also well-powered for detecting the effects of interest. The model-based statistical analyses are thorough and appropriate throughout (and it's good to see model recovery analysis too). The findings themselves are clear-cut: I have little doubt about the robustness and replicability of these data.

      Weaknesses:

      I have only one significant concern regarding this manuscript, which relates to the interpretation of the findings. The findings are taken to suggest that "space may serve as a 'scaffold', allowing people to visualize and manipulate nonspatial concepts" (p13). However, I think the data may be amenable to an alternative possibility. I wonder if it's possible that, for the visual and auditory stimuli, participants naturally tended to attend to one feature dimension and ignore the other - i.e., there may have been a (potentially idiosyncratic) difference in salience between the feature dimensions that led to participants learning the feature sequence in a one-dimensional way (akin to the 'overshadowing' effect in associative learning: e.g., see Mackintosh, 1976, "Overshadowing and stimulus intensity", Animal Learning and Behaviour). By contrast, we are very used to thinking about space as a multidimensional domain, in particular with regard to two-dimensional vertical and horizontal displacements. As a result, one would naturally expect to see more evidence of two-dimensional representation (allowing for rotational generalization) for spatial than nonspatial domains.

      In this view, the impact of spatial pre-training and (particularly) mapping is simply to highlight to participants that the auditory/visual stimuli comprise two separable (and independent) dimensions. Once they understand this, during subsequent training, they can learn about sequences on both dimensions, which will allow for a 2D representation and hence rotational generalization - as observed in Experiments 2 and 3. This account also anticipates that mapping alone (as in Experiment 4) could be sufficient to promote a 2D strategy for auditory and visual domains.

      This "attention to dimensions" account has some similarities to the "spatial scaffolding" idea put forward in the article, in arguing that experience of how auditory/visual feature manifolds can be translated into a spatial representation helps people to see those domains in a way that allows for rotational generalization. Where it differs is that it does not propose that space provides a scaffold for the development of the nonspatial representations, i.e., that people represent/learn the nonspatial information in a spatial format, and this is what allows them to manipulate nonspatial concepts. Instead, the "attention to dimensions" account anticipates that ANY manipulation that highlights to participants the separable-dimension nature of auditory/visual stimuli could facilitate 2D representation and hence rotational generalization. For example, explicit instruction on how the stimuli are constructed may be sufficient, or pre-training of some form with each dimension separately, before they are combined to form the 2D stimuli.

      I'd be interested to hear the authors' thoughts on this account - whether they see it as an alternative to their own interpretation, and whether it can be ruled out on the basis of their existing data.

      We thank the Reviewer for their comments. We agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are incompatible with this alternative explanation.

      In Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli with the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is thus necessary to attend to both auditory dimensions and both visual dimensions to perform the task. For example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants attend to only one dimension, their accuracy would be at most 25% on average (because they would be at chance on the other dimension, which has four possible values). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that at least 60% of the participants must have attended to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants who received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality, compared to 29/50 when space was the pre-training modality. Thus, the benefit of spatial pre-training cannot be due solely to a shift of attention toward both dimensions.

      This effect was replicated in Exp. 3c. Similarly, 33/48 participants reached an accuracy > 50% in the multimodal association task in Exp. 3c, meaning that at least 68% of the participants must have attended to both dimensions of the stimuli. Again, there was a clear difference between participants who received a visual pre-training (Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer).

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli, whereas pre-training in the visual or auditory domains (where multiple stimulus dimensions are less often relevant in natural settings) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c are incompatible with this explanation. Around 65% of the participants show a level of performance in the multimodal association task (>50%) that could only be achieved if they were attending to both dimensions (attending to a single dimension would yield 25%, and chance performance is 6.25%). This suggests that participants attend to both dimensions even in the visual and auditory mapping conditions.”
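      The accuracy bounds quoted above follow from simple arithmetic, sketched below under the assumption stated in the response that each of the two dimensions has four possible values (16 response options) and that a response is scored correct only if both dimensions match. Attended dimensions are idealized as answered perfectly and unattended ones as guessed uniformly; the function name is hypothetical.

```python
from fractions import Fraction


def expected_accuracy(options_per_dim: int, dims_attended: int, n_dims: int = 2) -> Fraction:
    """Expected accuracy when a response is correct only if every dimension matches.

    Attended dimensions are assumed to be answered perfectly; each unattended
    dimension is guessed uniformly among `options_per_dim` values.
    """
    if not 0 <= dims_attended <= n_dims:
        raise ValueError("dims_attended must be between 0 and n_dims")
    guessed = n_dims - dims_attended
    return Fraction(1, options_per_dim) ** guessed


chance = expected_accuracy(4, 0)      # 1/16 = 6.25%
one_dim = expected_accuracy(4, 1)     # 1/4 = 25%, the ceiling for a 1D strategy
both_dims = expected_accuracy(4, 2)   # 1

# Share of participants above 50% accuracy (counts taken from the response).
exp2c = Fraction(30, 50)              # 60%
exp3c = Fraction(33, 48)              # 68.75%
pooled = Fraction(30 + 33, 50 + 48)   # ~64%, i.e. "around 65%"

for label, value in [("chance", chance), ("1D ceiling", one_dim),
                     ("Exp. 2c", exp2c), ("Exp. 3c", exp3c), ("pooled", pooled)]:
    print(f"{label}: {float(value):.2%}")
```

      Because a single-dimension strategy tops out at 25% in expectation under these assumptions, the >50% criterion used in the response sits well above that ceiling.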

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, L&S investigates the important general question of how humans achieve invariant behavior over stimuli belonging to one category given the widely varying input representation of those stimuli and more specifically, how they do that in arbitrary abstract domains. The authors start with the hypothesis that this is achieved by invariance transformations that observers use for interpreting different entries and furthermore, that these transformations in an arbitrary domain emerge with the help of the transformations (e.g. translation, rotation) within the spatial domain by using those as "scaffolding" during transformation learning. To provide the missing evidence for this hypothesis, L&S used behavioral category learning studies within and across the spatial, auditory, and visual domains, where rotated and translated 4-element token sequences had to be learned to categorize and then the learned transformation had to be applied in new feature dimensions within the given domain. Through single- and multiple-day supervised training and unsupervised tests, L&S demonstrated by standard computational analyses that in such setups, space and spatial transformations can, indeed, help with developing and using appropriate rotational mapping whereas the visual domain cannot fulfill such a scaffolding role.

      Strengths:

      The overall problem definition and the context of spatial mapping-driven solution to the problem is timely. The general design of testing the scaffolding effect across different domains is more advanced than any previous attempts clarifying the relevance of spatial coding to any other type of representational codes. Once the formulation of the general problem in a specific scientific framework is done, the following steps are clearly and logically defined and executed. The obtained results are well interpretable, and they could serve as a good stepping stone for deeper investigations. The analytical tools used for the interpretations are adequate. The paper is relatively clearly written.

      Weaknesses:

      Some additional effort to clarify the exact contribution of the paper, the link between analyses and the claims of the paper, and its link to previous proposals would be necessary to better assess the significance of the results and the true nature of the proposed mechanism of abstract generalization.

      (1) Insufficient conceptual setup: The original theoretical proposal (the Tolman-Eichenbaum-Machine, Whittington et al., Cell 2020) that L&S relate their work to proposes that just as in the case of memory for spatial navigation, humans and animals create their flexible relational memory system of any abstract representation by a conjunction code that combines on the one hand, sensory representation and on the other hand, a general structural representation or relational transformation. The TEM also suggests that the structural representation could contain any graph-interpretable spatial relations, albeit in their demonstration 2D neighbor relations were used. The goal of L&S's paper is to provide behavioral evidence for this suggestion by showing that humans use representational codes that are invariant to relational transformations of non-spatial abstract stimuli and moreover, that humans obtain these invariances by developing invariance transformers with the help of available spatial transformers. To obtain such evidence, L&S use the rotational transformation. However, the actual procedure they use actually solved an alternative task: instead of interrogating how humans develop generalizations in abstract spaces, they demonstrated that if one defines rotation in an abstract feature space embedded in a visual or auditory modality that is similar to the 2D space (i.e. has two independent dimensions that are clearly segregable and continuous), humans cannot learn to apply rotation of 4-piece temporal sequences in those spaces while they can do it in 2D space, and with co-associating a one-to-one mapping between locations in those feature spaces with locations in the 2D space an appropriate shaping mapping training will lead to the successful application of rotation in the given task (and in some other feature spaces in the given domain). 
While this is an interesting and challenging demonstration, it does not shed light on how humans learn and generalize, only that humans CAN do learning and generalization in this, highly constrained scenario. This result is a demonstration of how a stepwise learning regimen can make use of one structure for mapping a complex input into a desired output. The results neither clarify how generalizations would develop in abstract spaces nor the question of whether this generalization uses transformations developed in the abstract space. The specific training procedure ensures success in the presented experiments but the availability and feasibility of an equivalent procedure in a natural setting is a crucial part of validating the original claim and that has not been done in the paper.

      We thank the Reviewer for their detailed comments on our manuscript. We reply to the three main points in turn.

      First, concerning the conceptual grounding of our work, we would point out that the TEM model (Whittington et al., 2020), however interesting, is not our theoretical starting point. Rather, as we hope the text and references make clear, we ground our work in theoretical work from the 1990s and 2000s proposing that space acts as a scaffold for navigating abstract spaces (such as Gärdenfors, 2000). We acknowledge that the TEM model and other experimental work on the involvement of the hippocampus, the entorhinal cortex and the parietal cortex in relational transformations of nonspatial stimuli provide evidence for this general theory. However, our work is designed to test a more basic question: whether there is behavioural evidence that space scaffolds learning in the first place. Behavioural experiments with a causal manipulation (spatial pre-training vs no spatial pre-training), such as ours, have the potential to provide such direct evidence. This is why we claim that:

      “This theory is backed up by proof-of-concept computational simulations [13], and by findings that brain regions thought to be critical for spatial cognition in mammals (such as the hippocampal-entorhinal complex and parietal cortex) exhibit neural codes that are invariant to relational transformations of nonspatial stimuli. However, whilst promising, this theory lacks direct empirical evidence. Here, we set out to provide a strong test of the idea that learning about physical space scaffolds conceptual generalisation.“

      Second, we agree with the Reviewer that we do not provide an explicit model of how generalisation occurs, or of how precisely space acts as a scaffold for building representations and/or applying the relevant transformations to nonspatial stimuli to solve our task. Rather, in Exp. 2-4 we investigate which aspects of the training are necessary for rotational generalisation to happen (and conclude that simple training with the multimodal association task is sufficient for ~20% of participants). We now acknowledge in the discussion that we do not provide an explicit model, and we leave this for future work:

      “We acknowledge that our study does not provide a mechanistic model of spatial scaffolding but rather delineates which aspects of the training are necessary for generalisation to happen.”

      Finally, we also agree with the Reviewer that our task is non-naturalistic. As is common in experimental research, one must sacrifice naturalistic elements of the task in exchange for experimental control and the absence of participants' prior knowledge. We chose to minimise participants' prior knowledge as far as possible, to ensure that our task involved learning a completely new task and that the pre-training genuinely caused the improved learning and generalisation. The effects we report are consistent across experiments, so we are confident in them, but we agree with the Reviewer that external validation with more naturalistic stimuli or tasks would be a valuable addition to this work. We have included a sentence in the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      (2) Missing controls: The asymptotic performance in experiment 1 after training in the three tasks was quite different in the three tasks (intercepts 2.9, 1.9, 1.6 for spatial, visual, and auditory, respectively; p. 5. para. 1, Fig 2BFJ). It seems that the statement "However, our main question was how participants would generalise learning to novel, rotated exemplars of the same concept." assumes that learning and generalization are independent. Wouldn't it be possible, though, that the level of generalization depends on the level of acquiring a good representation of the "concept" and after obtaining an adequate level of this knowledge, generalization would kick in without scaffolding? If so, a missing control is to equate the levels of asymptotic learning and see whether there is a significant difference in generalization. A related issue is that we have no information on what kind of learning in the three different domains was performed, albeit we probably suspect that in space the 2D representation was dominant while in the auditory and visual domains not so much. Thus, a second missing piece of evidence is the model-fitting results of the ⦰ condition that would show which way the original sequences were encoded (similar to Fig 2 CGK and DHL). If the reason for lower performance is not individual stimulus difficulty but the natural tendency to encode the given stimulus type by a combo of random + 1D strategy that would clarify that the result of the cross-training is, indeed, transferring the 2D-mapping strategy.

      We agree with the Reviewer that equating performance during training is a useful further control. We have therefore run a complementary analysis in which we select only the participants who reach > 90% accuracy in the last block of training, in order to equate asymptotic performance after training in Exp. 1. The results (see Author response image 1) replicate those reported in the main text: there is a large difference between groups (relative likelihood of 1D vs. 2D models, all BF > 100 in favour of a difference between the auditory and the spatial modalities and between the visual and the spatial modalities, in both near and far transfer; “decisive” evidence). We prefer not to include this figure in the paper for clarity, and because this result is expected given that 0/50 participants in the auditory condition and 0/50 in the visual condition used a 2D strategy; selecting subgroups of these participants therefore cannot change our conclusions.

      Author response image 1.

      Results of Exp. 1 when selecting participants that reached > 90% accuracy in the last block of training. Captions are the same as Figure 2 of the main text.

      Second, the Reviewer suggested that we run the model-fitting analysis only on the ⦰ condition (training) in Exp. 1 to reveal whether participants already use a 1D or a 2D strategy during training. Unfortunately, we cannot provide model fits for the ⦰ condition alone in Exp. 1, because all models make the same predictions for this condition (see Fig S4). Note, however, that this is by design: participants were free to apply whatever strategy they wanted during training, and we then used the generalisation phase with the rotated stimuli precisely to reveal this strategy. Further, we believe that the strategy used by participants during training and during transfer is the same, partly because, from block #4 onward, participants cannot tell whether the current trial is a training trial or a transfer trial, as both trial types are randomly interleaved with no cue signalling the trial type. We have made this clear in the methods:

      “They subsequently performed 105 trials (with trialwise feedback) and 105 transfer trials including rotated and far transfer quadruplets (without trialwise feedback) which were presented in mixed blocks of 30 trials. Training and transfer trials were randomly interleaved, and no clue indicated whether participants were currently on a training trial or a transfer trial before feedback (or absence of feedback in case of a transfer trial).”

      Reviewer #3 (Public Review):

      Summary:

      Pesnot Lerousseau and Summerfield aimed to explore how humans generalize abstract patterns of sensory data (concepts), focusing on whether and how spatial representations may facilitate the generalization of abstract concepts (rotational invariance). Specifically, the authors investigated whether people can recognize rotated sequences of stimuli in both spatial and nonspatial domains and whether spatial pre-training and multi-modal mapping aid in this process.

      Strengths:

      The study innovatively examines a relatively underexplored but interesting area of cognitive science, the potential role of spatial scaffolding in generalizing sequences. The experimental design is clever and covers different modalities (auditory, visual, spatial), utilizing a two-dimensional feature manifold. The findings are backed by strong empirical data, good data analysis, and excellent transparency (including preregistration) adding weight to the proposition that spatial cognition can aid abstract concept generalization.

      Weaknesses:

      The examples used to motivate the study (such as "tree" = oak tree, family tree, taxonomic tree) may not effectively represent the phenomena being studied, possibly confusing linguistic labels with abstract concepts. This potential confusion may also extend to doubts about the real-life applicability of the generalizations observed in the study and raises questions about the nature of the underlying mechanism being proposed.

      We thank the Reviewer for their comments. We agree that we could have explained more clearly how these examples motivate our study. The similarity between “oak tree” and “family tree” is not just the verbal label. Rather, it is the arrangement of the parts (nodes and branches) in a nested hierarchy: oak trees and family trees share the same relational structure. Invariance is relevant here because this similarity in relational structure is retained under rigid-body transformations such as rotation or translation. For example, an upside-down tree can still be recognised as a tree, just as a family tree can be plotted with the oldest ancestors at either the top or the bottom. Similarly, in our study, the quadruplets are defined by the relations between stimuli: all quadruplets use the same basic stimuli, but the categories are defined by the relations between successive stimuli. In our task, generalising means recognising that the relations between stimuli are the same despite changes in their surface properties (for example, in far transfer). We have clarified this in the introduction:

      “For example, the concept of a “tree” implies an entity whose structure is defined by a nested hierarchy, whether this is a physical object whose parts are arranged in space (such as an oak tree in a forest) or a more abstract data structure (such as a family tree or taxonomic tree). [...] Despite great changes in the surface properties of oak trees, family trees and taxonomic trees, humans perceive them as different instances of a more abstract concept defined by the same relational structure.”

      Next, the study does not explore whether scaffolding effects could be observed with other well-learned domains, leaving open the question of whether spatial representations are uniquely effective or simply one instance of a familiar 2D space, again questioning the underlying mechanism.

      We note that Reviewer #2 made a similar comment. We agree with both Reviewers that our task is non-naturalistic. As is common in experimental research, naturalistic elements of the task must be sacrificed in exchange for experimental control and for the absence of prior knowledge in participants. We deliberately minimised participants' prior knowledge as far as possible, to ensure that the task required learning something entirely new and that the pre-training genuinely caused the improved learning and generalisation. The effects we report are consistent across experiments, so we are confident in them, but we agree with the Reviewer that external validation with more naturalistic stimuli or tasks would be a valuable addition to this work. We have added a sentence to the discussion:

      “All the effects observed in our experiments were consistent across near transfer conditions (rotation of patterns within the same feature space), and far transfer conditions (rotation of patterns within a different feature space, where features are drawn from the same modality). This shows the generality of spatial training for conceptual generalisation. We did not test transfer across modalities nor transfer in a more natural setting; we leave this for future studies.”

      Further doubt on the underlying mechanism is cast by the possibility that the observed correlation between mapping task performance and the adoption of a 2D strategy may reflect general cognitive engagement rather than the spatial nature of the task. Similarly, the surprising finding that a significant number of participants benefited from spatial scaffolding without seeing spatial modalities may further raise questions about the interpretation of the scaffolding effect, pointing towards potential alternative interpretations, such as shifts in attention during learning induced by pre-training without changing underlying abstract conceptual representations.

      The Reviewer is concerned that the spatial pre-training could benefit participants by increasing their overall cognitive engagement rather than by providing a scaffold for learning invariances. It is correct that participants in the control group in Exp. 2c performed worse on average than participants who benefited from the spatial pre-training in Exp. 2a and 2b. The better performance of participants in Exp. 2a and 2b could therefore be due either to the spatial nature of the pre-training (as we claim) or to a difference in general cognitive engagement.

      However, a closer look at the results of Exp. 3 shows that the general cognitive engagement hypothesis is not well supported by the data. Participants in the control condition (Exp. 3c) perform relatively similarly to the other groups during training. Rather, they differ in the strategy they use, as revealed by the transfer condition: most of them use a 1D strategy, unlike the participants who received spatial pre-training (Exp. 3a and 3b). We have added a sentence to the results:

      “Further, the results show that participants who did not experience spatial pre-training were still engaged in the task, but were not using the same strategy as the participants who experienced spatial pre-training (1D rather than 2D). Thus, the benefit of the spatial pre-training is not simply to increase the cognitive engagement of the participants. Rather, spatial pre-training provides a scaffold to learn rotation-invariant representation of auditory and visual concepts even when rotation is never explicitly shown during pre-training.”

      Finally, Reviewer #1 had a related concern about a potential alternative explanation that involved a shift in attention. We reproduce our response here: we agree with the Reviewer that the “attention to dimensions” hypothesis is an interesting (and potentially concerning) alternative explanation. However, we believe that the results of our control experiments Exp. 2c and Exp. 3c are not compatible with this alternative explanation.

      Indeed, in Exp. 2c, participants are pre-trained in the visual modality and then tested in the auditory modality. In the multimodal association task, participants have to associate the auditory stimuli and the visual stimuli: on each trial, they hear a sound and then have to click on the corresponding visual stimulus. It is necessary to pay attention to both auditory dimensions and both visual dimensions to perform well in the task. To give an example, the task might involve mapping the fundamental frequency and the amplitude modulation of the auditory stimulus to the colour and the shape of the visual stimulus, respectively. If participants pay attention to only one dimension, this would lead to a maximum of 25% accuracy on average (because they would be at chance on the other dimension, with four possible options). We observed that 30/50 participants reached an accuracy > 50% in the multimodal association task in Exp. 2c. This means that we know for sure that at least 60% of the participants actually paid attention to both dimensions of the stimuli. Nevertheless, there was a clear difference between participants that received a visual pre-training (Exp. 2c) and those who received a spatial pre-training (Exp. 2a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer). In fact, only 3/50 participants were best fit by a 2D model when vision was the pre-training modality compared to 29/50 when space was the pre-training modality. Thus, the benefit of the spatial pre-training cannot be due solely to a shift in attention toward both dimensions.
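      The accuracy bounds in this argument follow from simple arithmetic, sketched below in Python. The sketch assumes, as stated above, two stimulus dimensions with four possible values each (a 4 × 4 set of response options), that an attended dimension is always reported correctly, and that an unattended dimension is guessed uniformly at random; the helper name `expected_accuracy` is hypothetical and this is not the authors' analysis code.

```python
from fractions import Fraction

# Assumption taken from the text: two stimulus dimensions, four possible
# values per dimension, i.e. 4 * 4 = 16 response options in total.
N_LEVELS = 4
N_DIMS = 2


def expected_accuracy(dims_attended: int,
                      n_levels: int = N_LEVELS,
                      n_dims: int = N_DIMS) -> Fraction:
    """Expected accuracy if attended dimensions are always correct and each
    unattended dimension is guessed uniformly (hypothetical helper)."""
    if not 0 <= dims_attended <= n_dims:
        raise ValueError("dims_attended must be between 0 and n_dims")
    unattended = n_dims - dims_attended
    # Each unattended dimension is correct with probability 1 / n_levels.
    return Fraction(1, n_levels ** unattended)


chance = expected_accuracy(0)   # guessing on both dimensions
one_dim = expected_accuracy(1)  # attending to a single dimension only

print(f"chance: {float(chance):.2%}")          # chance: 6.25%
print(f"one dimension: {float(one_dim):.2%}")  # one dimension: 25.00%

# Share of participants above the 50% criterion reported in the text.
for label, above, total in [("Exp. 2c", 30, 50), ("Exp. 3c", 33, 48)]:
    print(f"{label}: {above}/{total} = {above / total:.1%}")
```

      Under these assumptions, no participant attending to a single dimension should exceed 25% on average, so an accuracy above 50% can only be reached by attending to both dimensions.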

      This effect was replicated in Exp. 3c. Similarly, 33/48 participants reached an accuracy > 50% in the multimodal association task in Exp. 3c, meaning that we know for sure that at least 68% of the participants actually paid attention to both dimensions of the stimuli. Again, there was a clear difference between participants who received a visual pre-training (Exp. 3c) and those who received a spatial pre-training (Exp. 3a) (frequency of 1D vs 2D models between conditions, BF > 100 in near transfer and far transfer).

      Thus, we believe that the alternative explanation raised by the Reviewer is not supported by our data. We have added a paragraph in the discussion:

      “One alternative explanation of this effect could be that the spatial pre-training encourages participants to attend to both dimensions of the non-spatial stimuli. By contrast, pretraining in the visual or auditory domains (where multiple dimensions of a stimulus may be relevant less often naturally) encourages them to attend to a single dimension. However, data from our control experiments Exp. 2c and Exp. 3c, are incompatible with this explanation. Around ~65% of the participants show a level of performance in the multimodal association task (>50%) which could only be achieved if they were attending to both dimensions (performance attending to a single dimension would yield 25% and chance performance is at 6.25%). This suggests that participants are attending to both dimensions even in the visual and auditory mapping case.”

      Conclusions:

      The authors successfully demonstrate that spatial training can enhance the ability to generalize in nonspatial domains, particularly in recognizing rotated sequences. The results for the most part support their conclusions, showing that spatial representations can act as a scaffold for learning more abstract conceptual invariances. However, the study leaves room for further investigation into whether the observed effects are unique to spatial cognition or could be replicated with other forms of well-established knowledge, as well as further clarifications of the underlying mechanisms.

      Impact:

      The study's findings are likely to have a valuable impact on cognitive science, particularly in understanding how abstract concepts are learned and generalized. The methods and data can be useful for further research, especially in exploring the relationship between spatial cognition and abstract conceptualization. The insights could also be valuable for AI research, particularly in improving models that involve abstract pattern recognition and conceptual generalization.

      In summary, the paper contributes valuable insights into the role of spatial cognition in learning abstract concepts, though it invites further research to explore the boundaries and specifics of this scaffolding effect.

      Reviewer #1 (Recommendations For The Authors):

      Minor issues / typos:

      P6: I think the example of the "signed" mapping here should be "e.g., ABAB maps to one category and BABA maps to another", rather than "ABBA maps to another" (since ABBA would always map to another category, whether the mapping is signed or unsigned).

      Done.

      P11: "Next, we asked whether pre-training and mapping were systematically associated with 2Dness...". I'd recommend changing to: "Next, we asked whether accuracy during pre-training and mapping were systematically associated with 2Dness...", just to clarify what the analyzed variables are.

      Done.

      P13, paragraph 1: "only if the features were themselves are physical spatial locations" either "were" or "are" should be removed.

      Done.

      P13, paragraph 1: should be "neural representations of space form a critical substrate" (not "for").

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors use in multiple places in the manuscript the phrases "learn invariances" (Abstract), "formation of invariances" (p. 2, para. 1), etc. It might be just me, but this feels a bit like 'sloppy' wording: we do not learn or form invariances, rather we learn or form representations or transformations by which we can perform tasks that require invariance over particular features or transformation of the input such as the case of object recognition and size- translation- or lighting-invariance. We do not form size invariance, we have representations of objects and/or size transformations allowing the recognition of objects of different sizes. The authors might change this way of referring to the phenomenon.

      We respectfully disagree with this comment. An invariance occurs when neurons make the same response under different stimulation patterns. The objects or features to which a neuron responds is shaped by its inputs. Those inputs are in turn determined by experience-dependent plasticity. This process is often called “representation learning”. We think that our language here is consistent with this status quo view in the field.

      Reviewer #3 (Recommendations For The Authors):

      • I understand that the objective of the present experiment is to study our ability to generalize abstract patterns of sensory data (concepts). In the introduction, the authors present examples like the concept of a "tree" (encompassing a family tree, an oak tree, and a taxonomic tree) and "ring" to illustrate the idea. However, I am sceptical as to whether these examples effectively represent the phenomena being studied. From my perspective, these different instances of "tree" do not seem to relate to the same abstract concept that is translated or rotated but rather appear to share only a linguistic label. For instance, the conceptual substance of a family tree is markedly different from that of an oak tree, lacking significant overlap in meaning or structure. Thus, to me, these examples do not demonstrate invariance to transformations such as rotations.

      To elaborate further, typically, generalization involves recognizing the same object or concept through transformations. In the case of abstract concepts, this would imply a shared abstract representation rather than a mere linguistic category. While I understand the objective of the experiments and acknowledge their potential significance, I find myself wondering about the real-world applicability and relevance of such generalizations in everyday cognitive functioning. This, in turn, casts some doubt on the broader relevance of the study's results. A more fitting example, or an explanation that addresses my concerns about the suitability of the current examples, would be beneficial to further clarify the study's intent and scope.

      Response in the public review.

      • Relatedly, the manuscript could benefit from greater clarity in defining key concepts and elucidating the proposed mechanism behind the observed effects. Is it plausible that the changes observed are primarily due to shifts in attention induced by the spatial pre-training, rather than a change in the process of learning abstract conceptual invariances (i.e., modifications to the abstract representations themselves)? While the authors conclude that spatial pre-training acts as a scaffold for enhancing the learning of conceptual invariances, it raises the question: does this imply participants simply became more focused on spatial relationships during learning, or might this shift in attention represent a distinct strategy, and an alternative explanation? A more precise definition of these concepts and a clearer explanation of the authors' perspective on the mechanism underlying these effects would reduce any ambiguity in this regard.

      Response in the public review.

      • I am wondering whether the effectiveness of spatial representations in generalizing abstract concepts stems from their special nature or simply because they are a familiar 2D space for participants. It is well-established that memory benefits from linking items to familiar locations, a technique used in memory training (method of loci). This raises the question: Are we observing a similar effect here, where spatial dimensions are the only tested familiar 2D spaces, while the other 2 spaces are simply unfamiliar, as also suggested by the lower performance during training (Fig.2)? Would the results be replicable with another well-learned, robustly encoded domain, such as auditory dimensions for professional musicians, or is there something inherently unique about spatial representations that aids in bootstrapping abstract representations?

      On the other side of the same coin, are spatial representations qualitatively different, or simply more efficient because they are learned more quickly and readily? This leads to the consideration that if visual pre-training and visual-to-auditory mapping were continued until a similar proficiency level as in spatial training is achieved, we might observe comparable performance in aiding generalization. Thus, the conclusion that spatial representations are a special scaffold for abstract concepts may not be exclusively due to their inherent spatial nature, but rather to the general characteristic of well-established representations. This hypothesis could be further explored by either identifying alternative 2D representations that are equally well-learned or by extending training in visual or auditory representations before proceeding with the mapping task. At the very least I believe this potential explanation should be explored in the discussion section.

      Response in the public review.

      I had some difficulty in following an important section of the introduction: "... whether participants can learn rotationally invariant concepts in nonspatial domains, i.e., those that are defined by sequences of visual and auditory features (rather than by locations in physical space, defined in Cartesian or polar coordinates) is not known." This was initially puzzling to me as the paragraph preceding it mentions: "There is already good evidence that nonspatial concepts are represented in a translation invariant format." While I now understand that the essential distinction here is between translation and rotation, this was not immediately apparent upon first reading. This crucial distinction, especially in the context of conceptual spaces, was not clearly established before this point in the manuscript. For better clarity, it would be beneficial to explicitly contrast and define translation versus rotation in this particular section and stress that the present study concerns rotations in abstract spaces.

      Done.

      • The multi-modal association is crucial for the study, however to my knowledge, it is not depicted or well explained in the main text or figures (Results section). In my opinion, the details of this task should be explained and illustrated before the details of the associated results are discussed.

      We have included an illustration of a multimodal association trial in Fig. S3B.

      Author response image 2.

      • The observed correlation between the mapping task performance and the adoption of a 2D strategy is logical. However, this correlation might not exclusively indicate the proposed underlying mechanism of spatial scaffolding. Could it also be reflective of more general factors like overall performance, attention levels, or the effort exerted by participants? This alternative explanation suggests that the correlation might arise from broader cognitive engagement rather than specifically from the spatial nature of the task. Addressing this possibility could strengthen the argument for the unique role of spatial representations in learning abstract concepts, or at least this alternative interpretation should be mentioned.

      Response in the public review.

      • To me, the finding that ~30% of participants benefited from the spatial scaffolding effect for example in the auditory condition merely through exposure to the mapping (Fig 4D), without needing to see the quadruplets in the spatial modality, was somewhat surprising. This is particularly noteworthy considering that only ~60% of participants adopted the 2D strategy with exposure to rotated contingencies in Experiment 3 (Fig 3D). How do the authors interpret this outcome? It would be interesting to understand their perspective on why such a significant effect emerged from mere exposure to the mapping task.

      • I appreciate the clarity Fig.1 provides in explaining a challenging experimental setup. Is it possible to provide example trials, including an illustration that shows which rotations produce the trail and an intuitive explanation that response maps onto the 1D vs 2D strategies respectively, to aid the reader in better understanding this core manipulation?

      • I like that the authors provide transparency by depicting individual subject's data points in their results figures (e.g. Figs. 2 B, F, J). However, with an n=~50 per condition, it becomes difficult to intuit the distribution, especially for conditions with higher variance (e.g., Auditory). The figures might be more easily interpretable with alternative methods of displaying variances, such as violin plots per data point, conventional error shading using 95%CIs, etc.

      • Why are the authors not reporting exact BFs in the results sections at least for the most important contrasts?

      • While I understand why the authors report the frequencies for the best model fits, this may become difficult to interpret in some sections, given the large number of reported values. Alternatives or additional summary statistics supporting inference could be beneficial.

      As the Reviewer states, there are a large number of values that we could report in this study. We have chosen to keep this number to a minimum to be as clear as possible. Alongside the individual data points, we have opted to display only the group mean and standard error (the standard errors are included, but the large number of participants per condition yields precise estimates, so the error bars can be smaller than the mean marker). We made this choice because we were concerned that additional details would clutter the figures with unnecessary complexity. Finally, in the main text we report the BFs that we believe are critical for the reader's comprehension, and use a cutoff of 100 when BFs are high (corresponding to the label “decisive” evidence; some BFs are larger than 10¹²). All exact BFs are reported in the supplementary material for interested readers.
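      The reporting convention described above (exact BFs in the supplement, with values above 100 summarised as "BF > 100", i.e. "decisive" evidence) can be sketched as follows. The verbal thresholds (3, 10, 30, 100) are the commonly cited Jeffreys-style categories; the function names are hypothetical and this is a minimal illustration, not the authors' analysis code.

```python
# Commonly cited Jeffreys-style evidence categories for a Bayes factor,
# checked from strongest to weakest.
CATEGORIES = [(100, "decisive"), (30, "very strong"), (10, "strong"), (3, "substantial")]


def evidence_label(bf10: float) -> str:
    """Verbal label for a Bayes factor BF10 (evidence for H1 over H0)."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    # A BF below 1 favours H0; its reciprocal gives the strength of that evidence.
    strength = bf10 if bf10 >= 1 else 1 / bf10
    favoured = "H1" if bf10 >= 1 else "H0"
    for cutoff, label in CATEGORIES:
        if strength > cutoff:
            return f"{label} evidence for {favoured}"
    return f"anecdotal evidence for {favoured}"


def report_bf(bf10: float, cap: float = 100) -> str:
    """Main-text format: exact value up to the cap, 'BF > cap' above it."""
    if bf10 > cap:
        return f"BF > {cap:g}"
    return f"BF = {bf10:.2f}"


print(report_bf(1e12), "|", evidence_label(1e12))  # BF > 100 | decisive evidence for H1
print(report_bf(4.2), "|", evidence_label(4.2))    # BF = 4.20 | substantial evidence for H1
```

      Capping the main-text value loses no inferential information above the cap, because every BF beyond 100 falls in the same verbal category.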

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides a near-complete description of the mechanosensory bristles on the Drosophila melanogaster head and the anatomy and projection patterns of the bristle mechanosensory neurons that innervate them. The data presented are solid. The study has generated numerous invaluable resources for the community that will be of interest to neuroscientists in the field of circuits and behaviour, particularly those interested in mechanosensation and behavioural sequence generation.

      We express our gratitude to the Reviewers for their valuable suggestions, which significantly enhanced the manuscript. The revisions were undertaken, not with the expectation of acceptance, but rather driven by our sincere belief that these revisions would enhance the manuscript's impact for future readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Sensory neurons of the mechanosensory bristles on the head of the fly project to the subesophageal zone (SEZ). In this manuscript, the authors have built on a large body of previous work to comprehensively classify and quantify the head bristles. They broadly identify the nerves that various bristles use to project to the SEZ and describe their region-specific innervation in the SEZ. They use dye-fills, clonal labelling, and electron microscopic reconstructions to describe in detail the phenomenon of somatotopy (conserved peripheral representations within the central brain) within the innervation of these neurons. In the process they develop novel tools to access subsets of these neurons. They use these to demonstrate that groups of bristles in different parts of the head control different aspects of the grooming sequence.

      Reviewer #2 (Public Review):

      The authors combine genetic tools, dye fills and connectome analysis techniques to generate a "first-of-its-kind", near complete, synaptic resolution map of the head bristle neurons of Drosophila. While some of the BMN anatomy was already known based on previous work by the authors and other researchers, this is the first time a near complete map has been created for the head BMNs at electron microscopy resolution.

      Strengths:

      (1) The authors cleverly use techniques that allow moving back and forth between periphery (head bristle location) and brain, as well as moving between light microscopy and electron microscopy data. This allows them to first characterize the pathways taken by different head BMNs to project to the brain and also characterize anatomical differences among individual neurons at the level of morphology and connectivity.

      (2) The work is very comprehensive and results in a near complete map of all head BMNs.

      (3) Authors also complement this anatomical characterization with a first-level functional analysis using optogenetic activation of BMNs that results in expected directed grooming behavior.

      Weaknesses:

      (1) The clustering analysis is compelling but cluster numbers seem to be arbitrarily chosen instead of by using some informed metrics.

      We made revisions to the manuscript that address this concern. Please see our response to “recommendations for authors” for a description of these revisions.

      (2) It could help provide context if authors revealed some of the important downstream pathways that could explain optogenetics behavioral phenotypes and previously shown hierarchical organization of grooming sequences.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      (3) In contrast to the rigorous quantitative analysis of the anatomical data, the behavioral data is analyzed using much more subjective methods. While I do not think it is necessary to perform a rigorous analysis of behaviors in this anatomy focused manuscript, the conclusions based on behavioral analysis should be treated as speculative in the current form e.g. calling "nodding + backward walking" as an avoidance response is not justified as it currently stands. Strong optogenetic activation could lead to sudden postural changes that due to purely biomechanical constraints could lead to a couple of backward steps as seen in the example videos. Moreover since the quantification is manual, it is not clear what the analyst interprets as backward walking or nodding. Interpretation is also concerning because controls show backward walking (although in fewer instances based on subjective quantification).

      While unbiased machine vision-based methods would nicely complement the present work, this type of analysis cannot yet distinguish between different head grooming movements. Therefore, we are currently limited to manual annotation for our behavioral analysis. That said, we do not believe that our manual annotation is subjective. The grooming movements that we examine in this work are distinguishable from each other through frame-by-frame manual annotation of video at 30 fps. Our annotation of the grooming and backward motions performed by flies is based on previous publications that established a controlled vocabulary defining each movement (Hampel et al., 2020a, 2017, 2015; Seeds et al., 2014). In this work, we added head nodding to this controlled vocabulary, as described in the Materials and methods. We have added text to the third paragraph of the Materials and methods section entitled “Behavioral analysis procedures” that we hope better describes our behavioral analysis. This description now reads:

      Head nodding was annotated when the fly tilted its head downward by any amount until it returned its head back in its original position. This movement often occurred in repeated cycles. Therefore, the “start” was scored at the onset of the first forward movement and the “stop” when the head returned to its original position on the last nod.

      We do not draw any firm conclusions about the head movements (nodding) and backward motions. We use nodding as a descriptive term to help the reader understand what the behavior looks like. We make no firm conclusions about any functional role that either the nodding or the backward motions might have, with the exception of nodding in the context of grooming. We only suggest that the behaviors appear to be avoidance responses. Furthermore, we do not use the term backward walking; instead, we refer to backward motions. We are only reporting our annotations of movements that do occur and that differ significantly from controls. We speculate that these could be avoidance responses based on support from the literature. Future studies will be required to understand whether these movements serve real behavioral roles.

      Summary:

      The authors end up generating a near-complete map of head BMNs that will serve as a long-standing resource to the Drosophila research community. This will directly shape future experiments aimed at modeling or functionally analyzing the head grooming circuit to understand how somatotopy guides behaviors.

      Reviewer #3 (Public Review):

      Eichler et al. set out to map the locations of the mechanosensory bristles on the fly head, examine the axonal morphology of the bristle mechanosensory neurons (BMNs) that innervate them, and match these to electron microscopy reconstructions of the same BMNs in a previously published EM volume of the female adult fly brain. They used BMN synaptic connectivity information to create clusters of BMNs that they show occupy different regions of the subesophageal zone brain region and use optogenetic activation of subsets of BMNs to support the claim that the morphological projections and connectivity of defined groups of BMNs are consistent with the parallel model for behavioral sequence generation.

      The authors have beautifully cataloged the mechanosensory bristles and the projection paths and patterns of the corresponding BMN axons in the brain using detailed and painstaking methods. The result is an important neuroanatomical resource for the community. To match BMNs reconstructed in an electron microscopy volume of the adult fly brain, the authors matched clustered reconstructed BMNs with light-level BMN classes using a variety of methods, but evidence for matching is only summarized and not demonstrated in a way that allows the reader to evaluate the strength of the evidence. The authors then switch from morphology-based categorization to non-BMN connectivity as a clustering method, which they claim demonstrates that BMNs form a somatotopic map in the brain. This map is not easily appreciated, and although contralateral projections in some populations are clear, the distinct projection zones that are mentioned by the authors are not readily apparent. Because of the extensive morphological overlap between connectivity-based clusters, it is not clear that small differences at the projection level are what determine the post-synaptic connectivity of a given BMN cluster or their functional role during behavior. The claim that the somatotopic organization of BMN projections is preserved among their postsynaptic partners to form parallel sensory pathways is not supported by the result that different connectivity clusters still have high cosine similarity in a number of cases (e.g. Clusters 1 and 3, or Clusters 1 and 2). Finally, the authors use tools that were generated during the light-level characterization of BMN projections to show that specifically activating BMNs that innervate different areas of the head triggers different grooming behaviors. In one case, activation of a single population of sensory bristles (InOm) triggers two different behaviors, both eye and dorsal head grooming. This result does not seem consistent with the parallel model, which suggests that these behaviors should be mutually exclusive and rely on parallel downstream circuitry.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      This work will have a positive impact on the field by contributing a complete accounting of the mechanosensory bristles of the fruit fly head, describing the brain projection patterns of the BMNs that innervate them, and linking them to BMN sensory projections in an electron microscopy volume of the adult fly brain. It will also have a positive impact on the field by providing genetic tools to help functionally subdivide the contributions of different BMN populations to circuit computations and behavior. This contribution will pave the way for further mechanistic study of the central circuits that subserve grooming.

      Recommendations for the authors:

      All three reviewers appreciated the work presented in this manuscript. There were also a few overlapping concerns that were raised that are summarised below, should the authors wish to address them:

      Somatotopy: We recommend that the authors describe the extent of prior knowledge in more detail to highlight their contribution better.

      We made revisions that better highlight the extent of prior knowledge about somatotopy. We describe how previous studies showed that bristle mechanosensory neurons in insects are somatotopically organized, but these studies did not comprehensively describe complete somatotopic maps for the head or body. To our knowledge, our study provides the first comprehensive, synaptic resolution somatotopic map of a head for any animal. This sets the stage for completely defining the interface between somatotopically-organized mechanosensory neurons and postsynaptic circuits, which has broad implications for future studies of aimed grooming and of mechanosensation in general. Below we itemize revisions to the Introduction, Discussion, and Figures to provide a clearer statement of the significance of our study as it relates to somatotopy.

      (1) Newly added Figure 1 – figure supplement 1 more explicitly grounds the study in somatotopy, providing a working model of the organization of the circuit pathways that produce the grooming sequence. This model features somatotopy as shown in Figure 1 – figure supplement 1C.

      (2) Figure 1 – figure supplement 1 is incorporated into the Introduction in the second, third, and fourth paragraphs, the first paragraph of the Results section titled “Somatotopically-organized parallel BMN pathways”, and the second and third paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      (3) We added text to the end of the fourth paragraph of the Introduction that now reads: “In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.”

      (4) There is a Discussion section that further explains the extent of prior knowledge and our contributions on somatotopy that is titled “A synaptic resolution somatotopic map of the head BMNs”. Additionally, the previous version of this section had a paragraph on the broader implications of our work as it relates to somatotopy across species. In light of the reviewer comments, we decided to make this paragraph into its own Discussion section to better highlight the broader significance of our work. This section is titled “First synaptic resolution somatotopic map of the head”.

      The somatotopy isn't overtly obvious - perhaps they could try mapping presynaptic sites and provide landmarks to improve visualisation.

      We made the following revisions to better highlight the head BMN somatotopy. One point of confusion in the previous manuscript version stemmed from our not explicitly defining the somatotopic organization that we observed. Readers appeared to think that we were defining the head somatotopy based only on the small projection differences among BMNs from neighboring head locations. While we believe that these small differences indeed correspond to somatotopy, we failed to highlight that there are overt differences in the brain projections of BMNs from distant locations on the head. For example, Figure 5B (right panel) shows the distinct projections between the LabNv (brown) and AntNv (blue) BMNs that innervate bristles on the ventral and dorsal head, respectively. Thus, BMN types innervating neighboring bristles show overlapping projections with small projection differences, whereas those innervating distant bristles show non-overlapping projections into distinct zones.

      Our analysis of postsynaptic connectivity similarity also shows somatotopic organization among the BMN postsynaptic partners, as BMN types innervating the same or neighboring bristle populations show high connectivity similarity (Figure 8, old Figure 7). Below we highlight major revisions to the text and Figures that hopefully better reveal the head somatotopy.

      (1) In the last paragraph of the Introduction we added text that explicitly frames the experiments in terms of somatotopic organization: “This reveals somatotopic organization, where BMNs innervating neighboring bristles project to the same zones in the CNS while those innervating distant bristles project to distinct zones. Analysis of the BMN postsynaptic connectome reveals that neighboring BMNs show higher connectivity similarity than distant BMNs, providing evidence of somatotopically organized postsynaptic circuit pathways.”

      (2) We mention an example of overt somatotopy from Figure 5 in the Results section titled “EM-based reconstruction of the head BMN projections in a full adult brain”. The text reads “For example, BMNs from the Eye- and LabNv have distinct ventral and anterior projections, respectively. This shows how the BMNs are somatotopically organized, as their distinct projections correspond to different bristle locations on the head (Figure 5B,C).”

      (3) In new Figure 8 (part of old Figure 7), we modified panels that correspond to the cosine similarity analysis of postsynaptic connectivity. The major revision was to plot the cosine similarity clusters onto the head bristles so that the bristles are now colored based on their clusters (C). This shows how neighboring BMNs cluster together, and therefore show similar postsynaptic connectivity. We believe that this provides a nice visualization of somatotopic organization in BMN postsynaptic connectivity. We also added the clustering dendrogram as recommended by Reviewer #2 (Figure 8A).

      (4) In new Figure 8, we added new panels (D-F) that summarize our anatomical and connectomic analysis showing different somatotopic features of the head BMNs. Different BMN types innervate bristles at neighboring and distant locations (D). BMNs that innervate neighboring bristles project into overlapping zones (E, example of reconstructed BM-Fr and -Ant neurons with non-overlapping BM-MaPa neurons) and show postsynaptic connectivity similarity (F, example connectivity map of three BMN types based on cosine similarity data).

      (5) To accompany the new Figure 8D-F panels, we added a paragraph to summarize the different somatotopic features of the head BMNs that were identified based on our anatomical and connectomic analysis. This is the last paragraph in the Results section titled “Somatotopically-organized parallel BMN pathways”:

      Our results reveal head bristle proximity-based organization among the BMN projections and their postsynaptic partners to form parallel mechanosensory pathways. BMNs innervating neighboring bristles project into overlapping zones in the SEZ, whereas those innervating distant bristles project to distinct zones (example of BM-Fr, -Ant, and -MaPa neurons shown in Figure 8D,E). Cosine similarity analysis of BMN postsynaptic connectivity revealed that BMNs innervating the same bristle populations (same types) have the highest connectivity similarity. Figure 8F shows example parallel connections for BM-Fr, -Ant, and -MaPa neurons (vertical arrows), where the edge width indicates the number of synapses from each BMN type to their major postsynaptic partners. Additionally, BMNs innervating neighboring bristle populations showed postsynaptic connectivity similarity, while BMNs innervating distant bristles show little or none. For example, BM-Fr and -Ant neurons have connections to common postsynaptic partners, whereas BM-MaPa neurons show only weak connections with the main postsynaptic partners of BM-Fr or -Ant neurons (Figure 8F, connections under 5% of total BMN output omitted). These results suggest that BMN somatotopy could have different possible levels of head spatial resolution, from specific bristle populations (e.g. Ant bristles), to general head areas (e.g. dorsal head bristles).
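      As an illustration of the thresholding described in this paragraph, the following minimal Python sketch keeps only the postsynaptic partners that receive at least 5% of a BMN type's total output synapses and reports each retained connection's fraction, which could set edge width in a diagram like Figure 8F. It assumes the threshold is applied separately for each BMN type; the function name, partner names, and synapse counts are hypothetical and are not taken from the FlyWire data.

```python
def major_partners(synapse_counts, threshold=0.05):
    """Return {partner: fraction of total output} for partners at or above threshold.

    synapse_counts maps a (hypothetical) postsynaptic partner name to the
    number of synapses it receives from one BMN type.
    """
    total = sum(synapse_counts.values())
    if total == 0:
        return {}
    return {
        partner: count / total
        for partner, count in synapse_counts.items()
        if count / total >= threshold  # connections under 5% of output are omitted
    }


# Hypothetical output of one BMN type onto four postsynaptic partners.
bm_fr_output = {"partner_A": 120, "partner_B": 60, "partner_C": 10, "partner_D": 5}
print(sorted(major_partners(bm_fr_output)))  # ['partner_A', 'partner_B', 'partner_C']
```

      Here partner_D receives 5 of 195 synapses (about 2.6%) and is dropped, while partner_C (about 5.1%) is kept.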

      We also refer to Figure 8D-F to illustrate the different somatotopic features in the Discussion. These references can be found in the Discussion sections titled “A synaptic resolution somatotopic map of the head BMNs” (fourth paragraph) and “Parallel circuit architecture underlying the grooming sequence” (second paragraph).

      (6) In addition to improving the Figures, we provide additional tools that enable readers to explore the BMN somatotopy in a more interactive way. That is, we provide 5 different FlyWire.ai links in the manuscript Results section that enable 3D visualization of the different reconstructed BMNs (e.g. FlyWire.ai link 1).

      Note: In working on old Figure 7 to address this Reviewer suggestion, we also reordered panels A-E. We believe that this was a more logical ordering than in the previous draft. These panels are now the only data shown in Figure 7, as the cosine similarity analysis is now in Figure 8. We hope that splitting these panels into two Figures will improve manuscript readability.

      Light EM Mapping: A better description of the methods by which this mapping was done would be helpful. Providing a few example parallel representations of the EM and light images in the main figure might help the reader better appreciate the strength of their approach.

      We have done as the Reviewers suggested and added panels to Figure 6 that show examples of the LM and EM image matching (Figure 6A,B). We added two examples that used different methods for labeling the LM imaged BMNs, including MCFO labeling of an individual BM-InOc neuron and driver line labeling of a major portion of BM-InOm neurons using InOmBMN-LexA. These panels are referred to in the first paragraph of the Results section titled “Matching the reconstructed head BMNs with their bristles”. Note that examples for all LM/EM matched BMN types are shown in Figure 6 – figure supplement 2.

      We had provided Figure 6 – figure supplement 2 in the reviewed manuscript that shows all the above requested “parallel representations of the EM and light images”. However, the Reviewer critiques made us realize that the purpose of this figure supplement was not clearly indicated. Therefore, we have revised Figure 6 – figure supplement 2 and its legend to make its purpose clearer. First, we changed the legend title to better highlight its purpose. The legend is now titled: “Matching EM reconstructed BMN projections with light microscopy (LM) imaged BMNs that innervate specific bristles”. Second, we added label designations to the figure panel rows that highlight the LM and EM comparisons. That is, the rows for light microscopy images of BMNs are indicated with LM and the rows for EM reconstructed BMN images are labeled with EM. Reviewer #3 had indicated that it was not clear what labeling methods were used to visualize the LM imaged BM-InOm neurons in Figure 6 – figure supplement 2N. Therefore, we added text to the figure and the legend to better highlight the different methods used. Panels A and B were also cropped to accommodate the above mentioned revisions.

      The manuscript also provides an extensive Materials and methods section that describes the different lines of evidence that were used to assign the reconstructed BMNs as specific types. We changed the title to better highlight the purpose of this methods section to “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles”. The evidence used to support the assignment of the different BMN types is also summarized in Figure 6 – figure supplement 3.

      Parallel circuit model: The authors motivate their study with this. We're recommending that they define expectations of such circuitry, its alternatives (including implications for downstream pathways), and behavior before they present their results. We're also recommending that they interpret their behavioural results in the context of these circuits.

      Our primary motivation for doing the experiments described in this manuscript was to help define the neural circuit architecture underlying the parallel model that drives the Drosophila grooming sequence. This manuscript provides a comprehensive assessment of the first layer of this circuit architecture. A byproduct of this work is a contribution that offers immediate utility and significance to the Drosophila connectomics community: namely, the description of the majority of mechanosensory neurons on the head, with their annotation in the recently released whole brain connectome dataset (FlyWire.ai). In writing this manuscript, we tried to balance both of these aims, which proved difficult. We very much appreciate the Reviewers' comments that have highlighted points of confusion in our original draft. We hope that the revised draft is now clearer and more logically presented. We have made revisions to the text and provided a new figure supplement (Figure 1 - figure supplement 1) and new panels in Figure 8. Below we highlight the major revisions.

      (1) The Introduction was revised to more explicitly ground the study in the parallel model, while also removing details that were not pertinent to the experiments presented in the manuscript.

      The first paragraph introduces different features of the parallel model. To better focus the reader on the parts of the model that were being assessed in the manuscript, we removed the following sentences: “Performance order is established by an activity gradient among parallel circuits where earlier actions have the highest activity and later actions have the lowest. A winner-take-all network selects the action with the highest activity and suppresses the others. The selected action is performed and then terminated to allow a new round of competition and selection of the next action.” Note that these sentences are included in the third and fourth paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      The first paragraph of the Introduction now introduces a bigger picture view of the model that emphasizes the two main features: 1) a parallel circuit architecture that ensures that all mutually exclusive actions to be performed in sequence are simultaneously readied and competing for output, and 2) hierarchical suppression among the parallel circuits, where earlier actions suppress later actions.

      (2) Newly added Figure 1 – figure supplement 1 provides a working model of grooming (Reviewer #1 suggestion). We now more strongly emphasize that the study aimed to define the parallel neural circuit architecture underlying the grooming sequence, focusing on the mechanosensory layer of this architecture. In particular, we refer to the new Figure 1 – figure supplement 1 that has been added to better convey the hypothesized grooming neural circuit architecture. Figure 1 – figure supplement 1 is incorporated into the Introduction (paragraphs two, three, and four), the Results section titled “Somatotopically-organized parallel BMN pathways” (first paragraph), and the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” (second and third paragraphs).

      (3) New panels in Figure 8 update the model of parallel circuit organization as it relates to somatotopy (D-F). These panels show the parallel circuits hypothesized by the model, but also indicate convergence, with different possible levels of head resolution for these circuits. We describe above where these panels are referenced in the text.

      (4) We added a new paragraph in the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” that better incorporates the results from this manuscript into the working model of grooming. This paragraph is shown below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      Aside from this summary of major concerns, the detailed recommendations are attached below.

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the quality and exhaustive body of work presented in this manuscript. I have a few comments that the authors may want to consider:

      (1) The authors motivate this study by posing that it would allow them to uncover whether the complex grooming behaviour of flies followed a parallel model of circuit function. It would have been nice to have been introduced to what the alternative model might be and what each would mean for organisation of the circuit architecture. Some guiding schematics would go a long way in illustrating this point. Modifying the discussion along these lines would also be helpful.

      We made several revisions to the manuscript that address this recommendation. Among these revisions, we added Figure 1 – figure supplement 1 that includes a working model for grooming. Please see above for a description of these revisions.

      (2) The authors mention the body of work that has mapped head bristles and described somatotopy. It would be useful to discuss in more detail what these studies have shown and highlight where the gaps are that their study fills.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (3) The dye-fills and reconstructions that are single colour could use a boundary to demarcate the SEZ. This would help in orienting the reader.

      We agree with Reviewer #1 that Figure 4 and its supplements could use some indicator that would orient the reader with respect to the dye filled or stochastically labeled neurons. The images show the entire SEZ in the ventral brain, and in some panels, the background staining enables visualization of the brain (e.g. Figure 4H,M,N). To help orient the reader in this region, we added a dotted line to indicate the approximate SEZ midline. This also enables the reader to more clearly see which of the BMN types cross the midline.

      Midline visual guides were added for Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      (4) The comparison between the EM and the fills/clones is not obvious. Particularly because they are not directly determined, it would be nice to have the EM reconstruction alongside the dye-fills. This would work very nicely in the supplementary figure with the multiple fills of the same bristles. I think this would really drive home the point.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (5) Are there unnoticed black error-bars floating around in many of the gray-scale images?

      The black bars were masking white scale bars in the images. We have removed the black bars and remade the images without scale bars. This was done for the following Figures: Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      (1) The only point in the paper where I found myself going back and forth between the methods/supplement and the text was when the authors discuss the clustering. I think it would help the reader if a few sentences about the cosine clustering used for connectivity-based clustering were included in the main text. Also, for NBLAST hierarchical clustering, it would help if an informed metric could be used to define the number of clusters (e.g. Braun et al, 2010 PLOS ONE shows how Ward linkage cost could be used for hierarchical clustering).

      Depending on where the cut height is placed on the dendrogram for cosine similarity of BMNs, different features of the BMN type postsynaptic connectivity are captured. As the number of clusters is increased (lower cut height), clustering is mainly among BMNs of the same type, showing that these BMNs have the highest connectivity similarity. As the number of clusters is reduced (higher cut height), BMNs innervating neighboring bristles on the head are clustered, revealing three general clusters corresponding to the dorsal, ventral, and posterior head. This reveals somatotopy-based clustering among same and neighboring BMN types. The cut height shown in Figure 8 and Figure 8 – figure supplement 2 was chosen because it highlighted both of these features.
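      To make the effect of the cut height concrete, here is a minimal Python sketch of connectivity-based clustering. It assumes each BMN is represented by a vector of synapse counts onto shared postsynaptic partners, uses cosine distance (1 minus cosine similarity) with average linkage, and keeps merging clusters until the closest pair is farther apart than the cut. This is an illustrative stand-in: the linkage method and distance scale used for Figure 8 are not stated here (the cut height of 4.5 is on a different scale), and the function names, BMN names, and counts are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two synapse-count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_at_cut(connectivity, cut):
    """Average-linkage agglomerative clustering on cosine distance.

    Merging stops once the closest pair of clusters is farther apart than
    `cut`, which is equivalent to cutting the dendrogram at height `cut`.
    """
    names = list(connectivity)
    dist = {
        (a, b): 1.0 - cosine_similarity(connectivity[a], connectivity[b])
        for a in names
        for b in names
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical synapse counts from six BMNs onto four shared postsynaptic partners.
connectivity = {
    "Fr_1": [10, 5, 0, 0],
    "Fr_2": [9, 6, 1, 0],
    "Ant_1": [4, 10, 0, 0],
    "Ant_2": [3, 9, 1, 0],
    "MaPa_1": [0, 0, 2, 10],
    "MaPa_2": [0, 1, 1, 9],
}
print(cluster_at_cut(connectivity, 0.05))
# [['Ant_1', 'Ant_2'], ['Fr_1', 'Fr_2'], ['MaPa_1', 'MaPa_2']]
print(cluster_at_cut(connectivity, 0.5))
# [['Ant_1', 'Ant_2', 'Fr_1', 'Fr_2'], ['MaPa_1', 'MaPa_2']]
```

      With these toy data, a low cut groups BMNs of the same type, while a higher cut merges the neighboring Fr and Ant types and leaves the distant MaPa type separate. Swapping in a different monotone linkage (for example, Ward linkage) changes how inter-cluster distance is computed, but not how the cut height sets the number of clusters.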

      The NBLAST clustering shows similar results to the connectivity-based clustering with respect to neighboring and distant BMN types. As the number of clusters increases, BMNs of the same type are clustered, and these types can be further subdivided into morphologically distinct subtypes. As the number of clusters is reduced, the clustering captures neighboring BMNs. Thus, neighboring BMN types showed high morphological similarity (and proximity) to each other, and low similarity to distant BMN types.

      Please see our responses to a Reviewer #3 critique below for further description of the clustering results.

      On the same lines it would help if the clustering dendrograms were included in the main figure.

      We thank Reviewer #2 for this comment. We have added the dendrogram to Figure 8A, a change that we feel makes this Figure much easier to understand.

      (2) It could help provide intuition if the authors revealed some of the downstream targets and their implication in explaining the behavioral phenotypes.

      While this will be the subject of at least two forthcoming manuscripts, we have added text to the present manuscript that provides insight into BMN postsynaptic targets. Our previous work (Hampel et al. 2015) described a mechanosensory connected neural circuit that elicits grooming of the antennae. While this previous study demonstrated that the Johnston’s organ mechanosensory neurons are synaptically and functionally connected with this circuit, our preliminary analysis indicates that it is also connected with BM-Ant neurons. We hypothesize that there are additional such circuits that are responsible for eliciting grooming of other head locations.

      To better highlight potential downstream targets in the manuscript, we now mention the antennal circuit in the Introduction. This text reads: In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.

      There is also text in the Discussion that addresses this Reviewer comment. It describes the antennal circuit and mentions the possibility that other similar circuits may exist. This can be found in the third paragraph of the section titled “Circuits that elicit aimed grooming of specific head locations”.

      (3) Authors find that opto activation of BMNs leads to grooming of targeted as well as neighboring areas. Is there any sequence observed here? i.e. first clean targeted area and then clean neighboring area? I wonder if the answer to this is something as simple as common post-synaptic targets which is essentially reducing the resolution of the BMN sensory map. Some more speculation on this interesting result could be helpful.

      We appreciate and agree with this point from Reviewer #2, and have tried to better emphasize the possible implications for grooming that the overlapping projections and connectivity among BMNs innervating neighboring bristles may have. This is now better addressed in the Results and Discussion sections. Below we highlight where this is addressed:

      (1) In the second paragraph of the Results section titled “Activation of subsets of head BMNs elicits aimed grooming of specific locations” we added text that suggests the possibility that grooming of the stimulated and neighboring locations could be due to the overlapping projections and connectivity. This text reads: This suggested that head BMNs elicit aimed grooming of their corresponding bristle locations, but also neighboring locations. This result is consistent with our anatomical and connectomic data indicating that BMNs innervating neighboring bristles show overlapping projections and postsynaptic connectivity similarity (see Discussion).

      (2) In the Discussion section titled “A synaptic resolution somatotopic map of the head BMNs”, we added a sentence to the end of the fourth paragraph that alludes to further discussion of this topic. This sentence reads: This overlap may have implications for aimed grooming behavior. For example, neighboring BMNs could connect with common neural circuits to elicit grooming of overlapping locations (discussed more below).

      (3) The fourth paragraph of the Discussion section titled “Circuits that elicit aimed grooming of specific head locations” mentions the possibility of mechanosensory convergence onto common postsynaptic circuits to promote grooming of the stimulated area, along with neighboring areas. This paragraph is below.

      We find that activation of specific BMN types elicits both aimed grooming of their corresponding bristle locations and neighboring locations. This suggests overlap in the locations that are groomed with the activation of different BMN types. Such overlap provides a means of cleaning the area surrounding the stimulus location. Interestingly, our NBLAST and cosine similarity analysis indicates that neighboring BMNs project into overlapping zones in the SEZ and show common postsynaptic connectivity. Thus, we hypothesize that neighboring BMNs connect with common neural circuits (e.g. antennal grooming circuit) to elicit overlapping aimed grooming of common head locations.

      (4) In the new second paragraph of the Discussion section titled “Parallel circuit architecture underlying the grooming sequence”, we further discuss the issue of the BMN “sensory map”. This paragraph is below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      (4) If the authors were to include a summary table that shows all known attributes of each BMN type as columns, that could be very useful as a resource to the community. Table columns could include attributes like "bristle name", "nerve tract", "FlyWire IDs of all segments corresponding to the bristle class", "split-Gal4 line or known enhancer", etc.

      We provided a table that includes much of this information after the manuscript had already gone out for review. We regret that this was not available. This is now provided as Supplementary file 3. This table provides the following information for each reconstructed BMN: BMN name, bristle type, nerve, flywire ID, flywire coordinates, NBLAST cluster (cut height 1), NBLAST cluster (cut height 5), and cosine cluster (cut height 4.5). Note that the driver line enhancers for targeting specific BMN types are shown in Figure 3I.

      Specific Points:

      Figure 4C-V:

      • I find it a bit difficult to distinguish ipsi- from contra-lateral projections. Maybe indicate the midline as a thin, stippled line?

      We thank Reviewer #2 for this suggestion. We have now added lines in the panels in Figure 4C-V to indicate the approximate location of the midline. We also added lines to the Figure 4 – figure supplements as described above.

      I think this figure reference is wrong: "the red-light stimulus also elicited backward motions with control flies (Figure 6B,C, control, black trace, Video 5)." It should be Fig 8B,C.

      We have fixed this error.

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      Motivating this study in terms of understanding the neural mechanisms that execute the parallel model seems to overstate what you will achieve with the current study. If you want to motivate it this way, I suggest focusing on the grooming sequence of the head alone (eyes, antennae, proboscis).

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that many of the revisions focus on the head grooming sequence. We also made minor revisions to the Introduction that further emphasize the focus on head grooming.

      Results:

      Figure 1. Please indicate that this is a male fly in either the figure title or in the figure itself.

      We added a male symbol to Figure 1A.

      Figure 3. Panel J is referenced in the main body text and in the figure caption, but there is no Fig 3J.

      Panel J is shown in the upper right corner of Figure 3. We realize that the placement of this panel is not ideal, but this was the only place that we could fit it. Additionally, the panel works nicely at that location to better enable comparison with panel C. We have revised the text in the Figure 3 legend to better highlight the location of this Figure panel: “Shown in the upper right corner of the figure are the aligned expression patterns of InOmBMN-LexA (red), dBMN-spGAL4 (green), and TasteBMN-spGAL4 (brown).”

      We also added text to a sentence in the Results section entitled “Head BMNs project into discrete zones in the ventral brain” that indicates the panel location. This text reads: “To further visualize the spatial relationships between these projections, we computationally aligned the expression patterns of the different driver lines into the same brain space (Figure 3J, upper right corner).”

      Matching the BMNs to EM reconstructions: why cut the dendrogram at H=5? It would be better to determine cluster number using an unbiased method.

      To match the morphologically distinct EM reconstructed BMNs to their specific bristles, we relied on different lines of evidence, including NBLAST results (discussed more below), dye fill/stochastic labeling/driver line labeling matches, published morphology, nerve projection, bristle number, proximity to other BMNs, and postsynaptic connectivity (summarized in Figure 6 – figure supplement 3). The Materials and methods section “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles” provides a detailed description of the evidence used to assign each BMN type. In many cases, BMN type could be assigned with confidence solely based on morphological comparisons with our light level data (e.g. dye fills), in conjunction with bristle counts to indicate an expected number of BMNs showing similar morphology. Thus, the LM/EM matches and NBLAST clustering were largely complementary.

      The EM reconstructed BMNs were matched as particular BMN types, in part based on examination of the NBLAST data at different cut heights. NBLAST clustering of the BMNs revealed general trends at higher and lower cut heights (Figure 6 – figure supplement 1A, Supplementary file 3). The lowest cut heights produced clusters containing mostly BMNs of the same type innervating the same bristle populations, as well as smaller clusters that subdivided into morphologically distinct subtypes (see Supplementary file 3 for clusters produced at cut height 1). This revealed that BMNs of the same type tended to show the highest morphological similarity with each other, but they also showed intratype morphological diversity. Higher cut heights produced clusters of BMNs innervating neighboring bristle populations (e.g. ventral head BMNs), showing high morphological similarity among neighboring BMN types.

      We selected the cut height 5 shown in Figure 6 – figure supplement 1A,B because it captures examples of both same and neighboring type clustering. For example, it captures a cluster of mostly BM-Taste neurons (Cluster 16), and neighboring BMN types, including those from the dorsal head (Cluster 14) or ventral head (Cluster 15).

      Based on reviewer comments, we realized that the way we wrote the BMN matching section in the Results suggested more reliance on the NBLAST clustering than was actually necessary, misrepresenting how we actually matched the BMNs. Therefore, we softened the first couple of sentences to place less emphasis on the importance of the NBLAST. We also indicated that readers can find the resulting clusters at different cut heights in Figure 6 – figure supplement 1A and Supplementary file 3. The first two sentences of the first paragraph in the Results section titled “Matching the reconstructed head BMNs with their bristles” now read:

      The reconstructed BMN projections were next matched with their specific bristle populations. The projections were clustered based on morphological similarity using the NBLAST algorithm (example clustering at cut height 5 shown in Figure 6 – figure supplement 1A,B, Supplementary file 3, FlyWire.ai link 2) (Costa et al., 2016). Clusters could be assigned as BMN types based on their similarity to light microscopy images of BMNs known to innervate specific bristles.

      The number of reconstructed BMNs is remarkably similar to what is expected based on bristle counts for each group except for InOm. Why do you think there is such a large discrepancy there?

      We believe that there is a discrepancy between the number of reconstructed BM-InOm neurons and the number expected based on InOm bristle counts because these bristle counts were based on a few flies and these numbers appear to be variable. We did not further investigate the numbers of InOm bristles in this manuscript because we only needed an estimate of their numbers, given that there is over an order of magnitude difference in the eye bristles versus any other head bristle population. Therefore, we could relatively easily conclude that the head BMNs were related to the InOm bristles, based on their sheer numbers and their morphology.

      Figure 6 - figure supplement 2N: please describe these panels better. The main text says the upper image is from InOmBMN-LexA, but the figure legend doesn't agree.

      We have added text to the figure legend that now makes the contents of panel 2N clear to the reader. Further, we now indicate in the figure legend for each panel, the method used to obtain the labeled neurons (i.e. fill, MCFO, driver), to avoid similar confusion for the other panels.

      Figure 6 - figure supplement 4D. How frequently is there a mismatch between the number of BMNs for a given type across hemispheres?

      Although the full reconstruction of the BMNs on both sides of the brain was beyond the scope of this work, the BMNs on both sides have since been reconstructed and annotated (Schlegel et al. 2023). We plan to provide more analysis of BMNs on both sides of the brain in a forthcoming manuscript. Nevertheless, the BMN numbers tend to agree between the two sides of the brain. The table below shows a comparison between the two sides:

      Author response table 1.

      Figures 6 and 7. It would be helpful to include a reference brain in all panels that show cluster morphology. Without landmarks there is nothing to anchor the eye to allow the reader to see the described differences in BMN projection zones and patterns.

      We apologize for not making this specific change; however, we have made revisions to other parts of the manuscript to better highlight the somatotopic organization among the BMNs (revisions described above). Please note that we now provide publicly available FlyWire.ai links that enable readers to view the BMN projections in 3D. They can also toggle a brain mesh on and off to provide spatial reference.

      "BMN somatotopic map": It would be helpful to show or describe in more detail what the unique branch morphology for each zone is. It is quite difficult to appreciate, as the groups also have a lot of overlap. Would the unique regions that the BMN groups innervate be easier to see if you plotted presynaptic sites by group? I am left unsure about whether there is a somatotopic map here.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that we did not examine the fine branch morphological differences between BMN types having overlapping projections. Showing these differences would require more extensive anatomical analysis that is beyond the scope of this work. For showing definitive somatotopy, we focused on the overt differences between BMNs innervating bristles at distant locations on the head.

      Overall the strict adherence to the parallel model impacts the interpretation of the data. It would be helpful for the authors to discuss which aspects of the current study are consistent with the parallel model and which results are not consistent.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      Discussion:

      "Circuits that elicit aimed grooming of specific head locations": In the previous paragraph you mention "BMN types innervating neighboring bristle populations have overlapping projections into zones that correspond roughly to the dorsal, ventral, and posterior head. The overlap is likely functionally significant, as cosine similarity analysis revealed that neighboring head BMN types have common postsynaptic partners. However, overlap between neighboring BMN types is only partial, as they show differing projections and postsynaptic connectivity." Then in this paragraph, you say, "How do the parallel-projecting head BMNs interface with postsynaptic neural circuits to elicit aimed grooming of specific head locations? Different evidence supports the hypothesis that the BMNs connect with parallel circuits that each elicit a different aimed grooming movement (Seeds et al., 2014)." The overlapping postsynaptic BMN connectivity seems in conflict with the claim that the circuits are parallel.

      We apologize for this confusion. We now better describe this apparent discrepancy between our results and the parallel model of grooming behavior. We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      We have made additional changes to the manuscript:

      (1) We added Supplementary file 2 that includes links for downloading the image stacks used to generate panels in Figure 1, Figure 2, Figure 3, Figure 4, and figure supplements for these figures. These image stacks are stored in the Brain Image Library (BIL). Rows in the spreadsheet correspond to each image stack. Columns provide information about each stack including: figure panels that each image stack contributed to, image stack title, DOI for each stack (link provides metadata for each stack and file download link), image stack file name, genotype of imaged fly, and information about image stack. References to this file have been made at different locations throughout the text and Figure legends. We also added a section on the BIL data in the Materials and methods entitled “Light microscopy image stack storage and availability”. Old Supplementary file 2 has been renamed Supplementary file 3.

      (2) We added a new reference for FlyWire.ai (Dorkenwald et al. 2023) that was posted as a preprint during the revision of this manuscript.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      This is a strong manuscript about the existence of proteins coded by intracellular parasites (here Coxiella) that have evolved to parasitise the lipid transport machinery of their hosts. This is a first in that the parasite protein acts at a distance from the parasite itself, manipulating two of the host organelles - and not acting at their site of contact with PVs. There is considerable research into one protein and its effect when expressed by itself.

      Despite all the advances, there are a couple of areas where the manuscript can be improved, and a few extra, fairly straightforward experiments on the amphipathic helix could be added. Even though these are unlikely to change the overall message, they would make the story more complete.

      Major points

      More details are required about the amphipathic helix. Check that the AH does target LDs by expressing the AH alone in a GFP chimera +/- oleate, followed by mutagenesis. Also show the AH in a helical wheel projection (e.g. by Heliquest) and say if it aligns with similar AHs in homologs (see my point below).

      Fig 1B: In infected cells, do the affected LDs tend to be close to the PVs?

      Also in Fig 1B: highlight the small KDEL+ve ER rings around LDs here. Study whether LDs have these in infected cells without the confounder (?artefact) of EPF1 over-expression.

      Fig 2A the ER looks quite different here from Fig 1B, even at t0. Grossly the strands are spaced wider apart. In detail there are no rings around LDs. Can the authors explain this? Which morphology is common, especially in cells early in infection without co-expressed protein?

      Fig 6 & Line 237: "As the N-terminal region of CbEPF1 is undefined": I suggest that the authors could do more here. At minimum, change the model to highlight the strong probability that the N terminus is a globular domain that functions at the LD-ER interface. (What are the three other unidentified LD proteins? I suggest omitting them.)

      Although the Alphafold prediction for EPF1 is low confidence only, in a few minutes of BLAST searching I found the homolog A0A1J8NR10_9COXI (also FFAT+ve) which has a moderately confident structural prediction for its N terminus. This model has a quite large internal hydrophobic cavity, indicating lipid transfer capability and function similar to known LTPs. This means that action as a "tether" possibly results from experiments with viral promoters (see minor point on terminology).

      Minor:

      Fig 2B: add more arrowheads/arrows to fit legend (says they are both multiple)

      FFAT selectivity for MOSPD2: say if this fits the findings of Di Mattia et al. or (as appears likely) extends the known differences between VAPA/B and MOSPD2. Also say if VAPA is expected to behave as VAPB.

      Explain how "Mutations in the CbEPF1 FFAT motif(s) did not influence CbEPF1-GFP localization to either the host ER (Supplementary Fig. 1)". In F3mt this shows that EPF1 has a way to target ER other than FFAT/VAP. Discuss if that is via AH insertion in ER.

      Also, the (admittedly low) level of ER targeting is possibly slightly reduced by F3mt, as shown by greater GFP in the nucleus in the single cells shown. If this is a feature of the whole field of cells, it implies that the FFATs normally work with the AHs to target EPF1 to the ER.

      "clustering" LDs with F3mt: could this indicate dimer formation by CbEPF1? Note: to me it appears wrong to describe Fig 4A as showing ER exclusion. LD proximities to each other dominate. It's not 100% clear that LDs cluster, as their proximities are not universal: "LD-LD interactions" may be (very) weak.

      Fig 5: can levels of EPF1 here be compared to those in cells undergoing natural infection (an approximate comparison by qPCR is better than nothing if no antibodies are available)? Fig 5A: would it be possible to increase the number of cells counted, in an attempt to make the reduced number of LDs in F3mt significant?

      Minor

      Line 226: "no sequence homology": this misses the point; there is the common feature of an AH.

      Issue to be discussed, as probably too difficult to experiment on: when EPF1 is on the ER, does it engage VAP only weakly (implying a means to mask its motifs), since if the interaction is strong, VAP is then unable to bind other partners?

      Line 245: "MOSPD2, a sole VAP that is known to localize on LD surfaces" (worth citing Zouiouich again here). Do the cells/tissues infected by Coxiella express MOSPD2?

      Line 259/260: this "suggestion" about cholesterol should be toned down. It is a speculation that could be tested in future, but the data here do not suggest it.

      'Tether': this word implies more than just bridging; it also implies a role in the physical formation of the contact. Since EPF1 most likely has an LTP domain, it seems linguistically confusing to refer to it as a tether, especially since the experiments that physically alter LD-ER contact involve probable over-expression.

      Discuss whether it is 100% certain that EPF1 is in the host cytosol or whether some experiment(s) at a future date (proteomic/western blotting) will be needed to make that conclusion 100% secure.

      Referees cross-commenting

      COMMENT 1

      I realise that both reviewer 1 and reviewer 3 have considered this MS carefully, but I think that their reviews could be improved in some respects. I will add two comments, one for each of the other reviews.

      Reviewer 1. The review poses multiple questions to the authors suggesting that answering these questions experimentally would strengthen the paper. Some of the points seem to misunderstand what is the accepted standard for membrane cell biological research into membrane contact sites. While it might be that the authors can rebut these points, I think it is preferable to use Cross Commenting as an opportunity to address these issues beforehand.

      Major Comment 1: CbEPF1 and ER-LD contact

      Looking at endogenous proteins: I wondered about the same point, but I concluded that this is not likely to be possible in the scope of this submission. If it were possible then I guess the authors would have attempted it. Looking on Google Scholar I could find no example of an endogenous Coxiella protein being tagged in the bacterial genome. So the only way to find the protein is via an antibody. Assuming the authors do not have one, I do not think we should ask for one at this stage in the publication process.

      Electron microscopy: the reviewer is incorrect to say that this is necessary. It may be the gold standard, but it is a huge amount of extra work. Furthermore it is not at all necessary when the protein in question localises clearly to the interface between organelles identified by confocal microscopy.

      Can a specific CbEPF1 domain be identified? Here an amphipathic helix has been identified, but the authors' lack of dissection of that region explains why the reviewer raises this question, which is also shared by Reviewer 3. I agree with the implication that more should be done to dissect that.

      Major Comment 2: CbEPF1 FFAT motifs and VAP binding

      Are the two FFAT motifs redundant or synergistic? I would say that the authors have addressed that to a reasonable extent.

      CbEPF1 binding specificity towards a VAP/MOSPD2: Ditto.

      Major Comment 3: LD clustering

      Since this is an effect of mutated protein only, I think that the 3 questions posed at the end here need only be addressed in Discussion.

      Major Comment 4: CbEPF1-mediated increase in LD number and size

      less LD upon expression of F1mt or F2mt, compared to WT: this seems wrong. The numbers are the same. The comment about IF images is unjustified, as they have been quantified and do show a difference. I agree that the biological relevance is unclear, and that this might be addressed. That would require making a mutant Coxiella strain. While that would make a big difference to this work, my feeling is that this is well over a year's work. I would be guided by the authors on that and I would not suggest it as required for this MS.

      De novo LD production at the ER is unlikely: This statement is ill-considered as the FFAT motifs ARE required (Fig 5). Furthermore, in all systems ever reported de novo LD production takes place at the ER, so any alternative would be quite extraordinary.

      Altogether, strengthening this aspect of the study: In my view, this area does not need more work and it would not be constructive to ask for more.

      Major Comment 5: Functional relevance

      assessing the phenotype of a Coxiella CbEPF1 mutant: I agree that this would be good, but it might not be feasible within the confines of this one paper. In the various projects that have made transposon mutants of Coxiella, has a strain been made that affects EPF1? If not, then the authors should state this and discuss it as work for the future. The reviewers cannot expect any experiments!

      Is VAP required for Coxiella intracellular growth/vacuole maturation? On the surface this suggestion seems to offer an experimental route to understanding EPF1. However, VAP binds to >100 cellular proteins, many relating to lipid traffic and a considerable number of these already localised to lipid droplets (ORP2, MIGA2, VPS13A/C). It is therefore unlikely that such an experiment would be interpretable, and I recommend that this request be reconsidered.

      Are LD formation induced upon infection? Are ER-LD contact increased upon infection? These are very reasonable ideas and the results would be interesting additions to this paper.

      COMMENT 2

      I have given one set of comments already. Here are my comments for Reviewer 3.

      The review makes a few assumptions that I question. While it might be that the authors can rebut these assumptions, I think it is preferable to use Cross Commenting as an opportunity to address these issues beforehand.

      Major Point 1: What is surprising is that the BFP-KDEL signal is also localizing to the LD surface: "Surprising" is misguided, as it seems to deny the probability that there is a class of proteins that sit at organelle interfaces binding to both partners simultaneously. Maybe the reviewer means "significant" here, in which case I would agree.

      The authors must perform LD isolation: the reviewer is incorrect to say that this must be done. It is a huge amount of extra work. Furthermore, it is not at all necessary when the protein in question localises clearly to the structures, and it may not even work, as the protein may need a reasonably high general concentration to avoid gradual dissociation (without any re-association) during organelle purification.

      what features of the protein enable its LD binding? Here an amphipathic helix has been identified, but the authors' lack of dissection of that region explains why the reviewer raises this question, which is also shared by Reviewer 1. I agree with the implication that more should be done to dissect that.

      Major Point 2: Quantitative image analysis:

      Manders' colocalization analyses with Costes correction are required: No. The images in Figure 4 speak for themselves.

      Please show the LD phenotype of untransfected, and CbEPF1-GFP transfected cells also: This is a good idea.

      provide a means to quantify the clustering of LDs: Unnecessary. Not all findings need to be quantified.

      Major Point 3:

      Data depend largely on overexpression of the protein in uninfected cells: I agree.

      What is the localization of the protein in infected cells? I wondered about the same point, but I concluded that this is not likely to be possible in the scope of this submission. If it were possible then I guess the authors would have attempted it. Looking on Google Scholar I could find no example of an endogenous Coxiella protein being tagged in the bacterial genome. So the only way to find the protein is via an antibody. Assuming the authors do not have one, I do not think we should ask for one at this stage in the publication process.

      What happens to ER-LD contacts upon infection with C. burnetii? This is a very valid question, and answering it would not only strengthen the manuscript but should be achievable in 1-3 months.

      Significance

      This work takes a reasonably big step towards uncovering how parasites have mimicked the molecular machinery of contact sites, here in the context of ER-LD interactions and tantalizingly suggestive of lipid transfer at that contact site (although hard to get strong evidence for that at this stage). This provides yet more evidence for the conservation and overall importance to cells of lipid transfer at contact sites, as well as reminding us of the ability of parasites to attack every aspect of cell function.

    1. Reviewer #1 (Public Review):

      Summary:

      The motivating questions are an accurate reflection of the current state of knowledge surrounding striatal pathway function. The comparisons of pathway function across striatal subregion, activation & inhibition, and task context are laudable and extremely important for advancing the subfield. Had these manipulations, to the largest extent possible, been performed in single animals (e.g. activate dSPNs of DMS or DLS in the same mouse across the 3 tasks), this would have significantly strengthened the impact and the conclusions that could be drawn, by making this set of studies even more internally consistent and directly comparable. While this is no longer possible, a conceptually related and fantastic contribution to the subfield (and likely beyond in terms of Opto manipulations of brain areas) would be to directly demonstrate that within their studies their DMS pathway manipulations do not impact nearby DLS activity (and vice versa). This is a significant and non-essential request. More feasibly and reasonably, it would be fantastic and strengthen the conclusions here to more fully detail their opsin expression patterns in DMS vs DLS groups and perhaps attempt to relate individual opsin profiles and fiberoptic targeting with behavioral outcomes across tests.

      Strengths:

      A comprehensive and paired comparison of inhibition and activation of striatal pathways across subregions and tasks is a very important and meaningful step towards reconciling contradictory results on striatal pathway function that are observed across labs (who typically focus on one subregion, one task setting, and often do not directly report comparisons of activation and inhibition).

      Weaknesses:

      Figure 1A - the example DMS vs DLS opsin expression and fiber targeting are not terribly convincing that the manipulations will be specific to each subregion (the example in Figure 2A is a little better but I have a similar concern still). The specificity of these manipulations is key to interpretation and conclusions and I strongly feel they should be strengthened here. The best evidence would be direct neural recordings (light in DMS, no effect in DLS, and vice versa), but this is a tall ask and not expected. The next best option, which is readily feasible, is to show not only fiberoptic targeting summaries (as in Figure 1A, Figure 2A) but also a summary of opsin spread for all animals (especially given the two examples appear to have significant spread across DMS and DLS). It would be of great benefit to the field to have these in the Allen Common Coordinate Framework. It would also be fine and useful to utilize the authors' current classical histological atlas alignment methods (e.g. Paxinos pdf). These histological summary figures would also benefit from being larger and more visible (perhaps as separate supplemental figures associated with the main figures).

      Related to the above, it is a concern that whether the classic view is supported may depend on individual variations in virus/fiber targeting to striatal subregions, which likely have greater granularity than the traditional dorsal medial vs lateral division (e.g. Hunnicutt et al 2016, Foster et al 2021, Hintiryan et al 2016). Although there may not be enough animals or variation in targeting in the present study to find meaningful relationships, it would strengthen the paper and be a great benefit to the field to know whether, for key findings, the strength of behavioral effects correlated with anterior/posterior, medial/lateral, or dorsal/ventral fiberoptic coordinates (or with the volume of opsin expression).

      Conceptually, no clear new idea or integrative interpretation of prior work (nor even of the large body of results within this work) comes to the fore, save for the already appreciated fact that the classic view of opposing pathways is sometimes supported and sometimes not. Two tangible suggestions that I believe would increase the influence of this study: (1) can the authors more thoughtfully bridge the logical steps in their results sections and the prior context around them (some topic sentences jump right into results, e.g. line 195: "The inhibition experiment showed"), and (2) in the Discussion, rather than emphasizing when/where the classic view is supported and not, more content on precisely why would be helpful. Some questions more specifically: if DMS/DLS pathway activation/inhibition is *mostly* oppositely appetitive/aversive, what does that mean in the context of spontaneous or reward-guided locomotion? Self-initiated pathway activation/inhibition is in part learned (with very intriguing differences across pathways in the expression across learning); how should we think about striatal pathway function with regard to learning, spontaneous/innate behaviors, vs over-trained behaviors? When the classic view fails in the dorsal striatum, why? And is a complementary "model" an actual alternative concept, a distinct mode of circuit function, or just a negative result on the classic view?

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for their valuable comments, which definitely make our story stronger.

      2. Description of the planned revisions

      Reviewer 1

      Comments:

      No data are shown from the genome-wide screening approach, including the common regulators of KRAS and HRAS. Information about how imaging data were processed and analysed is missing. A final table of 8 selected factors with phosphatase activity is presented without providing further insight about the selection criteria and other factors.

      This information will be included in the revised manuscript.

      In the subsequent characterization via image-based quantification of GFP-KRAS membrane localization, a Manders' coefficient was calculated. A corresponding section in the methods describing how this was done is missing.

      This information will be provided in the revised manuscript.

      I would be happy to see the following analyses to strengthen the dataset:

      • Reconstitution experiments and further validation to show that it is dependent on the enzymatic activity of MTMRs.

      MTMR3 knockdown (KD) cells will be rescued with wild-type (WT) MTMR3 or the phosphatase mutant MTMR3 (C413S, PMID: 11676921). MTMR4 KD cells will be rescued with WT MTMR4 or the phosphatase mutant MTMR4 (C407S, PMID: 20736309). In these cells, the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy.

      • Additive effect upon depletion of multiple MTMRs? Are they functionally co-operative?

      MTMR3 and MTMR4 KD cells will be rescued with WT MTMR4 and MTMR3, respectively, and the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy.

      • Signalling analysis is very limited (Fig. 5). Do the authors detect any defects in KRAS-driven downstream signaling in these cells upon depletion of MTMRs?

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth depends on KRAS signaling (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is independent of KRAS signaling (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured.

      Reviewer 2

      Major comments

      The unbiased siRNA screen used to identify proteins that impact KRAS membrane localization was a very nice approach to identify MTMR proteins. Although there is a clear phenotype of KRAS mislocalization associated with knockdown of the various MTMR proteins, the data provided do not prove a causal role for the MTMR proteins in maintaining PtdSer content, nor KRAS localization, at the PM. The current data do not provide a mechanism by which MTMR proteins influence this process; rather, the authors speculate, based on existing literature, that it is the loss of MTMR 3-phosphatase activity that leads to decreased PtdSer in the membrane. There is a series of conversions and exchanges that act upon PI3P (the substrate of MTMR proteins) and PI to generate PtdSer in the PM; thus, it is a dynamic process that is influenced by a variety of different proteins and transporters [3, 4, 5, 6]. To prove their single-protein-driven hypothesis, the authors should clone and express a mutant MTMR protein construct that contains an inactive phosphatase catalytic domain, to prove that it is indeed MTMR's generation of PI (which is further converted into PI4P) in the membrane that is responsible for maintaining PtdSer content and KRAS localization. Without this, there is not enough evidence to support this claim.

      MTMR3 knockdown (KD) cells will be rescued with wild-type (WT) MTMR3 or the phosphatase mutant MTMR3 (C413S, PMID: 11676921). MTMR4 KD cells will be rescued with WT MTMR4 or the phosphatase mutant MTMR4 (C407S, PMID: 20736309). In these cells, the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy.

      In addition, the authors speculate that ORP5 is a critical intermediate in this process, and that the loss of PI4P/ORP5 at the PM following MTMR knockdown is responsible for the decrease in PtdSer at the PM. The authors should knock down ORP5 in MTMR-wildtype cells, since it is downstream of their proposed mechanism, and see whether this leads to comparable reductions in PtdSer levels and KRAS mislocalization at the PM. This would confirm ORP5 as having a major role in this setting and would support the initial mechanistic hypothesis. These experiments are imperative for reaching an appropriate conclusion, especially since some of their current data contradict their mechanistic hypothesis: the authors identify a decrease in whole-cell PtdSer content, not just PM PtdSer content, when MTMR proteins are knocked down. Based on this result, one would predict that a secondary or supporting mechanism must exist that contributes to a reduction in whole-cell PtdSer content, which likely contributes to its loss at the PM as well. The authors describe in line 360 how "previous work has shown that PM PI4P depletion indirectly blocks PtdSer synthase 1 and 2 activities," to explain this reduction in total cell levels of PtdSer. The authors should look at PtdSer synthase 1 and 2 activities in the presence of MTMR knockdown, as the loss of PtdSer at the PM may rely more heavily on synthase activity than on ORP-dependent transfer of PtdSer.

      The PM localization of KRAS and PtdSer after silencing ORP5 in MTMR WT mammalian cell lines has already been investigated and published (PMID: 31451509 and 34903667). In these studies, silencing ORP5 1) reduces the levels of PtdSer and KRAS at the plasma membrane (PM), 2) reduces KRAS signal output, and 3) blocks the growth of KRAS-dependent PDAC in vitro and in vivo. These studies have been appropriately cited in our manuscript in lines 82 and 277.

      Although the C. elegans model that was used to investigate downstream let-60 (RAS ortholog) activity through a multivulva phenotype is quite intriguing, it is more critical to assess downstream RAS pathway activation, especially in the human colorectal adenocarcinoma or the human mammary gland ductal carcinoma cell lines. Not only would this line of questioning provide higher significance and increase the clinical applicability of these findings, but it is also crucial to support the authors' claim that MTMR knockdown can influence mutant KRAS activity. Although small changes in KRAS localization to the PM can have significant effects on downstream signaling, these effects need to be measured and confirmed in this setting. The authors should perform western blots to assess the activation of both the PI3K and MAPK pathways in the MTMR knockdown cell lines.

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth depends on KRAS signaling (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is independent of KRAS signaling (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured.

      In addition, it might be important to know whether there are any changes in the levels of the KRAS protein itself, as recycling/transport pathways may be impacted by its lack of recruitment to the plasma membrane.

      Total KRAS protein expression will be measured in MTMR KD cell lines.

      Finally, the authors show that proliferation is inhibited by MTMR knockdown as a readout of RAS activity. The authors should also assess the levels of cell death, as the inhibition of mutant KRAS in cancer cells would likely lead to cell death. The authors do not describe why reducing any one of the MTMR proteins alone is sufficient to deplete the PM of PtdSer. This sort of discussion is important for understanding compensatory or regulatory mechanisms in place between the MTMR proteins, as this may influence PtdSer levels at the PM. For example, it has been shown that MTMR2 can stabilize MTMR13 on membranes. Do the levels, stability, or localization of the other MTMR proteins change when one specific MTMR is knocked down? Is this why we see an effect on PtdSer in any one of the knockdowns? The authors should at the very least provide western blots for each of the MTMR proteins discussed in the presence of each individual MTMR knockdown.

      MTMR3 knockdown (KD) cells will be rescued with WT MTMR3 or the phosphatase mutant MTMR3 (C413S, PMID: 11676921). MTMR4 KD cells will be rescued with WT MTMR4 or the phosphatase mutant MTMR4 (C407S, PMID: 20736309). In these cells, the PM localization of KRAS and PtdSer will be examined by confocal and electron microscopy. In addition, we will measure endogenous MTMR 2/3/4/7 protein levels in the presence of each individual MTMR KD by immunoblotting.

      In addition to the above experiments, the MTMR hairpins should be expressed in a second or third cell line to prove that these events are not specific to the current model. Since their current human mammary gland ductal carcinoma cell line overexpresses a mutant KRAS-GFP construct, similar experiments in a cancer cell line that already expresses an endogenous mutant KRAS might provide a better model.

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth depends on KRAS signaling (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is independent of KRAS signaling (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured.

      Although this protein would not include a GFP tag, other ways of visualizing its localization at the PM (such as immunofluorescent staining) could be used to confirm its localization there.

      To our knowledge, no anti-KRAS antibody suitable for IF has been reported.

      In addition, the effects on downstream RAS signaling could be measured by western blotting of the PI3K and MAPK pathways.

      Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth depends on KRAS signaling (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is independent of KRAS signaling (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured.

      Supplemental Figure 4 is incorrectly referred to in the text as Supplemental Figure 3 (lines 257-258). The text reads, "Confocal microscopy further demonstrates that HRASG12V cellular localization is not disrupted after silencing MTMR 2/3/4/7 (Fig. S3)" but Figure S3 is an EM image of PM basal sheets from T47D cells expressing GFP-KRASG12V. Supplemental Figure 4 shows that mutant HRAS is unaffected by the various MTMR knockdowns.

      They will be labeled correctly in the revised manuscript.

      Since the authors show decreased proliferation in mutant KRAS cells following MTMR knockdown, the authors should also investigate any changes to proliferation rates in mutant HRAS cell lines following MTMR knockdown. This data is necessary to prove that MTMR-driven changes in downstream RAS signaling are specific to mutant KRAS and not mutant HRAS.

      Cell proliferation assays will be performed using MTMR 2/3/4/7-silenced T47D cell lines stably expressing oncogenic mutant HRAS (HRASG12V) to address this question.

      It may also be important for the authors to show any effects on wild-type RAS localization to the PM when MTMR-2, -3, -4, and -7 are knocked down, to show whether this is an oncoprotein-specific event.

      Cells expressing the truncated mutant KRAS, which contains the minimal membrane anchor and lacks the G-domain, will be infected with lentivirus expressing shRNAs against MTMR 2/3/4/7, and its localization will be examined.

      The representative images chosen for Figure 4 diminish the reliability of the data, as it is difficult to see a visible change in the PI3P probe between the control and MTMR knockdown cells in these images. Since the authors rely on the Manders' coefficient and the number of gold particles throughout much of the paper, having the same conclusion quantitatively but not qualitatively for these assays is confusing. Perhaps the authors should elaborate on whether MTMR knockdown has a stronger effect on PtdSer and KRAS PM presence than on PI3P PM presence.

      We will include the discussion in the revised manuscript.

      They should also describe their method for identifying early endosomes, since they switch back and forth between describing the content of the PM and of early endosomes, such as in Figure 1 and Figure 4.

      We will include the information in the revised manuscript.

      Minor comments:

      An additional experiment that may add another layer of clinical applicability would be the use of an MTMR inhibitor in this cell line, to see whether similar effects can be achieved pharmacologically [7]. This would provoke other researchers to investigate MTMR inhibitors in vitro and in vivo to assess the effect on mutant KRAS cancers.

      • This is an important point. Vanadate, a general phosphotyrosine phosphatase (PTP) inhibitor, has been reported to inhibit myotubularin, a member of the MTMR family (PMID: 8995372 and 1943774), but there are no commercially available MTMR-specific inhibitors. Using vanadate to inhibit MTMR proteins would produce non-specific effects by blocking other PTPs.

      The inclusion of cell lines that express KRAS proteins of different mutational statuses would be extremely interesting, as KRAS' orientation within the plasma membrane has been shown to be altered by these mutations. This fact should potentially be considered when choosing a second or third cell line for additional experiments, but it is not necessary for the authors to elaborate on how MTMR proteins may impact different KRAS mutants within the scope of this project.

      For the aforementioned experiments using human KRAS-dependent and -independent PDAC cell lines, we will use MiaPaCa2 (KRASG12C) and AsPC1 (KRASG12D).

      Reviewer #3

      Major comments:

      One of the two main manuscript claims indicates that KRASG12V "function" is impaired upon MTMR knockdown. While this is an obvious phenotype expected from mislocalizing KRAS from the inner PM, it is not sufficiently demonstrated in the current version of the manuscript. Western blots of at least MAPK and PI3K signalling following MTMR knockdown in KRAS-dependent cell lines should be included. In addition to the T47D cells used in the manuscript, it would be ideal to include a KRAS-mutant cell line from tumour types where KRAS mutations are more frequent than in breast.

      • Human pancreatic ductal adenocarcinoma (PDAC) cell lines that harbor oncogenic mutant KRAS and whose growth depends on KRAS signaling (MiaPaCa2 and AsPC1), and a PDAC cell line that harbors WT KRAS and whose growth is independent of KRAS signaling (BxPC3), will be infected with lentivirus expressing shRNAs against MTMR 2, 3, 4 or 7. Their growth (proliferation assay) and KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) will be measured.

      Since the MTMR-dependent phenotypes are mutant-KRAS specific, it would be interesting to study the resulting phenotypes in an HRAS-mutant cell line.

      Cell proliferation assays will be performed using MTMR 2/3/4/7-silenced T47D cell lines stably expressing oncogenic mutant HRAS (HRASG12V) to address these questions.

      Referee cross-commenting

      After reading the reviews of my colleagues I think there is a clear agreement on the need to further substantiate that KRAS membrane mis-localization is indeed affecting oncogenic output. The use of other KRAS addicted and non-addicted models would further enhance this analysis.

      Likewise, the other two reviewers request experimental evidence to validate the role of MTMR enzymatic activity in the process. This is a pertinent request that I failed to put forward. Suggestions include reconstitution experiments with catalytically dead mutants. The use of MTMR small-molecule inhibitors is also proposed. If inhibitors with sufficient specificity exist, this would indeed be appropriate to perform.

      Experiments addressing these comments have been described above.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      N/A


      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision.

      Reviewer 2

      R2 suggests investigating PtdSer synthase 1 and 2 activities in the presence of MTMR knockdown, as the loss of PtdSer at the PM may rely more heavily on synthase activity than on ORP-dependent transfer of PtdSer.

      Although it would be intriguing to examine the effect of MTMR loss on the activities of PtdSer synthase 1 and 2, our lab does not have the resources or techniques to carry out this experiment.

      The results of this paper rely heavily on one experimental technique, which is calculating a Manders' coefficient to quantify the co-localization of the probe of interest with the CellMask stain of the plasma membrane. How this coefficient is derived is explained in appropriate detail in the methods section of this manuscript; however, a secondary route of identifying these changes in membrane constituents would greatly enhance the paper's conclusions. This would eliminate any doubt surrounding the accuracy of the technique, since so much of the data relies on one experimental output.

      In addition to using Manders' coefficient to examine the colocalization of KRAS and LactC2 (the PtdSer probe), which indicates KRAS/PtdSer redistribution to endomembranes after MTMR loss, we also performed quantitative EM to demonstrate the depletion of KRAS and PtdSer from the inner PM leaflet. We believe these two techniques are appropriate for investigating KRAS/PtdSer PM depletion and cellular redistribution.
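      As a point of reference for the colocalization metric discussed above, here is a minimal sketch of one common form of Manders' coefficient (M1): the fraction of a probe channel's total intensity (e.g. GFP-KRAS or LactC2) that falls in pixels where a second channel (e.g. the CellMask PM stain) exceeds a threshold. This is an illustration under stated assumptions, not the authors' analysis pipeline; the function name manders_m1, the default threshold of zero, and the pixel values are all hypothetical.

```python
from typing import Sequence


def manders_m1(probe: Sequence[float],
               stain: Sequence[float],
               stain_threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of total probe intensity located in
    pixels where the second channel (e.g. a PM stain) is above threshold.

    Both channels are flattened pixel intensities of the same image."""
    if len(probe) != len(stain):
        raise ValueError("channels must have the same number of pixels")
    total = sum(probe)
    if total == 0:
        raise ValueError("probe channel has no signal")
    # Sum probe intensity only where the stain channel counts as "present".
    colocalized = sum(p for p, s in zip(probe, stain) if s > stain_threshold)
    return colocalized / total


# Hypothetical 6-pixel example (not data from the manuscript).
probe = [10.0, 20.0, 30.0, 40.0, 0.0, 0.0]
pm_stain = [5.0, 0.0, 8.0, 0.0, 3.0, 0.0]
print(manders_m1(probe, pm_stain))  # (10 + 30) / 100 = 0.4
```

      Raising stain_threshold shrinks the set of pixels counted as PM, which is why the thresholding rule needs to be reported alongside the coefficient.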

      Reviewer 3

      To further support the conclusions, oncogenic signalling should be studied in the C. elegans model by immunofluorescence or immunohistochemistry. Furthermore, although not strictly required to support the authors' claims, it would be interesting to determine whether the inhibition of the multivulva phenotype upon MTMR knockdown in vivo is a consequence of cell death.

      Our collaborator for the C. elegans study does not have the resources to carry out the proposed IF and IHC experiments. Instead, we will measure KRAS signaling (e.g. phosphorylated ERK and Akt by immunoblot) and the growth of KRAS-dependent PDAC after MTMR loss. These experiments would be more clinically and physiologically relevant.

    1. “detailed specifications”

      I think it's important to note that task analysis is ONLY used when learners require "detailed specification." Task analysis is important IF we assume that the ID's job is to tell the learner exactly what needs to be done and how. Task analysis is essential when using a behavioristic learning model. However, it may not be as applicable to more constructivist learning models.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies the mitotic localization mechanism for Aurora B and INCENP (parts of the chromosomal passenger complex, CPC) in Trypanosoma brucei. The mechanism is different from that in the more commonly studied opisthokonts and there is solid support from RNAi and imaging experiments, targeted mutations, immunoprecipitations with crosslinking/mass spec, and AlphaFold interaction predictions. The results could be strengthened by biochemically testing proposed direct interactions and demonstrating that the targeting protein KIN-A is a motor. The findings will be of interest to parasitology researchers as well as cell biologists working on mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the editor and the reviewers for their thorough and positive assessment of our work and the constructive feedback to further improve our manuscript. Please find below our responses to the reviewers’ comments. Please note that the conserved glycine residue in the Switch II helix in KIN-A was mistakenly labelled as G209 in the original manuscript. We now corrected it to G210 in the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The CPC plays multiple essential roles in mitosis such as kinetochore-microtubule attachment regulation, kinetochore assembly, spindle assembly checkpoint activation, anaphase spindle stabilization, cytokinesis, and nuclear envelope formation, as it dynamically changes its mitotic localization: it is enriched at inner centromeres from prophase to metaphase but relocalizes to the spindle midzone in anaphase. The business end of the CPC is Aurora B and its allosteric activation module IN-box, which is located at the C-terminal part of INCENP. In most well-studied eukaryotic species, Aurora B activity is locally controlled by the localization module of the CPC, Survivin, Borealin, and the N-terminal portion of INCENP. Survivin and Borealin, which bind the N terminus of INCENP, recognize histone residues that are specifically phosphorylated in mitosis, while anaphase spindle midzone localization is supported by the direct microtubule-binding capacity of the SAH (single alpha helix) domain of INCENP and other microtubule-binding proteins that specifically interact with INCENP during anaphase, which are under the regulation of CDK activity. One of these examples includes the kinesin-like protein MKLP2 in vertebrates.

      Trypanosoma is an evolutionarily interesting species to study mitosis since its kinetochore and centromere proteins do not show any similarity to other major branches of eukaryotes, while orthologs of Aurora B and INCENP have been identified. Combining molecular genetics, imaging, biochemistry, cross-linking IP-MS (IP-CLMS), and structural modeling, this manuscript reveals that two orphan kinesin-like proteins KIN-A and KIN-B act as localization modules of the CPC in Trypanosoma brucei. The IP-CLMS, AlphaFold2 structural predictions, and domain deletion analysis support the idea that (1) KIN-A and KIN-B form a heterodimer via their coiled-coil domain, (2) two alpha helices of INCENP interact with the coiled-coil of the KIN-A-KIN-B heterodimer, (3) the conserved KIN-A C-terminal CD1 interacts with the heterodimeric KKT9-KKT11 complex, which is a submodule of the KKT7-KKT8 kinetochore complex unique to Trypanosoma, (4) KIN-A and KIN-B coiled-coil domains and the KKT7-KKT8 complex are required for CPC localization at the centromere, (5) CD1 and CD2 domains of KIN-A support its centromere localization. The authors further show that the ATPase activity of KIN-A is critical for spindle midzone enrichment of the CPC. The imaging data of the KIN-A rigor mutant suggest that dynamic KIN-A-microtubule interaction is required for metaphase alignment of the kinetochores and proliferation. Overall, the study reveals novel pathways of CPC localization regulation via KIN-A and KIN-B by multiple complementary approaches.

      Strengths:

      The major conclusion is collectively supported by multiple approaches, combining site-specific genome engineering, epistasis analysis of cellular localization, AlphaFold2 structure prediction of protein complexes, IP-CLMS, and biochemical reconstitution (the complex of KKT8, KKT9, KKT11, and KKT12).

      We thank the reviewer for her/his positive assessment of our manuscript.

      Weaknesses:

      • The predictions of direct interactions (e.g. INCENP with KIN-A/KIN-B, or KIN-A with KKT9-KKT11) have not yet been confirmed experimentally, e.g. by domain mutagenesis and interaction studies.

      Thank you for this point. It is true that we do not have evidence for direct interactions between KIN-A and KKT9-KKT11. However, the interaction between INCENP and KIN-A/KIN-B is strongly supported by our cross-linking IP-MS of native complexes. Furthermore, we show that deletion of the INCENPCPC1 N-terminus predicted to interact with KIN-A:KIN-B abolishes kinetochore localization.

      • The criteria used to judge a failure of localization are not clearly explained (e.g., Figure 5F, G).

      As suggested by the reviewer in recommendation #14, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      • It remains to be shown that KIN-A has motor activity.

      We thank the reviewer for this important comment. Indeed, motor activity remains to be demonstrated using an in vitro system, which is beyond the scope of this study. What we show here is that the motor domain of KIN-A effectively co-sediments with microtubules and that spindle localization of KIN-A is abolished upon deletion of the motor domain. Moreover, mutation of a conserved glycine residue in the Switch II region (G210) to alanine (‘rigor mutation’, (Rice et al., 1999)) renders KIN-A incapable of translocating to the central spindle, suggesting that its ATPase activity is required for this process. To clarify this point in the manuscript, we have replaced all instances where we refer to ‘motor activity’ of KIN-A with ‘ATPase activity’ when referring to experiments performed using the KIN-A rigor mutant. In addition, we have included a Multiple Sequence Alignment (MSA) of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9 in Figure 6A and S6A, showing the conservation of key motifs required for ATP coordination and tubulin interaction. In the corresponding paragraph in the main text, we describe these data as follows:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      • The authors imply that KIN-A, but not KIN-B, interacts with microtubules based on a microtubule pelleting assay (Fig. S6), but the substantial insoluble fractions of 6HIS-KIN-A and 6HIS-KIN-B make it difficult to conclusively interpret the data. It is possible that these two proteins are not stable unless they form a heterodimer.

      This is indeed a possibility. We are currently aiming at purifying full-length recombinant KIN-A and KIN-B (along with the other CPC components), which will allow us to perform in vitro interaction studies and to investigate biochemical properties of this complex (including the role of the motor domains of KIN-A and KIN-B) within the framework of an in-depth follow-up study. To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A2-309 (left) and 6HIS-KIN-B2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, lack of microtubule binding for KIN-B may be due to the unstable non-functional protein used in this study.’

      • For broader context, some prior findings should be introduced, e.g. on the importance of the microtubule-binding capacity of the INCENP SAH domain and its regulation by mitotic phosphorylation (PMID 8408220, 26175154, 26166576, 28314740, 28314741, 21727193), since KIN-A and KIN-B may substitute for the function of the SAH domain.

      We have modified the introduction to include the following text and references mentioned by the reviewer: ‘The localization module comprises Borealin, Survivin and the N-terminus of INCENP, which are connected to one another via a three-helical bundle (Jeyaprakash et al., 2007, 2011; Klein et al., 2006). The two modules are linked by the central region of INCENP, composed of an intrinsically disordered domain and a single alpha helical (SAH) domain. INCENP harbours microtubule-binding domains within the N-terminus and the central SAH domain, which play key roles for CPC localization and function (Samejima et al., 2015; Kang et al., 2001; Noujaim et al., 2014; Cormier et al., 2013; Wheatley et al., 2001; Nakajima et al., 2011; Fink et al., 2017; Wheelock et al., 2017; van der Horst et al., 2015; Mackay et al., 1993).’

      Reviewer #2 (Public Review):

      How the chromosomal passenger complex (CPC) and its subunit Aurora B kinase regulate kinetochore-microtubule attachment, and how the CPC relocates from kinetochores to the spindle midzone as a cell transitions from metaphase to anaphase are questions of great interest. In this study, Ballmer and Akiyoshi take a deep dive into the CPC in T. brucei, a kinetoplastid parasite with a kinetochore composition that varies greatly from other organisms.

      Using a combination of approaches, most importantly in silico protein predictions using alphafold multimer and light microscopy in dividing T. brucei, the authors convincingly present and analyse the composition of the T. brucei CPC. This includes the identification of KIN-A and KIN-B, proteins of the kinesin family, as targeting subunits of the CPC. This is a clear advancement over earlier work, for example by Li and colleagues in 2008. The involvement of KIN-A and KIN-B is of particular interest, as it provides a clue for the (re)localization of the CPC during the cell cycle. The evolutionary perspective makes the paper potentially interesting for a wide audience of cell biologists, a point that the authors bring across properly in the title, the abstract, and their discussion.

      The evolutionary twist of the paper would be strengthened 'experimentally' by predictions of the structure of the CPC beyond T. brucei. Depending on how far the authors can extend their in-silico analysis, it would be of interest to discuss a) available/predicted CPC structures in well-studied organisms and b) structural predictions in other euglenozoa. What are the general structural properties of the CPC (e.g. flexible linkers, overall dimensions, structural differences when subunits are missing etc.)? How common is the involvement of kinesin-like proteins? In line with this, it would be good to display the figure currently shown as S1D (or similar) as a main panel.

      We thank the reviewer for her/his encouraging assessment of our manuscript and the appreciation on the extent of the evolutionary relevance of our work. As suggested, we have moved the phylogenetic tree previously shown in Fig. S1D to the main Fig. 1F. Our AF2 analysis of CPC proteins and (sub)complexes from other kinetoplastids failed to predict reliable interactions among CPC proteins except for that between Aurora B and the IN box. It therefore remains unclear whether CPC structures are conserved among kinetoplastids. Because components of CPC remain unknown in other euglenozoa (other than Aurora B and INCENP), we cannot perform structural predictions of CPC in diplonemids or euglenids.

      It remains unclear how common the involvement of kinesin-like proteins with the CPC is in other eukaryotes, partly because we could not identify an obvious homolog of KIN-A/KIN-B outside of kinetoplastids. Addressing this question would require experimental approaches in various eukaryotes (e.g. immunoprecipitation and mass spectrometry of Aurora B) as we carried out in this manuscript using Trypanosoma brucei.

      Reviewer #3 (Public Review):

      Summary:

      The protein kinase, Aurora B, is a critical regulator of mitosis and cytokinesis in eukaryotes, exhibiting a dynamic localisation. As part of the Chromosomal Passenger Complex (CPC), along with the Aurora B activator, INCENP, and the CPC localisation module comprised of Borealin and Survivin, Aurora B travels from the kinetochores at metaphase to the spindle midzone at anaphase, which ensures its substrates are phosphorylated in a time- and space-dependent manner. In the kinetoplastid parasite, T. brucei, the Aurora B orthologue (AUK1), along with an INCENP orthologue known as CPC1, and a kinetoplastid-specific protein CPC2, also displays a dynamic localisation, moving from the kinetochores at metaphase to the spindle midzone at anaphase, to the anterior end of the newly synthesised flagellum attachment zone (FAZ) at cytokinesis. However, the trypanosome CPC lacks orthologues of Borealin and Survivin, and T. brucei kinetochores also have a unique composition, being comprised of dozens of kinetoplastid-specific proteins (KKTs). Of particular importance for this study are KKT7 and the KKT8 complex (comprising KKT8, KKT9, KKT11, and KKT12). Here, Ballmer and Akiyoshi seek to understand how the CPC assembles and is targeted to its different locations during the cell cycle in T. brucei.

      Strengths & Weaknesses:

      Using immunoprecipitation and mass-spectrometry approaches, Ballmer and Akiyoshi show that AUK1, CPC1, and CPC2 associate with two orphan kinesins, KIN-A and KIN-B, and with the use of endogenously expressed fluorescent fusion proteins, demonstrate for the first time that KIN-A and KIN-B display a dynamic localisation pattern similar to other components of the CPC. Most of these data provide convincing evidence for KIN-A and KIN-B being bona fide CPC proteins, although the evidence that KIN-A and KIN-B translocate to the anterior end of the new FAZ at cytokinesis is weak - the KIN-A/B signals are very faint and difficult to see, and cell outlines/brightfield images are not presented to allow the reader to determine the cellular location of these faint signals (Fig S1B).

      We thank the reviewer for their thorough assessment of our manuscript and the insightful feedback to further improve our study. To address the point above, we have acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora BAUK1, KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      They then demonstrate, by using RNAi to deplete individual components, that the CPC proteins have hierarchical interdependencies for their localisation to the kinetochores at metaphase. These experiments appear to have been well performed, although only images of cell nuclei were shown (Fig 2A), meaning that the reader cannot properly assess whether CPC components have localised elsewhere in the cell, or if their abundance changes in response to depletion of another CPC protein.

      We chose to show close-ups of the nucleus to highlight the different localization patterns of CPC proteins under the different RNAi conditions. In none of these conditions did we observe mis-localization of CPC subunits to the cytoplasm. To clarify this point, we added the following sentence in the legend for Figure 2A:

      ‘A) Representative fluorescence micrographs showing the localization of YFP-tagged Aurora BAUK1, INCENPCPC1, KIN-A and KIN-B in 2K1N cells upon RNAi-mediated knockdown of indicated CPC subunits. Note that nuclear close-ups are shown here. CPC proteins were not detected in the cytoplasm. RNAi was induced with 1 μg/mL doxycycline for 24 h (KIN-B RNAi) or 16 h (all others). Cell lines: BAP3092, BAP2552, BAP2557, BAP3093, BAP2906, BAP2900, BAP2904, BAP3094, BAP2899, BAP2893, BAP2897, BAP3095, BAP3096, BAP2560, BAP2564, BAP3097. Scale bars, 2 μm.’

      Ballmer and Akiyoshi then go on to determine the kinetochore localisation domains of KIN-A and KIN-B. Using ectopically expressed GFP-tagged truncations, they show that coiled-coil domains within KIN-A and KIN-B, as well as a disordered C-terminal tail present only in KIN-A, but not the N-terminal motor domains of KIN-A or KIN-B, are required for kinetochore localisation. These data are strengthened by immunoprecipitating CPC complexes and crosslinking them prior to mass spectrometry analysis (IP-CLMS), a state-of-the-art approach, to determine the contacts between the CPC components. Structural predictions of the CPC structure are also made using AlphaFold2, suggesting that coiled coils form between KIN-A and KIN-B, and that KIN-A/B interact with the N termini of CPC1 and CPC2. Experimental results show that CPC1 and CPC2 are unable to localise to kinetochores if they lack their N-terminal domains consistent with these predictions. Altogether these data provide convincing evidence of the protein domains required for CPC kinetochore localisation and CPC protein interactions. However, the authors also conclude that KIN-B plays a minor role in localising the CPC to kinetochores compared to KIN-A. This conclusion is not particularly compelling as it stems from the observation that ectopically expressed GFP-NLS-KIN-A (full length or coiled-coil domain + tail) is also present at kinetochores during anaphase unlike endogenously expressed YFP-KIN-A. Not only is this localisation probably an artifact of the ectopic expression, but the KIN-B coiled-coil domain localises to kinetochores from S to metaphase and Fig S2G appears to show a portion of the expressed KIN-B coiled-coil domain colocalising with KKT2 at anaphase. It is unclear why KIN-B has been discounted here.

      As the reviewer points out, a small fraction of GFP-NLS-KIN-B317-624 is indeed detectable at kinetochores in anaphase, although most of the protein shows diffuse nuclear staining. There are various explanations for this phenomenon: It is conceivable that the KIN-B motor domain may contribute to microtubule binding and translocation of the CPC from kinetochores onto the spindle in anaphase. In our experiments, ectopically expressed KIN-B317-624 likely outcompetes a fraction of endogenous KIN-B for binding to KIN-A, which could interfere with this translocation process, leaving a population of CPC ‘stranded’ at kinetochores in anaphase. Another possibility, hinted at by the reviewer, is that the C-terminus of KIN-B interacts with receptors at the kinetochore/centromere. Although we do not discount this possibility, we nevertheless decided to focus on KIN-A in this study, because the anaphase kinetochore retention phenotype for both full-length GFP-NLS-KIN-A and -KIN-A309-862 is much stronger than for KIN-B317-624. Two additional reasons were that (i) KIN-A is highly conserved within kinetoplastids, whereas KIN-B orthologs are missing in some kinetoplastids, and (ii) no convincing interactions between KIN-B and kinetochore proteins were predicted by AF2.

      To address the reviewer’s point, we decided to include KIN-B in the title of this manuscript, which now reads: ‘Dynamic localization of the chromosomal passenger complex is controlled by the orphan kinesins KIN-A and KIN-B in the kinetoplastid parasite Trypanosoma brucei’.

      Moreover, we modified the corresponding paragraph in the results section as follows:

      ‘Intriguingly, unlike endogenously YFP-tagged KIN-A, ectopically expressed GFP fusions of both full-length KIN-A and KIN-A310-862 clearly localized at kinetochores even in anaphase (Figs. 2, F and H). Weak anaphase kinetochore signal was also detectable for KIN-B317-624 (Fig. S2F). GFP fusions of the central coiled-coil domain or the C-terminal disordered tail of KIN-A did not localize to kinetochores (data not shown). These results show that kinetochore localization of the CPC is mediated by KIN-A and KIN-B and requires both the central coiled-coil domain as well as the C-terminal disordered tail of KIN-A.’

      Next, using a mixture of RNAi depletion and LacI-LacO recruitment experiments, the authors show that kinetochore proteins KKT7 and KKT9 are required for AUK1 to localise to kinetochores (other KKT8 complex components were not tested here) and that all components of the KKT8 complex are required for KIN-A kinetochore localisation. Further, both KKT7 and KKT8 were able to recruit AUK1 to an ectopic locus in the S phase, and KKT7 recruited KKT8 complex proteins, which the authors suggest indicates it is upstream of KKT8. However, while these experiments have been performed well, the reciprocal experiment to show that KKT8 complex proteins cannot recruit KKT7, which could have confirmed this hierarchy, does not appear to have been performed. Further, since the LacI fusion proteins used in these experiments were ectopically expressed, they were retained (artificially) at kinetochores into anaphase; KKT8 and KIN-A were both able to recruit AUK1 to LacO foci in anaphase, while KKT7 was not. The authors conclude that this suggests the KKT8 complex is the main kinetochore receptor of the CPC - while very plausible, this conclusion is based on a likely artifact of ectopic expression, and for that reason, should be interpreted with a degree of caution.

      We previously showed that RNAi-mediated depletion of KKT7 disrupts kinetochore localization of KKT8 complex members, whereas kinetochore localization of KKT7 is unaffected by disruption of the KKT8 complex (Ishii and Akiyoshi, 2020). Moreover, in contrast to the KKT8 complex, KKT7 remains at kinetochores in anaphase (Akiyoshi and Gull, 2014). These data show that KKT7 is upstream of the KKT8 complex. In this context, the LacI-LacO tethering approach can be very useful to probe whether two proteins (or domains of proteins) could interact in vivo, either directly or indirectly. However, a recruitment hierarchy cannot be inferred from such experiments, because the data only show whether X can recruit Y to an ectopic locus, not whether X is upstream of Y or vice versa. Regarding the retention of Aurora BAUK1 at kinetochores in anaphase upon ectopic expression of GFP-KKT8-LacI, we agree with the reviewer that these data need to be interpreted carefully. Nevertheless, the notion that the KKT7-KKT8 complex recruits the CPC to kinetochores is also strongly supported by IP-MS, RNAi experiments, and AF2 predictions. For clarification and to address the reviewer’s point, we re-formulated the corresponding paragraph in the main text:

      ‘We previously showed that KKT7 lies upstream of the KKT8 complex (Ishii and Akiyoshi, 2020). Indeed, GFP-KKT72-261-LacI recruited tdTomato-KKT8, -KKT9 and -KKT12 (Fig. S4E). Expression of both GFP-KKT72-261-LacI and GFP-KKT8-LacI resulted in robust recruitment of tdTomato-Aurora BAUK1 to LacO foci in S phase (Figs. 4, E and F). Intriguingly, we also noticed that, unlike endogenous KKT8 (which is not present in anaphase), ectopically expressed GFP-KKT8-LacI remained at kinetochores during anaphase (Fig. 4F). This resulted in a fraction of tdTomato-Aurora BAUK1 being trapped at kinetochores during anaphase instead of migrating to the central spindle (Fig. 4F). We observed a comparable situation upon ectopic expression of GFP-KIN-A, which is retained on anaphase kinetochores together with tdTomato-KKT8 (Fig. S4F). In contrast, Aurora BAUK1 was not recruited to LacO foci marked by GFP-KKT72-261-LacI in anaphase (Fig. 4E).’

      Further IP-CLMS experiments, in combination with recombinant protein pull-down assays and structural predictions, suggested that within the KKT8 complex, there are two subcomplexes of KKT8:KKT12 and KKT9:KKT11, and that KKT7 interacts with KKT9:KKT11 to recruit the remainder of the KKT8 complex. The authors also assess the interdependencies between KKT8 complex components for localisation and expression, showing that all four subunits are required for the assembly of a stable KKT8 complex, and present AlphaFold2 structural modelling data to support the two-subcomplex model. In general, these data are of high quality and convincing, with a few exceptions. The recombinant pulldown assay (Fig. 4H) is not particularly convincing, as the 3rd eluate gel appears to show a band at the size of KKT11 (despite the labelling indicating no KKT11 was present in the input) but no pulldown of KKT9, which was present in the input according to the figure legend (although this may be mislabeled, since it is not consistent with the text). The text also states that 6HIS-KKT8 was insoluble in the absence of KKT12, but this is not possible to assess from the data presented.

      We thank the reviewer for pointing out an error in the text: ‘Removal of both KKT9 and KKT11 did not impact formation of the KKT8:KKT12 subcomplex’ should read ‘Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex’. Regarding the very faint band perceived to be KKT11 in the 3rd eluate: This band runs slightly lower than KKT11 and likely represents a bacterial contaminant (which we have seen also in other preps in the past). We have made a note of this in the corresponding legend (new Fig. 4I). Moreover, we provide the estimated molecular weights for each subunit, as suggested by the reviewer in recommendation #14 (see below):

      ‘(I) Indicated combinations of 6HIS-tagged KKT8 (~46 kDa), KKT9 (~39 kDa), KKT11 (~29 kDa) and KKT12 (~23 kDa) were co-expressed in E. coli, followed by metal affinity chromatography and SDS-PAGE. The asterisk indicates a common contaminant.’

      The corresponding paragraph in the results section now reads:

      ‘To validate these findings, we co-expressed combinations of 6HIS-KKT8, KKT9, KKT11 and KKT12 in E. coli and performed metal affinity chromatography (Fig. 4I). 6HIS-KKT8 efficiently pulled down KKT9, KKT11 and KKT12, as shown previously (Ishii and Akiyoshi, 2020). In the absence of KKT9, 6HIS-KKT8 still pulled down KKT11 and KKT12. Removal of either KKT9 or KKT11 did not impact formation of the KKT8:KKT12 subcomplex. In contrast, 6HIS-KKT8 could not be recovered without KKT12, indicating that KKT12 is required for formation of the full KKT8 complex. These results support the idea that the KKT8 complex consists of KKT8:KKT12 and KKT9:KKT11 subcomplexes.’

      It is also surprising that data showing the effects of KKT8, KKT9, and KKT12 depletion on KKT11 localisation and abundance are not presented alongside the reciprocal experiments in Fig S4G-J.

      YFP-KKT11 is delocalized upon depletion of KKT8 and KKT9 (see below). Unfortunately, we were unsuccessful in our attempts at deriving the corresponding KKT12 RNAi cell line, rendering this set of data incomplete. Because these data are not of critical importance for this study, we decided not to invest more time in attempting further transfections.

      Author response image 1.

      The authors also convincingly show that AlphaFold2 predictions of interactions between KKT9:KKT11 and a conserved domain (CD1) in the C-terminal tail of KIN-A are likely correct, with CD1 and a second conserved domain, CD2, identified through sequence analysis, acting synergistically to promote KIN-A kinetochore localisation at metaphase, but not being required for KIN-A to move to the central spindle at anaphase. They then hypothesise that the kinesin motor domain of KIN-A (but not KIN-B, which is predicted to be inactive based on non-conservation of residues key for activity) determines its central spindle localisation at anaphase through binding to microtubules. In support of this hypothesis, the authors show that KIN-A, but not KIN-B, can bind microtubules in vitro and in vivo. However, ectopically expressed GFP-NLS fusions of full-length KIN-A or KIN-A motor domain did not localise to the central spindle at anaphase. The authors suggest this is due to the GFP fusion disrupting the ATPase activity of the motor domain, but they provide no evidence that this is the case. Instead, they replace endogenous KIN-A with a predicted ATPase-defective mutant (G209A), showing that while this still localises to kinetochores, the kinetochores were frequently misaligned at metaphase, and that it no longer concentrates at the central spindle (with concomitant mis-localisation of AUK1), causing cells to accumulate at anaphase. From these data, the authors conclude that KIN-A ATPase activity is required for chromosome congression to the metaphase plate and its central spindle localisation at anaphase. While potentially very interesting, these data are incomplete in the absence of any experimental data to show that KIN-A possesses ATPase activity or that this activity is abrogated by the G209A mutation, and the conclusions of this section are rather speculative.

      Thank you for this important comment, which relates to a similar point raised by Reviewer 1 (see above). Indeed, the ATPase and motor activity of KIN-A remain to be demonstrated biochemically using recombinant proteins, which is beyond the scope of this study. We generated MSAs of KIN-A and KIN-B from different kinetoplastids with human Kinesin-1, human Mklp2 and yeast Klp9, which are now presented in Figure 6A and S6A. These clearly show that key motifs required for ATP or tubulin binding in other kinesins are highly conserved in KIN-A (but not KIN-B). This includes the conserved glycine residue in the Switch II helix (G234 in human Kinesin-1, G210 in T. brucei KIN-A), which forms a hydrogen bond with the γ-phosphate of ATP and, upon mutation, has been shown to impair ATPase activity and trap the motor head in a strong microtubule-bound (‘rigor’) state (Rice et al., 1999; Sablin et al., 1996). The prominent rigor phenotype of KIN-AG210A is consistent with KIN-A having ATPase activity. In addition to the data in Fig. 6A and S6A, we made the following changes to the main text:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).

      Ectopically expressed GFP-KIN-A and -KIN-A2-309 partially localized to the mitotic spindle but failed to concentrate at the midzone during anaphase (Figs. 2, F and G), suggesting that N-terminal tagging of the KIN-A motor domain may interfere with its function. To address whether the ATPase activity of KIN-A is required for central spindle localization of the CPC, we replaced one allele of KIN-A with a C-terminally YFP-tagged G210A ATP hydrolysis-defective rigor mutant (Fig. 6A) (Rice et al., 1999) and used an RNAi construct directed against the 3’UTR of KIN-A to deplete the untagged allele. The rigor mutation did not affect recruitment of KIN-A to kinetochores (Figs. S6, C and D). However, KIN-AG210A-YFP marked kinetochores were misaligned in ~50% of cells arrested in metaphase, suggesting that ATPase activity of KIN-A promotes chromosome congression to the metaphase plate (Figs. S6, E-H).’

      Impact:

      Overall, this work uses a wide range of cutting-edge molecular and structural predictive tools to provide a significant amount of new and detailed molecular data that shed light on the composition of the unusual trypanosome CPC and how it is assembled and targeted to different cellular locations during cell division. Given the fundamental nature of this research, it will be of interest to many parasitology researchers as well as cell biologists more generally, especially those working on aspects of mitosis and cell division, and those interested in the evolution of the CPC.

      We thank the reviewer for his/her feedback and thoughtful and thorough assessment of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Why did the authors omit KIN-B from the title?

      We decided to add KIN-B in the title. Please see our response to Reviewer #3 (public review).

      (2) Abstract, line 28, "Furthermore, the kinesin motor activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset." This must be revised - see public review.

      We changed this section of the abstract as follows:

      ‘Furthermore, the ATPase activity of KIN-A promotes chromosome alignment in prometaphase and CPC translocation to the central spindle upon anaphase onset. Thus, KIN-A constitutes a unique ‘two-in-one’ CPC localization module in complex with KIN-B, which directs the CPC to kinetochores (from S phase until metaphase) via its C-terminal tail, and to the central spindle (in anaphase) via its N-terminal kinesin motor domain.’

      (3) Line 87-90. The findings by Li et al., 2008 (KIN-A and KIN-B interacting with Aurora B and epistasis analysis) should be introduced more comprehensively in the Introduction section.

      We added the following sentence in the introduction:

      ‘In addition, two orphan kinesins, KIN-A and KIN-B, have been proposed to transiently associate with Aurora BAUK1 during mitosis (Li et al., 2008; Li, 2012).’

      (4) Figure 1B. The way the Trypanosoma cell cycle is defined should be briefly explained in the main text, rather than just referring to the figure.

      The ‘KN’ annotation of the trypanosome cell cycle is explained in the Figure 1 legend. We now also added a brief description in the main text:

      ‘We next assessed the localization dynamics of fluorescently tagged KIN-A and KIN-B over the course of the cell cycle (Figs. 1, B-E). T. brucei possesses two DNA-containing organelles, the nucleus (‘N’) and the kinetoplast (‘K’). The kinetoplast is an organelle found uniquely in kinetoplastids, which contains the mitochondrial DNA and replicates and segregates prior to nuclear division. The ‘KN’ configuration serves as a good cell cycle marker (Woodward and Gull, 1990; Siegel et al., 2008).’

      (5) Line 118. Throughout the paper, it is not clear why GFP-NLS fusion was used instead of GFP fusion. Please justify the fusion of NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of KKT2 and KKT3 kinetochore proteins, many fragments did not go into the nucleus, presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. Since then, we have consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (6) Line 121, "Unexpectedly". It is not clear why this was unexpected.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora BAUK1: Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora BAUK1, a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora BAUK1, INCENPCPC1 and CPC2 throughout the cell cycle.’

      (7) Line 127-129. Defining homologs and orthologs is tricky - there are many homologs and paralogs of kinesin-like proteins. The method to define the presence or absence of KIN-A/KIN-B homologs should be described in the Materials and Methods section.

      Due to the difficulty in defining true orthologs for kinesin-like proteins, we took a conservative approach: reciprocal best BLAST hits. We first searched for KIN-A homologs using BLAST in the TriTryp database or hmmsearch with manually prepared HMM profiles. If the top hit in a given organism returned T. brucei KIN-A as its top hit in a reciprocal BLAST search against the T. brucei proteome, we considered it a true ortholog. We modified the Materials and Methods section as follows.

      ‘Searches for homologous proteins were done using BLAST in the TriTryp database (Aslett et al., 2010) or hmmsearch with manually prepared HMM profiles (HMMER version 3.0; Eddy, 1998). The top hit was considered a true ortholog only if the reciprocal BLAST search returned the original T. brucei query protein as its top hit.’
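      The reciprocal best-hit criterion described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the BLAST (or hmmsearch) output has already been parsed into dictionaries mapping each query ID to (subject ID, bit score) pairs, and it ranks hits by bit score, which is itself an assumption. The function names and all protein IDs in the example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical parsed search results: query ID -> list of (subject ID, bit score).
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query: str) -> Optional[str]:
    """Return the subject with the highest bit score for `query`, or None if there are no hits."""
    candidates = hits.get(query, [])
    if not candidates:
        return None
    return max(candidates, key=lambda hit: hit[1])[0]


def reciprocal_best_hit(query: str, forward: Hits, reverse: Hits) -> Optional[str]:
    """Return the forward top hit only if its own top hit in the reverse search is `query`."""
    top = best_hit(forward, query)
    if top is None:
        return None
    return top if best_hit(reverse, top) == query else None


# Usage with made-up IDs (not real gene identifiers).
# forward: T. brucei queries searched against another organism's proteome.
# reverse: that organism's hits searched back against the T. brucei proteome.
forward = {
    "Tb_KIN-A": [("OrgX_g1", 850.0), ("OrgX_g2", 300.0)],
    "Tb_KIN-B": [("OrgX_g3", 120.0)],
}
reverse = {
    "OrgX_g1": [("Tb_KIN-A", 840.0), ("Tb_other_kinesin", 200.0)],
    "OrgX_g3": [("Tb_other_kinesin", 400.0), ("Tb_KIN-B", 118.0)],
}
print(reciprocal_best_hit("Tb_KIN-A", forward, reverse))  # OrgX_g1
print(reciprocal_best_hit("Tb_KIN-B", forward, reverse))  # None
```

      In the second case, the forward top hit maps back to a different T. brucei kinesin, so it is rejected. This is the situation the reciprocal criterion guards against in large protein families such as the kinesins, where a single forward search can pick up a paralog rather than an ortholog.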

      (8) Line 156. For non-experts of Trypanosoma cell biology, it is not clear how the nucleolar localization is defined.

      The nucleolus in T. brucei is discernible as a DAPI-dim region in the nucleus.

      (9) Fig.2G and Fig.S2F. These data imply that the coiled-coil and C-terminal tail domains of KIN-A/KIN-B are important for anaphase spindle midzone enrichment. However, it is odd that this was not mentioned. This reviewer recommends that the authors quantify the midzone localization data of these constructs and discuss the role of the coiled-coil domains.

      One possibility is that KIN-A and KIN-B need to form a complex (via their coiled-coil domains) to localize to the spindle midzone. Another likely possibility, which is discussed in the manuscript, is that N-terminal tagging of KIN-A impairs motor activity. This is supported by the fact that the central spindle localization is also disrupted in full-length GFP-KIN-A. We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      (10) Line 288-289, "pLDDT scores improved significantly for KIN-A CD1 in complex with KKT9:KKT11 (>80) compared to KIN-A CD1 alone (~20) (Figs. S3, A and B)." I can see from Fig. S3A that the pLDDT score is about 20 for KIN-A CD1, but the basis for pLDDT > 80 upon inclusion of KKT9:KKT11 is missing.

      We added the pLDDT and PAE plots for the AF2 prediction of KIN-A700-800 in complex with KKT9:KKT11 in Fig. S5B.

      (11) Fig. 5A. Since there is no supporting biochemical data for KIN-A-KKT9-KKT11 interaction, it is important to assess the stability of AlphaFold-based structural predictions of the KIN-A-KKT9-KKT11 interaction. Are there significant differences among the top 5 prediction results, and do these interactions remain stable after the "simulated annealing" process used in the AlphaFold predictions? Are predicted CD1-interacting regions/amino residues in KKT9 and KKT11 evolutionarily conserved?

      See above. The interaction was predicted in all five models, as shown in Fig. S5B. Conservation of the CD1-interacting regions in KKT9 and KKT11 is shown below:

      Author response image 2.

      KKT9 (residues ~53–80 predicted to interact with KIN-A in T. brucei)

      Author response image 3.

      KKT11 (residues 61-85 predicted to interact with KIN-A in T. brucei)

      (12) Line 300, Fig. S5D and E, "failed to localize at kinetochores". From this resolution of the microscopy images, it is not clear if these proteins fail to localize at kinetochores as the KKT and KIN-A310-716 signals overlap. Perhaps, "failed to enrich at kinetochores" is a more appropriate statement.

      We changed this sentence according to the reviewer’s suggestion.

      (13) Line 309 and Fig 5D and F, "predominantly localized to the mitotic spindle". From this image shown in Fig 5D, it is not clear if KIN-A∆CD1-YFP and Aurora B are predominantly localized to the spindle or if they are still localized to centromeres that are misaligned on the spindle. Without microtubule staining, it is also not clear how microtubules are distributed in these cells. Please clarify how the presence or absence of kinetochore/spindle localization was defined.

      As shown in Fig. S5E and S5F, deletion of CD1 clearly impairs kinetochore localization of KIN-A (kinetochores marked by tdTomato-KKT2). Moreover, misalignment of kinetochores, as observed upon expression of the KIN-AG210A rigor mutant, would result in an increase in 2K1N cells and proliferation defects, which is not the case for the KIN-A∆CD1 mutant (Fig. 5H, Fig. S5I). KIN-A∆CD1-YFP appears to localize diffusely along the entire length of the mitotic spindle, whereas we still observe kinetochore-like foci in the rigor mutant. Unfortunately, we do not have suitable antibodies that would allow us to distinguish spindle microtubules from the vast subpellicular microtubule array present in T. brucei and hence need to rely on tagging spindle-associated proteins such as MAP103.

      (14) Fig. 5F, G, S5F. Along the same lines, it would be helpful to show example images for each category - "kinetochores", "kinetochores + spindle", and "spindle".

      As suggested by the reviewer, we have now included example images for each category (‘kinetochores’, ‘kinetochores + spindle’, ‘spindle’) along with a schematic illustration in Fig. 5F.

      (15) Line 332 and Fig. S6A. The experiment may be repeated in the presence of ATP or nonhydrolyzable ATP analogs.

      We thank the reviewer for the suggestion. We envisage such experiments for an in-depth follow-up study.

      (16) Line 342, "motor activity of KIN-A". Until KIN-A is shown to have motor activity, the result based on the rigor mutant does not show that the motor activity of KIN-A promotes chromosome congression. The result suggests that the ATPase activity of KIN-A is important.

      We changed that sentence as suggested by the reviewer.

      (17) Line 419 -. The authors base their discussion on the speculation that KIN-A is a plus-end directed motor. Please justify this speculation.

      Indeed, the notion that KIN-A is a plus-end directed motor remains a hypothesis, which is based on sequence alignments with other plus-end directed motors and the observation that the KIN-A motor domain is involved in translocation of the CPC to the central spindle in anaphase. We have modified the corresponding section in the discussion as follows:

      ‘It remains to be investigated whether KIN-A truly functions as a plus-end directed motor. The role of KIN-B in this context is equally unclear. Since KIN-B does not possess a functional kinesin motor domain, we deem it unlikely that the KIN-A:KIN-B heterodimer moves hand-over-hand along microtubules as do conventional (kinesin-1 family) kinesins. Rather, the KIN-A motor domain may function as a single-headed unit and drive processive plus-end directed motion using a mechanism similar to the kinesin-3 family kinesin KIF1A (Okada and Hirokawa, 1999).’

      (18) Line 422-423, "plus-end directed motion using a mechanism similar to kinesin-3 family kinesins (such as KIF1A)." Please cite a reference supporting this statement.

      See above. We now cite Okada and Hirokawa (1999).

      Reviewer #2 (Recommendations For The Authors):

      Please provide a quantification of data shown in Figure 2F-H and described in lines 151-166.

      We decided not to provide a quantification for these data due to low sample sizes for some of the constructs (e.g. expression not observed in all cells).

      It appears as if the paper more or less follows a chronological order of the experiments that were performed before AF multimer enabled the insightful and compelling structural analysis. That is a matter of style, but in some cases, the writing could be updated, shortened, or re-arranged into a more logical order. Concrete examples:

      (i) Line 144: "we did not include CPC2 for further analysis in this study", although CPC2 features at a prominent and interesting position in the predicted structures of the kinetoplastid CPC shown in later main figures.

      We attempted RNAi-mediated depletion of CPC2 using two different shRNA constructs. However, we cannot exclude the possibility that the knockdown of CPC2 was less efficient compared with the other CPC subunits. For this reason, we decided to remove all the data on CPC2 from Fig. S2.

      (ii) The work with the KIN-A motor domain only and KIN-A ∆motor domain (Fig 2) raises the question of whether a more subtle mutation could interfere with the motor domain, which is ultimately presented in Fig 6. I think that the final paragraph and Figure 6 follow naturally after Figure 2.

      We appreciate the suggestion. However, we would prefer to keep Figure 6 in its current position.

      (iii) The high-confidence structural predictions in Fig 3 and Fig 4 are insightful. The XL-MS descriptions that precede them are not so helpful (Fig 3A and 4G and in the text). To emphasize their status as experimental support for the predicted structures, which is very important, it would be good to discuss the XL-MS after presenting the models.

      As suggested, we have re-arranged the text and/or figures such that the AF2 predictions are discussed first and the CLMS data are brought in afterwards.

      Figure 1A prominently features an arbitrary color code and a lot of protein IDs without a legend. That is not a very convincing start. Figure S1 is more informative, containing annotated protein names and results of the KIN-A and KIN-B IPs. Please improve Figure 1A, for example by presenting a modified version of Figure S1. In all these types of figures, please list both protein names and gene IDs.

      We agree with the reviewer that the IP-MS data in Fig. S1 is more informative and hence decided to swap the heatmaps in Fig. 1A and Fig. S1A. We further annotated the heatmap corresponding to the Aurora B (AUK1) IP-MS (now presented in Fig. S1) as suggested by the reviewer.

      The visualization of the structural predictions is not consistent among figures:

      (i) The structure in Fig 4I is important and could be displayed larger. The pLDDT scores, and especially those of the non-displayed models, do not add much information and should not be a main panel. If the authors want to display the pLDDT scores, I recommend a panel (main or supplement) of the structure colored for local prediction confidences, as in Fig 5A.

      (ii) In Figure 5A itself, it is hard to follow the chains in general, and KIN-A in particular, since the structure is pLDDT-coloured. Please present an additional panel colored by chain (consistent with Fig 4I, as mentioned above).

      (iii) The summarizing diagram, currently displayed as Fig 4J, should be placed after Fig 5A and take the discovered KIN-A - KKT9-11 connection into account. Ideally, it also covers the suspected importance of the motor domain and serves as a summarising diagram.

      We thank the reviewer for the constructive comments. For each structure prediction, we now present two images side by side, one coloured by chain and one coloured by pLDDT. We recently re-ran AF2 for the full CPC and for the KKT7N-KKT8 complex and obtained improved predictions; hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 Å distance constraint was fulfilled in the AF2 prediction. We also increased the size of the structures shown in Fig. 4. Furthermore, we removed the summarizing diagram from Fig. 4 and instead made a new main Fig. 7 with a more detailed schematic, which also takes into account the proposed function of the KIN-A motor domain, as suggested by the reviewer, and other points addressed in the Discussion.

      The methods section for the structural predictions lacks essential information. Predictions can only be reproduced if the version of AF2 multimer v2.x is specified and key parameters are mentioned.

      As suggested, we have added the details in the Materials and Methods section as follows.

      ‘Structural predictions of KIN-A/KIN-B, KIN-A 310-862/KIN-B 317-624, CPC1/CPC2/KIN-A 300-599/KIN-B 317-624, and KIN-A 700-800/KKT9/KKT11 were performed using ColabFold version 1.3.0 (AlphaFold-Multimer version 2), while those of AUK1/CPC1/CPC2/KIN-A 1-599/KIN-B, KKT7 1-261/KKT9/KKT11/KKT8/KKT12, KKT9/KKT11/KKT8/KKT12, and KKT7 1-261/KKT9/KKT11 were performed using ColabFold version 1.5.3 (AlphaFold-Multimer version 2.3.1) using default settings, accessed via https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.3.0/AlphaFold2.ipynb and https://colab.research.google.com/github/sokrypton/ColabFold/blob/v1.5.3/AlphaFold2.ipynb.’

      Line 121, please explain the "Unexpectedly" by including a reference to the work from Li and colleagues. A statement with some details would be useful, as the difference between both studies appears to be crucial for the novelty of this paper. Alternatively, refer to this being covered in the discussion.

      To clarify this point, we modified this paragraph in the results section:

      ‘To our surprise, KIN-A-YFP and GFP-KIN-B exhibited a CPC-like localization pattern identical to that of Aurora B (AUK1): Both kinesins localized to kinetochores from S phase to metaphase, and then translocated to the central spindle in anaphase (Figs. 1, C-E). Moreover, like Aurora B (AUK1), a population of KIN-A and KIN-B localized at the new FAZ tip from late anaphase onwards (Figs. S1, B and C). This was unexpected, because KIN-A and KIN-B were previously reported to localize to the spindle but not to kinetochores or the new FAZ tip (Li et al., 2008). These data suggest that KIN-A and KIN-B are bona fide CPC proteins in trypanosomes, associating with Aurora (AUK1), INCENP (CPC1) and CPC2 throughout the cell cycle.’

      Line 285 refers to "conserved" regions in the C-terminal part of KIN-A, referring to Figure 5. Please expand the MSA in Figure 5B to get an idea about the conservation/variation outside CD1 and CD2.

      We now present the full MSA for KIN-A proteins in kinetoplastids in Fig. S5A.

      Please specify what is meant in lines 367-369 for readers who are not familiar with the work by Komaki et al. 2022, either by clarifying the text or by clarifying the text and adding data to support it.

      We updated the corresponding section in the discussion as follows:

      ‘Komaki et al. recently identified two functionally redundant CPC proteins in Arabidopsis, Borealin Related Interactor 1 and 2 (BORI1 and 2), which engage in a triple helix bundle with INCENP and Borealin using a conserved helical domain but employ an FHA domain instead of a BIR domain to read H3T3ph (Komaki et al., 2022).’

      Data presented in Figure S6A, the microtubule co-sedimentation assay, is not convincing since a substantial amount of KIN-A/B is pelleted in the absence of microtubules. Did the authors spin the proteins in BRB80 before the assay to continue with soluble material and reduce sedimentation in the absence of microtubules? If the authors want to keep the wording in lines 331-332, the MT-binding properties of KIN-A and KIN-B need to be investigated in more detail, for example with a titration and a quantification thereof. Otherwise, they should change the text and replace "confirms" with "is consistent with". In any case, the legend needs to be expanded to include more information.

      To address the point above, we have added the following text in the legend corresponding to Fig. S6:

      ‘Microtubule co-sedimentation assay with 6HIS-KIN-A 2-309 (left) and 6HIS-KIN-B 2-316 (right). S and P correspond to supernatant and pellet fractions, respectively. Note that both constructs to some extent sedimented even in the absence of microtubules. Hence, the lack of microtubule binding for KIN-B may be because the protein used in this study was unstable and non-functional.’

      We have also updated the main text in the results section:

      ‘We therefore speculated that anaphase translocation of the kinetoplastid CPC to the central spindle may involve the kinesin motor domain of KIN-A. KIN-B is unlikely to be a functional kinesin based on the absence of several well-conserved residues and motifs within the motor domain, which are fully present in KIN-A (Li et al., 2008). These include the P-loop, switch I and switch II motifs, which form the nucleotide binding cleft, and many conserved residues within the α4-L12 elements, which interact with tubulin (Fig. S6A) (Endow et al., 2010). Consistent with this, the motor domain of KIN-B, contrary to KIN-A, failed to localize to the mitotic spindle when expressed ectopically (Fig. S2E) and did not co-sediment with microtubules in our in vitro assay (Fig. S6B).’

      Details:

      The readability of the pAE plots could be improved by arranging sequences according to their position in the structure. For example in Fig4I, KKT8 could precede KKT12. If it is easy to update this, the authors might want to do so.

      We re-ran the AF2 predictions for the KKT7N – KKT8 complex in Fig. 4/S4 and changed the order according to the reviewer’s suggestion (KKT9:KKT11:KKT8:KKT12).

      The same paper is referred to as Je Van Hooff et al. 2017 and as Van Hooff et al. 2017

      Thank you for pointing this out. We have corrected the citation.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please state at the end of the introduction/start of the results section that this work was performed in procyclic trypanosomes. Given that the cell cycles of procyclic and bloodstream forms differ, this is important.

      We added this information at the end of the introduction:

      ‘Here, by combining biochemical, structural and cell biological approaches in procyclic form T. brucei, we show that the trypanosome CPC is a pentameric complex comprising Aurora B (AUK1), INCENP (CPC1), CPC2 and the two orphan kinesins KIN-A and KIN-B.’

      (2) Please define NLS at first use (line 118), and for clarity, explain the rationale for using GFP with an NLS.

      NLS refers to a short ‘nuclear localization signal’ (TGRGHKRSREQ) (Marchetti et al., 2000), which ensures that the ectopically expressed construct is imported into the nucleus. When we previously expressed truncations of the KKT2 and KKT3 kinetochore proteins, many fragments did not enter the nucleus, presumably due to the lack of an NLS, which prevented us from determining which domains are responsible for their kinetochore localization. Since then, we have consistently used this short NLS sequence in our inducible GFP fusions without any complications. We added a sentence in the Materials & Methods section under Trypanosome culture: ‘All constructs for ectopic expression of GFP fusion proteins include a short nuclear localization signal (NLS) (Marchetti et al., 2000).’ To avoid unnecessary confusion, we removed ‘NLS’ from the main text and figures.

      (3) Lines 148-150 - it would strengthen this claim if KIN-A/B protein levels were assessed by Western blot.

      We now present a Western blot in Fig. S2C, showing that bulk KIN-B levels are clearly reduced upon KIN-A RNAi. The same is true, to some extent, for KIN-A levels upon KIN-B RNAi, although this is less obvious, possibly due to the lower efficiency of KIN-B RNAi compared with KIN-A RNAi, as judged by fluorescence microscopy (quantified in Fig. 2D and 2E).

      (4) Line 253 - the text mentions the removal of both KKT9 and KKT11, which is not consistent with the figure (Fig 4H) - do you mean the removal of either KKT9 or KKT11?

      Yes, we thank the reviewer for pointing out this mistake in the text, which has now been corrected.

      (5) Line 337 - please include a reference for the G209A ATPase-defective rigor mutant - has this been shown to result in KIN-A being inactive previously?

      Please see our answer to the public review above.

      (6) It is not always obvious when fluorescent fusion proteins are being expressed endogenously or ectopically, or when they are being expressed in an RNAi background or not without tracing the cell lines in Table S1 - please ensure this is clearly stated throughout the manuscript.

      We have now made sure that this is clearly stated in the main text as well as in the figure legends.

      (7) Line 410 - 'KIN-A C-terminal tail is stuffed full of conserved CDK1 (CRK3) sites' - what does 'stuffed full' really mean (this is rather imprecise) and what are the consensus sites - are these CDK1 consensus sites that are assumed to be conserved for CRK3? I'm not aware of consensus sites for CRK3 having been determined, but if they have, this should be referenced.

      We have modified the corresponding section in the discussion as follows:

      ‘In support of this, the KIN-A C-terminal tail harbours many putative CRK3 sites (10 sites matching the minimal S/T-P consensus motif for CDKs) and is also heavily phosphorylated by Aurora B (AUK1) in vitro (Ballmer et al. 2024). Finally, we speculate that the interaction of the KIN-A motor domain with microtubules, coupled to force-generating ATP hydrolysis and possibly plus-end directed motion, eventually outcompetes the weakened interactions of the CPC with the kinetochore and facilitates the extraction of the CPC from chromosomes onto spindle microtubules during anaphase. Indeed, deletion of the KIN-A motor domain or impairment of its motor function through N-terminal GFP tagging causes the CPC to be trapped at kinetochores in anaphase. Central spindle localization is additionally dependent on the ATPase activity of the KIN-A motor domain, as illustrated by the KIN-A rigor mutant.’
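
      The minimal S/T-P motif mentioned in this passage is a simple sequence pattern, so a site count of this kind can be reproduced with a short scan. The Python sketch below is illustrative only: the function name is hypothetical, the input is a made-up toy sequence rather than the KIN-A C-terminal tail, and a match to the minimal motif does not by itself imply that the site is phosphorylated by CRK3.

```python
import re

# Minimal CDK consensus: a serine or threonine immediately followed by proline.
# Matches cannot overlap (the proline can never start a new match), so finditer
# finds every site.
MINIMAL_CDK_MOTIF = re.compile(r"[ST]P")


def find_minimal_cdk_sites(sequence):
    """Return 1-based positions of S/T residues that immediately precede a proline."""
    return [m.start() + 1 for m in MINIMAL_CDK_MOTIF.finditer(sequence.upper())]


# Toy sequence for illustration only; it is not the KIN-A C-terminal tail.
toy_tail = "MSPKTPLLSAEEPTPR"
sites = find_minimal_cdk_sites(toy_tail)
print(sites, len(sites))  # [2, 5, 14] 3
```

      A regular expression is sufficient here because the minimal motif has no flanking requirements; extended CDK consensus rules would need a longer pattern.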

      (8) Lines 412-416: this proposal is written rather definitively - given no motor activity has been demonstrated for KIN-A, please make clear that this is still just a theory.

      See above.

      (9) Fig 1: KKT2 is not highlighted in Fig 1A - given this has been used for colocalization in Fig 1C-E, was it recovered, and if not, why not? Fig 1B-E: the S phase/1K1N terminology is somewhat misleading. Not all S phase cells will have elongated kinetoplasts - usually an asterisk is used to signify replicated DNA, not kinetoplast shape. If it is to be used here for elongation, then for consistency, N should be used for G2/mitotic cells.

      Fig. 1A (now Fig. S1A) only shows the top 30 hits. KKT2 was indeed recovered with Aurora B (AUK1) (see Table S2). It is often used as a kinetochore marker in trypanosomes by our lab and others, because the signal of fluorescently tagged KKT2 is relatively bright and KKT2 localizes to centromeres throughout the cell cycle.

      (10) A general comment for all image figures is that these do not have accompanying brightfield images and it is therefore difficult to know where the cell body is, or sometimes which nuclei and kinetoplasts belong to which cell where DNA from more than one cell is within the image. It would be beneficial if brightfield images could be added, or alternatively, the cell outlines were traced onto DAPI or merged images. Also, brightfield images would allow the stage of cytokinesis (pre-furrowing/furrowing/abscission) in anaphase cells to be determined.

      Since this study primarily addresses the recruitment mechanism of the CPC to kinetochores from S phase to metaphase and to the central spindle in anaphase, and CPC proteins are not observed outside of the nucleus during these cell cycle stages, we did not present brightfield images in the figures. However, this point is particularly valid for discerning the localization of KIN-A and KIN-B to the new FAZ tip from late anaphase onwards. Hence, we acquired new microscopy data for Fig. S1B and S1C, which now include phase contrast images, and have chosen representative cells in late anaphase and telophase. We hope that the signal of Aurora B (AUK1), KIN-A and KIN-B at the anterior end of the new FAZ can now be distinguished more clearly.

      (11) Fig 2A: legend should state that the micrographs show the localisation of the proteins within the nucleus as whole cells are not shown. 2C: can INCENP not be split into 2 lines - the 'IN' looks like 1N at first glance, which is confusing.

      We have applied the suggested change in Fig. 2.

      (12) Fig 3 (and other AF2 figures): Could the lines for satisfied & not satisfied in the key be thicker so they more closely resemble the lines in the figure and are less likely to be confused with the disordered regions of the CPC components?

      We have now made those lines thicker.

      (13) Why were different E value thresholds used in Fig 3 and Fig 4?

      The CLMS data in Fig. 3 and Fig. 4 now both use the same E value threshold of E-3 (previously E-4 was used in Fig. 4). To determine a sensible significance threshold, we included some yeast protein sequences (‘false positives’) in the database used in pLink2 for identification of crosslinked peptides. Note that we also recently re-ran AF2 for the full CPC and for the KKT7N-KKT8 complex and obtained improved predictions; hence some of the models in Fig. 3/S3 and Fig. 4/S4 have been updated accordingly. For the CLMS plots, we also decided to colour the cross-links according to whether the 30 Å distance constraint was fulfilled in the AF2 prediction.
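
      For readers unfamiliar with this colouring scheme, the underlying check can be sketched as follows. This is a minimal Python illustration, assuming the Cα coordinates of the two crosslinked residues are already available as (x, y, z) tuples in ångströms; it is not the PyXlinkViewer implementation (Schiffrin et al., 2020), the helper names are hypothetical, and treating the 30 Å boundary as inclusive is an assumption.

```python
import math

# Cut-off mirroring the 30 Å Cα-Cα threshold stated in the figure legends.
# Treating the boundary as inclusive (<=) is an assumption of this sketch.
CA_CA_THRESHOLD_ANGSTROM = 30.0


def ca_ca_distance(ca_1, ca_2):
    """Euclidean distance (in Å) between two Cα atoms given as (x, y, z) tuples."""
    return math.dist(ca_1, ca_2)


def classify_crosslink(ca_1, ca_2, threshold=CA_CA_THRESHOLD_ANGSTROM):
    """Label a crosslink 'satisfied' or 'violated' against the distance cut-off."""
    if ca_ca_distance(ca_1, ca_2) <= threshold:
        return "satisfied"
    return "violated"


# Hypothetical coordinates, not taken from any of the models discussed here.
print(classify_crosslink((0.0, 0.0, 0.0), (12.0, 9.0, 20.0)))  # 25 Å apart -> satisfied
print(classify_crosslink((0.0, 0.0, 0.0), (30.0, 40.0, 0.0)))  # 50 Å apart -> violated
```

      In the figures, "satisfied" corresponds to blue and "violated" to red cross-links.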

      (14) Fig 4H legend - please give the expected sizes of these recombinant proteins & check the 3rd elution panel (see public review comments).

      See above response in public review.

      (15) Fig 4I - please explain what the colours of the PAE plot and the values in the key signify, as well as how the Scored Residue values are arrived at. Please also define the pLDDT in the legend.

      We now cite DeepMind’s 2021 methods paper (Jumper et al., 2021), in which the outputs of AlphaFold are explained in detail. We also added a short description of the pLDDT and PAE scores and the corresponding colour coding in the legends of Fig. 3 and Fig. 4, respectively.

      From figure 3 legend:

      ‘(B) Cartoon representation showing two orientations of the trypanosome CPC, coloured by protein on the left (Aurora B (AUK1): crimson, INCENP (CPC1): green, CPC2: cyan, KIN-A: magenta, and KIN-B: yellow) or according to their pLDDT values on the right, assembled from AlphaFold2 predictions shown in Figure S3. The pLDDT score is a per-residue estimate of the confidence in the AlphaFold prediction on a scale from 0 – 100. pLDDT > 70 (blue, cyan) indicates a reasonable accuracy of the model, while pLDDT < 50 (red) indicates a low accuracy and often reflects disordered regions of the protein (Jumper et al., 2021). BS3 crosslinks in (B) were mapped onto the model using PyXlinkViewer (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å) (Schiffrin et al., 2020).’

      From Figure 4 legend:

      ‘(G) AlphaFold2 model of the KKT7 – KKT8 complex, coloured by protein (KKT7 1-261: green, KKT8: blue, KKT12: pink, KKT9: cyan and KKT11: orange) (left) and by pLDDT (center). BS3 crosslinks in (H) were mapped onto the model using PyXlinkViewer (Schiffrin et al., 2020) (blue = distance constraints satisfied, red = distance constraints violated, Cα-Cα Euclidean distance threshold = 30 Å). Right: Predicted Aligned Error (PAE) plot of model shown on the left (rank_2). The colour indicates AlphaFold’s expected position error (blue = low, red = high) at the residue on the x axis if the predicted and true structures were aligned on the residue on the y axis (Jumper et al., 2021).’

      (16) Fig 6 legend - Line 730 should say (F) not (C).

      Thank you for pointing out this typo.

      (17) Fig S1A - a key is missing for the colours. Fig S1B/C - cell outlines or a brightfield image are really needed here - see earlier comment. Fig S1D - there doesn't seem to be a method for how this tree was generated.

      See above response in public review regarding Fig. S1A and S1B/C. The tree in Fig. S1D is based on (Butenko et al., 2020).

      (18) Fig S2: A: how was protein knockdown validated (especially for CPC2 where there was little obvious phenotype)? Fig S2B: the y-axis should read proportion of cells, not percentage. Fig S2E - NLS should be labelled.

      Thank you for pointing out the mistake in the labelling.

      (19) Fig S3: PAE plots should be labelled with protein names, not A-E. Similarly, the pLDDT plots should be labelled as in Fig 4I.

      We have corrected the labelling in Fig. S3.

      (20) Fig S5A-D - cell cycle stage labels are missing from images.

      Thank you for pointing out the missing cell cycle stage labels.

      Addition by editor:

      In line 126 the statement that KIN-A and KIN-B "associate with Aurora-AUK1, INCENP-CPC1 and CPC2 throughout the cell cycle" seems too strong. There is no direct evidence for this. Please re-phrase as "likely associate" or "suggest... that ... may...".

      We have modified that sentence according to the editor’s suggestion.

      References:

      Akiyoshi, B., and K. Gull. 2014. Discovery of Unconventional Kinetochores in Kinetoplastids. Cell. 156. doi:10.1016/j.cell.2014.01.049.

      Butenko, A., F.R. Opperdoes, O. Flegontova, A. Horák, V. Hampl, P. Keeling, R.M.R. Gawryluk, D. Tikhonenkov, P. Flegontov, and J. Lukeš. 2020. Evolution of metabolic capabilities and molecular features of diplonemids, kinetoplastids, and euglenids. BMC Biol. 18:1–28. doi:10.1186/S12915-020-0754-1.

      Cormier, A., D.G. Drubin, and G. Barnes. 2013. Phosphorylation regulates kinase and microtubule binding activities of the budding yeast chromosomal passenger complex in vitro. J Biol Chem. 288:23203–23211. doi:10.1074/JBC.M113.491480.

      Endow, S.A., F.J. Kull, and H. Liu. 2010. Kinesins at a glance. J Cell Sci. 123:3420. doi:10.1242/JCS.064113.

      Fink, S., K. Turnbull, A. Desai, and C.S. Campbell. 2017. An engineered minimal chromosomal passenger complex reveals a role for INCENP/Sli15 spindle association in chromosome biorientation. J Cell Biol. 216:911–923. doi:10.1083/JCB.201609123.

      van der Horst, A., M.J.M. Vromans, K. Bouwman, M.S. van der Waal, M.A. Hadders, and S.M.A. Lens. 2015. Inter-domain Cooperation in INCENP Promotes Aurora B Relocation from Centromeres to Microtubules. Cell Rep. 12:380–387. doi:10.1016/J.CELREP.2015.06.038.

      Ishii, M., and B. Akiyoshi. 2020. Characterization of unconventional kinetochore kinases KKT10/19 in Trypanosoma brucei. J Cell Sci. doi:10.1242/jcs.240978.

      Jeyaprakash, A.A., C. Basquin, U. Jayachandran, and E. Conti. 2011. Structural Basis for the Recognition of Phosphorylated Histone H3 by the Survivin Subunit of the Chromosomal Passenger Complex. Structure. 19:1625–1634. doi:10.1016/J.STR.2011.09.002.

      Jeyaprakash, A.A., U.R. Klein, D. Lindner, J. Ebert, E.A. Nigg, and E. Conti. 2007. Structure of a Survivin–Borealin–INCENP Core Complex Reveals How Chromosomal Passengers Travel Together. Cell. 131. doi:10.1016/j.cell.2007.07.045.

      Jumper, J., R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S.A.A. Kohl, A.J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A.W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature. 596:583–589. doi:10.1038/s41586-021-03819-2.

      Kang, J.S., I.M. Cheeseman, G. Kallstrom, S. Velmurugan, G. Barnes, and C.S.M. Chan. 2001. Functional cooperation of Dam1, Ipl1, and the inner centromere protein (INCENP)-related protein Sli15 during chromosome segregation. J Cell Biol. 155:763–774. doi:10.1083/JCB.200105029.

      Klein, U.R., E.A. Nigg, and U. Gruneberg. 2006. Centromere targeting of the chromosomal passenger complex requires a ternary subcomplex of Borealin, Survivin, and the N-terminal domain of INCENP. Mol Biol Cell. 17:2547–2558. doi:10.1091/MBC.E05-12-1133.

      Komaki, S., E.C. Tromer, G. De Jaeger, N. De Winne, M. Heese, and A. Schnittger. 2022. Molecular convergence by differential domain acquisition is a hallmark of chromosomal passenger complex evolution. Proc Natl Acad Sci U S A. 119. doi:10.1073/PNAS.2200108119.

      Li, Z. 2012. Regulation of the Cell Division Cycle in Trypanosoma brucei. Eukaryot Cell. 11:1180. doi:10.1128/EC.00145-12.

      Li, Z., J.H. Lee, F. Chu, A.L. Burlingame, A. Günzl, and C.C. Wang. 2008. Identification of a Novel Chromosomal Passenger Complex and Its Unique Localization during Cytokinesis in Trypanosoma brucei. PLoS One. 3. doi:10.1371/journal.pone.0002354.

      Mackay, A.M., D.M. Eckley, C. Chue, and W.C. Earnshaw. 1993. Molecular analysis of the INCENPs (inner centromere proteins): separate domains are required for association with microtubules during interphase and with the central spindle during anaphase. J Cell Biol. 123:373–385. doi:10.1083/JCB.123.2.373.

      Marchetti, M.A., C. Tschudi, H. Kwon, S.L. Wolin, and E. Ullu. 2000. Import of proteins into the trypanosome nucleus and their distribution at karyokinesis. J Cell Sci. 113 ( Pt 5):899–906. doi:10.1242/JCS.113.5.899.

      Nakajima, Y., A. Cormier, R.G. Tyers, A. Pigula, Y. Peng, D.G. Drubin, and G. Barnes. 2011. Ipl1/Aurora-dependent phosphorylation of Sli15/INCENP regulates CPC-spindle interaction to ensure proper microtubule dynamics. J Cell Biol. 194:137–153. doi:10.1083/JCB.201009137.

      Noujaim, M., S. Bechstedt, M. Wieczorek, and G.J. Brouhard. 2014. Microtubules accelerate the kinase activity of Aurora-B by a reduction in dimensionality. PLoS One. 9. doi:10.1371/JOURNAL.PONE.0086786.

      Okada, Y., and N. Hirokawa. 1999. A processive single-headed motor: Kinesin superfamily protein KIF1A. Science. 283:1152–1157. doi:10.1126/SCIENCE.283.5405.1152.

      Rice, S., A.W. Lin, D. Safer, C.L. Hart, N. Naber, B.O. Carragher, S.M. Cain, E. Pechatnikova, E.M. Wilson-Kubalek, M. Whittaker, E. Pate, R. Cooke, E.W. Taylor, R.A. Milligan, and R.D. Vale. 1999. A structural change in the kinesin motor protein that drives motility. Nature. 402:778–784. doi:10.1038/45483.

      Sablin, E.P., F.J. Kull, R. Cooke, R.D. Vale, and R.J. Fletterick. 1996. Crystal structure of the motor domain of the kinesin-related motor ncd. Nature. 380:555–559. doi:10.1038/380555a0.

      Samejima, K., M. Platani, M. Wolny, H. Ogawa, G. Vargiu, P.J. Knight, M. Peckham, and W.C. Earnshaw. 2015. The Inner Centromere Protein (INCENP) Coil Is a Single α-Helix (SAH) Domain That Binds Directly to Microtubules and Is Important for Chromosome Passenger Complex (CPC) Localization and Function in Mitosis. J Biol Chem. 290:21460–21472. doi:10.1074/JBC.M115.645317.

      Schiffrin, B., S.E. Radford, D.J. Brockwell, and A.N. Calabrese. 2020. PyXlinkViewer: A flexible tool for visualization of protein chemical crosslinking data within the PyMOL molecular graphics system. Protein Sci. 29:1851–1857. doi:10.1002/PRO.3902.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Sang et al. proposed that a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels, or whose IR60b-expressing neurons were selectively silenced, showed much reduced rejection of high Na+, but rejection was restored when these channels were reintroduced into the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+, and this response disappeared when animals lacked any of the three channels.

      Strengths:

      The experiments were thorough and well designed. The results are compelling and support the main claim. The development and use of the DrosoX two-choice assay allow a more quantitative, automated and unbiased assessment of ingestion volume and preference.

      Weaknesses:

      There are a few inconsistencies with respect to the exact role by which IR60b neurons limit high salt consumption and the contribution of external (labellar) high-salt sensors in regulating high salt consumption. These weaknesses do not significantly impact the main conclusion, however.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Sang et al. set out to identify gustatory receptors involved in salt taste sensation in Drosophila melanogaster. In a two-choice assay screen of 30 Ir mutants, they identified that Ir60b is required for avoidance of high salt. In addition, they demonstrate that activation of Ir60b neurons is sufficient for gustatory avoidance using either optogenetics or TRPV1 to specifically activate Ir60b neurons. Then, using tip recordings of labellar gustatory sensory neurons and proboscis extension response behavioral assays in Ir60b mutants, the authors demonstrate that Ir60b is dispensable for labellar taste neuron responses to high salt and the suppression of proboscis extension by high salt. Since external gustatory receptor neurons (GRNs) are not implicated, they look at Poxn mutants, which lack external chemosensory sensilla but have intact pharyngeal GRNs. High salt avoidance was reduced in Poxn mutants but was still greater than Ir60b mutants, suggesting that pharyngeal gustatory sensory neurons alone are sufficient for high salt avoidance. The authors use a new behavioral assay to demonstrate that Ir60b mutants ingest a higher volume of sucrose mixed with high salt than control flies do, suggesting that the action of Ir60b is to limit high salt ingestion. Finally, they identify that Ir60b functions within a single pair of gustatory sensory neurons in the pharynx, and that these neurons respond to high salt but not bitter tastants.

      Strengths:

      A great strength of this paper is that it rigorously corroborates previously published studies that have implicated specific Irs in salt taste sensation. It further introduces a new role for Ir60b in limiting high salt ingestion, demonstrating that Ir60b is necessary and sufficient for high salt avoidance and convincingly tracing the action of Ir60b to a particular subset of gustatory receptor neurons. Overall, the authors have achieved their aim by identifying a new gustatory receptor involved in limiting high salt ingestion. They use rigorous genetic, imaging, and behavioral studies to achieve this aim, often confirming a given conclusion with multiple experimental approaches. They have further done a great service to the field by replicating published studies and corroborating the roles of a number of other Irs in salt taste sensation. An aspect of this study that merits further investigation is how the same gustatory receptor neurons and Ir in the pharynx can be responsible for regulating the ingestion of both appetitive (sugar) and aversive tastants (high salt).

      A previous report published in eLife from John Carlson’s lab (Joseph et al, 2017) showed that the Ir60b GRN in the pharynx responds to sucrose, resulting in sucrose repulsion. Thus, stimulation of this pharyngeal GRN results only in gustatory avoidance, not in both attraction and avoidance. (lines 205-207)

      Weaknesses:

      There are several weaknesses that, if addressed, could greatly improve this work.

      (1) The authors combine the results and discussion but provide a very limited interpretation of their results. More discussion of the results would help to highlight what this paper contributes, how the authors interpret their results, and areas for future study.

      We agree and have now separated the Results and Discussion, and in so doing have greatly expanded discussion of the results.

      (2) The authors rename previously studied populations of labellar GRNs to arbitrary letters, which makes it difficult to understand the experiments and results in some places. These GRN populations would be better referred to according to the gustatory receptors they are known to express.

      One of the corresponding authors (Craig Montell) introduced this alternative GRN nomenclature in a 2021 review (Montell, C. Drosophila sensory receptors: a set of molecular Swiss Army Knives. Genetics 217, 1-34) (Montell, 2021). We are not fans of referring to different classes of GRNs based on the receptors that they express, since it is not obvious which receptors to use. For example, the GRNs that respond to bitter compounds all express multiple GR co-receptors. The same is true for the GRNs that respond to sugars. The former system of referring to GRNs simply as sugar, bitter, salt and water GRNs is also not ideal, since the repertoire of chemicals that stimulates each class is complex. For example, the Class A GRNs (formerly sugar GRNs) are also activated by low Na+, glycerol, fatty acids, and acetic acid, while the Class B GRNs (formerly bitter GRNs) are also stimulated by high Na+, acids, polyamines, and tryptophan. In addition, there are five classes of GRNs. At first mention of the Class A–E GRNs, we mention the most commonly used former nomenclature of sugar, bitter, salt and water GRNs. For added clarity, we now also name one of the receptors that marks each class. (lines 51-59)

      (3) The conclusion that GRNs responsible for high salt aversion may be inhibited by those that function in low salt attraction is not well substantiated. This conclusion seems to come from the fact that overexpression of Ir60b in salt attraction and salt aversion sensory neurons still leads to salt aversion, but there need not be any interaction between these two types of sensory neurons if they act oppositely on downstream circuits.

      We did not make this claim.

      (4) The authors rely heavily on a new Droso-X behavioral apparatus that is not sufficiently described here or in the previous paper the authors cite. This greatly limits the reader's ability to interpret the results.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      Reviewer #3 (Public Review):

      Summary:

      Sang et al. successfully demonstrate that a set of single sensory neurons in the pharynx of Drosophila promotes avoidance of food with high salt concentrations, complementing previous findings on Ir7c neurons with an additional internal sensing mechanism. The experiments are well-conducted and presented, convincingly supporting their important findings and extending the understanding of internal sensing mechanisms. However, a few suggestions could enhance the clarity of the work.

      Strengths:

      The authors convincingly demonstrate the avoidance phenotype using different behavioral assays, thus comprehensively analyzing different aspects of the behavior. The experiments are straightforward and well-contextualized within existing literature.

      Weaknesses:

      Discussion

      While the authors effectively relate their findings to existing literature, expanding the discussion on the surprising role of Ir60b neurons in both sucrose and salt rejection would add depth. Additionally, considering the result of Yang et al. 2021 (https://doi.org/10.1016/j.celrep.2021.109983) that Ir60b neurons activate feeding-promoting IN1 neurons, the authors should discuss how this aligns with their own findings.

      Yang et al. demonstrated that the activation of Ir60b neurons can trigger the activation of IN1 neurons akin to pharyngeal multimodal (PM) neurons, potentially leading to enhanced feeding (Yang et al, 2021). However, our research reveals a specific pattern of activation for Ir60b neurons. Instead of being generalists, they are specialized for certain sugars, such as sucrose, and for high salt. Consequently, while Ir60b GRNs activate IN1 neurons, we contend that there are other neurons in the brain responsible for inhibiting feeding. (lines 412-417)

      Line 187: The discussion primarily focuses on taste sensilla outside the labellum, neglecting peg-type sensilla on the inner surface. Clarification on whether these pegs contribute to the described behaviors, and whether the Poxn mutants described also affect the pegs, would strengthen the discussion.

      We added the following to the Discussion section. “We also found that the requirement for Ir60b appears to be different when performing binary liquid capillary assay (DrosoX), versus solid food binary feeding assays. When we employed the DrosoX assay to test mutants that were missing salt aversive GRNs in labellar bristles but still retained functional Ir60b GRNs, the flies behaved the same as wild-type flies (e.g. Figure 3J and 3L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al, 2015), displayed repulsion to high salt food that was intermediate between control flies and the Ir60b mutant (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), and these hairless taste organs become exposed to food only when the labial palps open. We suggest that there are high-salt sensitive GRNs associated with taste pegs, which are accessed when the labellum contacts a solid substrate, but not when flies drink from the capillaries used in DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B).” (lines 430-444)

      In line 261 the authors state: "We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, similar to what was observed with Ir56b 8; however, this did not generate a salt receptor (Figures S6A)"

      An obvious explanation would be that these neurons are missing the identified necessary co-receptors Ir76b and Ir25a. The authors should discuss here whether the Gr33a neurons they target also express these co-receptors; if so, this would strengthen their conclusion that an additional receptor might be missing.

      We clarified this point in the Discussion section as follows, “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al, 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al, 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites”. (lines 464-477)

      Methods

      The description of the Droso-X assay seems to be missing some details. Currently, it is not obvious how the two-choice is established. Only one capillary is mentioned; I assume two were used? Also, the meaning of the variables used in the equation (DrosoX and DrosoXD) is not explained.

      We expanded the description of the apparatus in the Droso-X assay section of the Materials and Methods. (lines 588-631)

      The description of the ex-vivo calcium imaging prep. is unclear in several points:

      (1) It is lacking information on how the stimulus was applied (was it manually washed in? If so how was it removed?).

      We expanded the description of the procedure in the ex vivo calcium imaging section of the Materials and Methods. (lines 682-716)

      (2) The authors write: "A mild swallow deep well was prepared for sample fixation." I assume they might have wanted to describe a "shallow well"?

      We deleted the word “deep.” (line 691)

      (3) "...followed by excising a small portion of the labellum in the extended proboscis region to facilitate tastant access to pharyngeal organs." It is not clear to me how one would excise a small portion of the labellum; the labellum is the most distal part of the proboscis that carries the sensilla and pegs. Did the authors mean to say that they cut a part of the proboscis?

      Yes. We changed the sentence to “…followed by excising a small portion of the extended proboscis to facilitate tastant access to the pharyngeal organs.” (lines 693-695)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In this manuscript, Sang et al. proposed that a pair of IR60b-expressing pharyngeal neurons in Drosophila use IR25a, IR76b, and IR60b channels to detect high Na+ and limit its consumption. Some of the key findings that support this thesis are: 1) animals that lacked any one of these channels - or with their IR60b-expressing neurons selectively silenced - showed much reduced rejection of high Na+, but restored rejection when these channels were reintroduced back in the IR60b neurons; 2) animals with TRPV artificially expressed in their IR60b neurons rejected capsaicin-laced food whereas WT did not; 3) IR60b-expressing neurons exhibited increased Ca2+ influx in response to high Na+ and such response went away when animals lacked any of the three channels. In general, I find the collective evidence presented by the authors convincing. But I feel the MS can benefit from having a Discussion section and a few simple experiments. Below I list some inconsistencies I hope the authors can address or at least discuss.

      We have now added a Discussion section, and expanded the discussion.

      (1) The role of IR60b neurons on suppressing PER appeared inconsistent. On the one hand, optogenetic activation of these neurons suppressed PER (Fig 1D), on the other hand, IR60b mutants were as competent to suppress PER in response to high salt as WT (Fig 2G). Are pharyngeal neurons expected to modulate PER? It might be worth including a retinal-free or genotype control to ascertain the PER suppression exhibited by IR60b>CsChrimson is genuine.

      Please note that Figure 2G is now Figure 2H.

      Our interpretation is that activation of aversive GRNs by high salt either in labellar bristles or in the pharynx is sufficient to elicit repulsion to high salt. Consistent with this conclusion, optogenetic activation of Ir60b GRNs, which are specific to the pharynx, is sufficient to reduce the PER to sucrose containing food (Figure 1D). However, mutation of Ir60b has no impact on the PER to sucrose plus high (300 mM) NaCl since the high-salt activated GRNs in labellar bristles are not impaired by the Ir60b mutation. In contrast, Ir25a and Ir76b are required in both labellar bristles and in the pharynx to reject high salt. As a consequence, mutation of either Ir25a or Ir76b impairs the repulsion to high salt. Thus, there is no inconsistency between the optogenetics and PER results. We clarified this point in the Discussion section. In terms of controls for IR60b>CsChrimson, we show that UAS-CsChrimson alone or UAS-CsChrimson in combination with the Gr5a driver has no impact on the PER (Figure 1D). In addition, we now include a retinal free control (Figure 1D). These findings provide the key genetic controls and are described in the Results section. (lines 167-170)

      (2) The role of labellar high-salt sensors in regulating salt intake appeared inconsistent. On the one hand, they appeared to have a role in limiting high salt consumption because Poxn mutants were significantly more receptive to high salt than WT (Fig. 2J). On the other hand, selectively restoring IR76b or IR25a in only the IR60b neurons in these mutants - thus leaving the labellar salt sensors still defective - reverted the flies to behave like WT when given a choice between sucrose vs. sucrose+high salt (Fig 3J, L).

      We now offer an explanation for these seemingly conflicting results in the Discussion section. When we employed the DrosoX assay with mutants that had functional Ir60b GRNs but were missing salt aversive GRNs in labellar bristles, the flies behaved the same as control flies (e.g. Figure 3J and L). However, using solid food binary assays, Poxn mutants, which are missing labellar taste bristles but retain Ir60b GRNs (LeDue et al., 2015), display aversion to high salt food that is intermediate between control and Ir60b mutant flies (Figure 2J). Poxn mutants retain taste pegs (LeDue et al., 2015), which are exposed to food substrates only when the labial palps open. We suggest that the taste pegs harbor high salt sensitive GRNs, and they may be exposed to solid substrates, but not to the liquid in capillary tubes used in the DrosoX assays. This explanation would also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but prefers 1 mM sucrose alone over 300 mM NaCl and 5 mM sucrose in the solid food binary assay (Figure 1B). (lines 433-444)

      (3) The behavioral sensitivity of the IR60b mutant to high salt again appeared somewhat inconsistent when assessed in the two different choice assays. IR60b mutant flies were indifferent to 300 mM NaCl when assayed with DrosoX (Fig 3A, B) but were clearly still sensitive to 300 mM NaCl when assayed with the "regular" assay - they showed much reduced preference for 5 mM sucrose over 1 mM sucrose when the 5 mM sucrose was adulterated with 300 mM NaCl (Fig 1B).

      The explanation provided above may also account for the findings that the Ir60b mutant is indifferent to 300 mM NaCl in the DrosoX assay (Figure 3B), but not when selecting between 300 mM NaCl and 5 mM sucrose versus 1 mM sucrose in the solid food binary assay (Figure 1B). Alternatively, the different behavioral responses might be due to the variation in sucrose concentrations in each of these two assays, which employed 5 mM sucrose in the solid food binary assay, as opposed to 100 mM sucrose in the DrosoX assay. This disparity in attractive valence between these two concentrations of sucrose might consequently impact feeding amount and preference. This point is now also included in the Discussion section. (lines 441-449)

      (4) Given that the IR60b neurons exhibited clear IR60b/IR25a/IR76b-dependent sucrose sensitivity, too, I am curious how the various mutant animals behave when given a choice between 100 mM sorbitol vs. 100 mM sorbitol + 300 mM NaCl, a food choice assay not complicated by the presence of sucrose. Similarly, I am curious whether the Ca2+ response of IR60b neurons differs significantly when presented with 100 mM sucrose vs. when presented with 100 mM sucrose + 300 mM NaCl. In principle, the magnitude for the latter should be significantly larger than the former, as animals appeared to be capable of discriminating these two choices solely relying on their IR60b neurons.

      To investigate the aversion induced by high salt in the absence of a highly attractive sugar, such as sucrose, we combined 300 mM salt with 100 mM sorbitol, which is a tasteless but nutritive sugar (Burke & Waddell, 2011; Fujita & Tanimura, 2011). Using two-way choice assays, we found that the Ir25a, Ir60b, and Ir76b mutants exhibited substantial reductions in high salt avoidance (Figure 3—figure supplement 2A). In addition, we performed DrosoX assays using 100 mM sorbitol alone, or sorbitol mixed with 300 mM NaCl. Sorbitol alone provoked less feeding than sucrose since it is a tasteless sugar (Figure 3—figure supplement 2B and C). Nevertheless, addition of high salt to the sorbitol reduced food consumption (Figure 3—figure supplement 2B and C). (lines 300-308)

      We also conducted a comparative analysis of the Ca2+ responses within the Ir60b GRN, examining its reaction to various stimuli, including 100 mM sucrose alone, 300 mM NaCl alone, and a combination of 100 mM sucrose and 300 mM NaCl. We found that the Ca2+ responses were significantly higher when we exposed the Ir60b GRN to 300 mM NaCl alone, compared with the response to 100 mM sucrose alone (Figure 4—figure supplement 1D). However, the GCaMP6f responses were not higher when we presented 100 mM sucrose with 300 mM NaCl, compared with the response to 300 mM NaCl alone (Figure 4—figure supplement 1D). (lines 360-367)

      Minor issues

      (1) The labels of sucrose concentration on Figure 2D were flipped.

      This has been corrected.

      (2) The phrasing of the sentence that begins in line 196 (i.e., "This suggests the internal sensor ...") is not optimal.

      We changed the sentence to, “We found that the aversive behavior to high salt was reduced in the Poxn mutants relative to the control (Figure 2J), consistent with previous studies demonstrating roles for GRNs in labellar bristles in high salt avoidance (Jaeger et al, 2018; McDowell et al, 2022; Zhang et al, 2013).” (lines 217-219)

      (3) In Line 231, I am not sure why the authors think ectopically expressing IR60b in labellar neurons would allow them to become activated by Na+. It seems highly unlikely to me, especially given that IR60b also plays a role in sensing sugar.

      We added the following paragraph to the Discussion addressing this point, “An open question is the subunit composition of the pharyngeal high Na+ receptor, and whether the sucrose/glucose and Na+ receptors in the Ir60b GRN are the same or distinct. Our results indicate that the high salt sensor in the Ir60b GRN includes IR25a, IR60b and IR76b since all three IRs are required in the pharynx for sensing high levels of NaCl. I-type sensilla do not elicit a high salt response, and we were unable to induce salt activation in I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. This indicates that IR25a, IR60b and IR76b are insufficient for sensing high Na+. The inability to confer a salt response by ectopic expression of Ir60b was not due to absence of Ir25a and Ir76b in Gr33a GRNs since Gr33a and Gr66a are co-expressed (Moon et al., 2009), and Gr66a GRNs express Ir25a and Ir76b (Li et al., 2023). Thus, the high salt receptor in Ir60b GRNs appears to require an additional subunit. Given that Na+ and sugars are structurally unrelated, we suggest that the Na+ and sucrose/glucose receptors do not include the identical set of subunits, or that they activate a common receptor through disparate sites.” (lines 464-477)

      Reviewer #2 (Recommendations For The Authors):

      Line 41, acutely excessive salt ingestion can lead to death, not just health issues

      We now state that, “consumption of excessive salt can contribute to various health issues in mammals, including hypertension, osteoporosis, gastrointestinal cancer, autoimmune diseases, and can lead to death.” (lines 41-43)

      Line 46, delete the comma after flies

      Done. (line 47)

      Lines 51-56: This description is unnecessarily confusing and does not cite proper sources. Renaming these GRNs arbitrarily can only create confusion, plus this description lacks nuance. If E GRNs are Ir94e positive, this description is out of date. Furthermore, If D GRNs are ppk23 and Gr66a positive then they will respond to both bitter and high salt.

      Papers to consult: https://elifesciences.org/articles/37167 10.1016/j.cell.2023.04.038

      We have now added citations. We prefer the A–E nomenclature, which was introduced in a 2021 Genetics review by one of the authors of this manuscript (Montell) (Montell, 2021), since naming different classes of GRNs on the basis of markers or as sweet, bitter, salt and water GRNs is misleading and an oversimplification. We cite the Genetics 2021 review, and for added clarity include both types of former names (markers and sweet, bitter, salt and water). Class D GRNs are not marked by Gr66a. The eLife reference cited above provided the initial rationale for stating that Class E GRNs are marked by Ir94e and activated by low salt. According to the Taisz et al reference (Cell 2023), the Class E GRNs, which are marked by Ir94e, are also activated by pheromones, which we now mention (Taisz et al, 2023). (lines 51-59)

      Line 62, E GRNs are not required for low salt behaviors

      We do not state that E GRNs are required for low salt behaviors, only that they sense low Na+ levels. (line 58)

      Lines 70-81: A great deal of emphasis on labellar GRNs, but then no mention of how pharyngeal GRNs fit into categories A-E

      We devote the following paragraph to pharyngeal GRNs. We do not mention how they fit in with the A–E categories because it is not clear.

      “In addition to the labellum and taste bristles on other external structures, such as the tarsi, fruit flies are endowed with hairless sensilla on the surface of the labellum (taste pegs), and three internal taste organs lining the pharynx, the labral sense organ (LSO), the ventral cibarial sense organ (VCSO), and the dorsal cibarial sense organ (DCSO), which also function in the decision to keep feeding or reject a food (Chen & Dahanukar, 2017, 2020; LeDue et al., 2015; Nayak & Singh, 1983; Stocker, 1994). A pair of GRNs in the LSO express a member of the gustatory receptor family, Gr2a, and knockdown of Gr2a in these GRNs impairs the avoidance to slightly aversive levels of Na+ (Kim et al, 2017). Pharyngeal GRNs also promote the aversion to bitter tastants, Cu2+, L-canavanine, and bacterial lipopolysaccharides (Choi et al, 2016; Joseph et al., 2017; Soldano et al, 2016; Xiao et al, 2022). Other pharyngeal GRNs are stimulated by sugars and contribute to sugar consumption (Chen & Dahanukar, 2017; Chen et al, 2021; LeDue et al., 2015). Remarkably, a pharyngeal GRN in each of the two LSOs functions in the rejection rather than the acceptance of sucrose (Joseph et al., 2017).” (lines 74-89)

      Line 89, aversive --> aversion

      We changed this part.

      Line 90, gain of capsaicin avoidance suggests they are sufficient for avoidance, not essential for avoidance.

      We changed “essential” to “sufficient.” (line 100)

      Line 104, what are you recording from here? Labellar or pharyngeal GRNs?

      We added “S-type and L-type sensilla” to the sentence. (line 119)

      Line 107, How are A GRNs marked with tdTomato? It is important to mention how you are defining A GRNs.

      We modified the sentence as follows: “Using Ir56b-GAL4 to drive UAS-mCD8::GFP, we also confirmed that the reporter was restricted to a subset of Class A GRNs, which were marked with LexAop-tdTomato expressed under the control of the Gr64f-LexA (Figure 1—figure supplement 1D–F).” (lines 120-123)

      Line 124, should read "concentrated as sea water."

      We made the change. (line 142)

      Line 125, I am not sure what is meant by "alarm neurons"

      We changed “additional pain or alarm neurons” to “nociceptive neurons.” (line 144)

      Line 141, Are you defining A GRNs as only labellar GRNs, i.e. the Gr5a-GAL4 pattern with labellar plus a few pharyngeal GRNs? Or are you defining them as the Gr64f-GAL4 pattern (i.e. labellar plus many pharyngeal GRNs)?

      We refer to the Class A–E GRNs as labellar GRNs. Therefore, in this instance, we removed the reference to A GRNs and B GRNs, and simply mention the drivers that we used (Gr5a-GAL4 and Gr66a-GAL4) to express UAS-CsChrimson. The modified sentence is, “As controls we drove UAS-CsChrimson under control of either the Gr5a-GAL4 or the Gr66a-GAL4.” (lines 51-59, 160-161)

      Line 180, labellar hairs--> labellar taste bristles

      We made the change. (line 204)

      Line 190, possess only --> only possess

      We made the change. (line 216)

      Line 202, Should this read increased?

      Yes. We changed “reduced” to “increased.” (line 225)

      Line 206, The information provided here and in reference 47 was not sufficient for me to understand how the Droso-X system works and whether it has been validated. Better diagrams and much more description is required for the reader to understand this system and assess its validity

      We now explain that the DrosoX “system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary is then monitored automatically over the course of 6 hours and recorded on a computer.” (lines 238-243)

      Lines 218-219, It would be helpful to expand on this to explain how the previous paper detected no difference. Is this because the contact time with the food is the same but the rate of ingestion is slower?

      Yes. This is correct. We now clarify this point by stating that, “In a prior study, it was observed that the repulsion to high salt exhibited by the Ir60b mutant was indistinguishable from wild-type (Joseph et al., 2017). Specifically, the flies were presented with a drop of liquid (sucrose plus salt) at the end of a probe, and the Ir60b mutant flies fed on the food for the same period of time as control flies (Joseph et al., 2017). However, this assay did not discern whether or not the volume of the high salt-containing food consumed by the Ir60b mutant flies was reduced relative to control flies. Therefore, to assess the volume of food ingested, we used the DrosoX system, which we recently developed (Figure 3—figure supplement 1A) (Sang et al, 2021). This system consists of a set of five separately housed flies, each of which is exposed to two capillary tubes with different liquid food options. One capillary contained 100 mM sucrose and the other contained 100 mM sucrose mixed with 300 mM NaCl. The volume of food consumed from each capillary was then monitored automatically over the course of 6 hours and recorded on a computer. We found that control flies consumed approximately four times more of the 100 mM sucrose than the sucrose mixed with 300 mM NaCl (Figure 3A). In contrast, the Ir25a, Ir60b, and Ir76b mutants consumed approximately two-fold less of the sucrose plus salt (Figure 3A). Consequently, they ingested similar amounts of the two food options (Figure 3B; ingestion index). Thus, while the Ir60b mutant and control flies spend similar amounts of time in contact with high salt-containing food when it is the only option (Joseph et al., 2017), the mutant consumes considerably less of the high salt food when presented with a sucrose option without salt.” (lines 226-251)

      Lines 231-235, Is there evidence for this, that Ir60b expression in the Ir25a or Ir76b pattern will induce high salt responses in the labellum? You should elaborate on this to clearly state what you mean rather than implying it. I do not think that overexpression of one Ir is enough evidence for this sweeping conclusion.

      We agree. We eliminated this point. (lines 227-232)

      Lines 261-263, Please elaborate here: how did you target the I-type sensilla and where are these neurons? Do they already express Ir76b and Ir25a?

      We now explain in the Results that, “We attempted to induce salt activation in the I-type sensilla by ectopically expressing Ir60b, under control of the Gr33a-GAL4. Gr33a is co-expressed with Gr66a (Moon et al., 2009), which has been shown to be co-expressed with Ir25a and Ir76b (Li et al., 2023). When we performed tip recordings from I7 and I10 sensilla, we did not observe a significant increase in action potentials in response to 300 mM NaCl (Figure 4—figure supplement 1A), indicating that ectopic expression of Ir60b in combination with Ir25a and Ir76b is not sufficient to generate a high salt receptor.” (lines 324-330)

      Lines 300-303, The discussion needs to be greatly expanded. What is the proposed mechanism by which the same neurons/receptors can inhibit sucrose and high salt feeding? What is the authors' interpretation of what this study adds to our understanding of taste aversion?

      We have now added a Discussion section and greatly expanded the discussion.

      Reviewer #3 (Recommendations For The Authors):

      In line 73 there is a typo in "esophagus"

      We changed this part.

      In line 331, the use of a mixture of sucrose and "saponin" seems to be a mistake; "NaCl" is likely intended.

      We made the correction. (lines 546 and 640)

      On several occasions, the authors refer to the pharynx as a taste organ (for example, the first sentence of the abstract). I am not sure this is correct; the actual pharyngeal taste organs are the LSO, DCSO, and VCSO, which are located in the pharynx.

      We made the corrections. (lines 24, 90, 92, 93, and 356)

      In line 155 the authors refer to Ir25a and Ir76b as "broadly tuned". I think it is not correct to refer to co-receptors this way; I'd suggest just calling them co-receptors.

      We made the correction. (lines 177-178)

      In line 182, stating "Gr2a is also expressed in the proboscis" is unclear. Clarify whether it refers to sensilla, pharyngeal taste organs, etc.

      We clarified it refers to pharyngeal taste organs. (lines 206-207)

      Line 253: "These finding imply that all three Irs are coexpressed in the pharynx." "The pharynx" is very unspecific, did the authors mean to say "the same neuron"?

      We now clarify by saying “in the Ir60b GRN in the pharynx.” (line 317)

      Figures & Legends

      I found it confusing that the same color scale is being reused for different panels with different meanings repeatedly and in inconsistent ways. For example in Figure 2, red and blue are being used for Ir25a² mutants, while blue is also being used for Gr64f-Gal4 and S type sensilla. It is also not easily visible nor mentioned in the caption which of the 3 color scales presented belong to which panels.

      We modified the colors in the figures so that they are used in a consistent way. We now also define the colors in the legends.

      In Figure 2 F-I, indicating the stimulus sequence in each panel would enhance clarity. The color scale in Figure 3 could benefit from explicit explanations of different shades in the caption for easier interpretation.

      For example: "The ingestion of (a, dark color) 100 mM sucrose alone and (b, light color) in combination with 300 mM"

      We made the suggested modification.

      In Figure 4a the authors highlight that Ir76b and Ir25a label 2 neurons in the LSO. Did the imaging in 4c also capture the second cell, and if so did it respond to their stimulation?

      No, the focal plane differs, and the signal in Figure 4C is considerably weaker compared to the immunohistochemistry shown in Figure 4A. Notably, the other neuron did not exhibit a response to NaCl.

      In Figure 4f a legend for the color scale is missing, or the color might not be necessary at all. Also, the asterisks seem to be shifted to the right.

      We fixed the shifted asterisks and eliminated the color.

      Figure 4i is mislabeled as 4f.

      We made the correction.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      Geng et al. explore the molecular mechanisms underlying the role of KIF1C in RNA transport, focusing on how it interacts with RNA. When overexpressed in COS-7 cells, KIF1C is shown to form dynamic puncta that do not appear to colocalise with organelle markers. An IDR in the tail of the kinesin is necessary and sufficient for the formation of these structures, and FRAP experiments show that they can exchange their contents with proteins in the cytosol and that their formation can be reversibly modulated by hypotonic shock, consistent with LLPS. In vitro, the IDR and flanking regions can undergo phase separation at physiologically relevant concentrations and salt conditions. In cells, KIF1C puncta enrich for RNAs and support their transport, and depletion of RNA modulates KIF1C LLPS properties. A model is proposed whereby KIF1C-mediated RNA transport to the cell periphery promotes the formation of a protein-RNA condensate that may act to fine-tune local RNA activity.

      Major comments

      In general, the claims made here are well supported by the data. However, I think that some exploration of the extent of LLPS at different KIF1C expression levels in cells is important but missing. The authors carefully estimate the endogenous concentration of KIF1C in COS-7 cells (at around 25 nM), but it isn't clear how this compares to the concentration in transient transfection experiments. Although this is partly addressed in the in vitro assays, I am still left with some questions over the extent of this phenomenon in a cellular context. Can the authors provide some experimental evidence to support the proposition that LLPS occurs at lower KIF1C expression levels (perhaps in a more localised fashion, as in Fig. 9)? One way to address this might be a GFP knock-in (although how feasible this is may depend on the genomic context); alternatively, the authors could generate cell lines that express KIF1C-GFP from a very weak promoter, demonstrate LLPS using their established assays, and show that this is comparable to endogenous expression.

      Response: We thank the reviewer for this suggestion. We have carried out additional experiments to explore the extent of KIF1C LLPS at endogenous levels. We used an antibody against KIF1C to stain WT and KIF1C knockout (KO) cells. Although the antibody shows a high background of non-specific signal in the cytoplasm and nucleoplasm of both WT and KO cells, we were able to observe small puncta of KIF1C at the periphery of WT but not KO cells (new Figure 8). This finding supports our hypothesis that endogenous KIF1C undergoes LLPS upon reaching a high local concentration at the periphery of cells. Two lines of evidence support that these puncta of endogenous KIF1C protein are RNA-containing biomolecular condensates formed by LLPS (new Figure 8). First, these small puncta of endogenous KIF1C incorporate RAB13 mRNA, suggesting that they are RNA granules. Second, the puncta do not form in cells stably expressing KIF1C ΔIDR at near-endogenous levels.

      Minor comments

      Lines 107-109 and Figure 1B on localisation of other kinesin-3s. The authors state that they localise to certain organelles but don't show co-staining for those organelles.

      Response: The localization of other kinesin-3s to certain organelles has been shown in the cited literature. In response to the reviewer's request, we now verify these findings by staining cells expressing the other kinesin-3s for specific organelles (new Figure S1 A).

      Lines 172-183 and Figure 3. Evidence is provided through FRAP experiments that KIF1C puncta exchange with the cytosolic pool. However, the extent of recovery appears to saturate at

      Response: We agree that the data suggest the existence of an immobile pool of KIF1C within the condensates. We have added this information to the main text (lines 178-182). We note that these findings are consistent with recent studies demonstrating membrane-less organelles with at least partially solid-like properties, including nucleoli and stress granules as well as microtubule associated proteins (see references, reviewed in Van Treeck & Parker 2019).

      Line 238 - Fig. S5C is cited as data on endogenous concentration of KIF1C - this should be Fig. S6C.

      Response: Thank you. We have corrected this (now Fig S8 C).

      Line 331-332 - I did not fully follow the logic of how the RNase A injection experiment supports the idea that the KIF1C interaction with RNA is sequence-selective. Could the authors expand on this?

      Response: We thank the reviewer for this comment. We have rewritten the text (lines 235-238, 246-248).

      Reviewer #1 (Significance (Required)):

      This study introduces a new and exciting concept to motor protein biology: that some cytoskeletal motors and motor-cargo complexes can undergo phase separation, and that this is important for their function. The experiments are logical, progressive, and form a clear and compelling case. The main limitation is that demonstration of LLPS in cells is limited to over-expressed protein. Some exploration/demonstration of LLPS properties of KIF1C in cells at near to endogenous expression levels would enhance the study.

      The work should be of interest to a broad range of readers, including the cytoskeletal motor community, those interested in mRNA regulation, and scientists studying phase separation more generally.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper investigates mRNA transport by the kinesin Kif1C and tests the hypothesis that liquid condensation of the disordered C-terminal region is important for mRNA recruitment. It is based on prior work from other labs showing that Kif1C recruits and transports a set of mRNAs to the periphery of cells. The mechanism of the Kif1C-mRNA interaction was not investigated in the prior work, so the proposal that a liquid condensate is involved is novel. It is also topical, since there is intense current interest in transport and regulation of mRNAs by condensate-mediated mechanisms. The most useful part of this paper to the field may be the identification of IDR2 as required for mRNA binding in Fig 7.

      Major comments

      A major concern is the reliance on expression of tagged Kif1C in COS-7 cells in several figures. The expression level in these experiments probably far exceeds normal, though this comparison is not reported. Over-expression may be justified to reveal a condensate mechanism, but it is concerning and the authors need to strongly qualify their conclusions. One way to moderate this concern would be to examine condensation as a function of expression level.

      Response: We thank the reviewer for this suggestion. We have carried out additional experiments to explore the extent of KIF1C LLPS at endogenous levels. We used an antibody against KIF1C to stain WT and KIF1C knockout (KO) cells. Although the antibody shows a high background of non-specific signal in the cytoplasm and nucleoplasm of both WT and KO cells, we were able to observe small puncta of KIF1C at the periphery of WT but not KO cells (new Figure 8). This finding supports our hypothesis that endogenous KIF1C undergoes LLPS upon reaching a high local concentration at the periphery of cells. Two lines of evidence support that these puncta of endogenous KIF1C protein are RNA-containing biomolecular condensates formed by LLPS (new Figure 8). First, these small puncta of endogenous KIF1C incorporate RAB13 mRNA, suggesting that they are RNA granules. Second, the puncta do not form in cells stably expressing KIF1C ΔIDR at near-endogenous levels.

      Another significant concern is that the biochemical reconstitution figure tests protein alone, not protein + RNA. Disordered RNA binding proteins usually phase separate better in the presence of RNA. The best reconstitution papers evaluate specificity of RNA recruitment to condensates. Specificity testing in a reconstituted system may not be required for a first paper, but testing the effect of some kind of RNA seems important.

      Response: The purified CC4+IDR and IDR constructs form condensates at low µM concentrations in the absence of RNA or crowding agents; thus, we did not test whether they would phase separate better in the presence of RNA. In response to the reviewer's comments, we now evaluate the specificity of RNA recruitment to the KIF1C condensates. We utilized the purified CC4+IDR protein and added the same GU-rich and polyA RNAs used in cells (now Fig 4 B) at different concentrations. Interestingly, there is selective incorporation of GU-rich oligos into condensates at low RNA concentrations, incorporation of both RNAs into condensates at medium concentrations, and an inhibition of condensate formation at high RNA concentrations (new Fig 7 E,F).

      A final concern is that the specificity of mRNA recruitment to Kif1C puncta in cells is not critically evaluated. Among endogenous mRNAs, only one (Rab13) is tested. The paper would be stronger with a second positive mRNA and a negative control mRNA.

      Response: We have now tested whether the specificity of mRNA recruitment to KIF1C puncta applies to additional mRNAs. We carried out single-molecule FISH (smFISH) experiments for two additional mRNAs. Based on the literature showing KIF1C-dependent localization of specific RNAs, we chose NET1 as a second positive mRNA and CAM1 as a negative control mRNA (Pichon et al., 2021). We first show that NET1 mRNA is mislocalized in KIF1C KO cells whereas CAM1 mRNA is not (new Fig S7 C,D). We then rescued the KO cells with FL or ΔIDR constructs and show that the FL protein rescues NET1 mRNA localization to the cell periphery whereas the ΔIDR construct does not (new Fig S7 E,F).

      Reviewer #2 (Significance (Required)):

      The mechanism of the Kif1C-mRNA interaction was not investigated in the prior work, so the proposal that a liquid condensate is involved is novel. It is also topical, since there is intense current interest in transport and regulation of mRNAs by condensate-mediated mechanisms. The most useful part of this paper to the field may be the identification of IDR2 as required for mRNA binding.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      KIF1C is a member of the kinesin-3 family, which is responsible for fast organelle transport in cells. The cargoes of KIF1C are diverse, including the Golgi apparatus, Rab6 vesicles, the exon junction complex (EJC), integrins, and RNA. Mutations in the KIF1C coding sequence lead to neurodegenerative diseases, such as hereditary spastic paraparesis (HSP). In addition, as an RNA transporter, KIF1C transports various types of mRNAs (e.g., APC-dependent mRNAs and KIF1C's own mRNA) along microtubules and clusters them in cytoplasmic protrusions to fulfill certain biological functions.

      In the current manuscript, Geng et al. investigated the intracellular behaviors of the kinesin-3 member KIF1C. The study revealed that KIF1C can form dynamic condensates both in cells and in vitro via an unstructured domain within the tail of the motor. KIF1C was also found to interact with synthesized RNA and other RNA granules in cells. In addition, the authors show that KIF1C participates in the intracellular transport of an endogenous mRNA, Rab13 mRNA, and identify a 47-aa fragment in the KIF1C IDR that is critical for the KIF1C-Rab13 mRNA interaction. Finally, as with other prion-like proteins, the LLPS of KIF1C is buffered by the non-specific RNA pool in the cytoplasm.

      In summary, this is an interesting work in the field, and reveals novel results about the mechanisms of motor protein transport that will be broadly interesting. The assays are generally well performed, and the results and discussion are well described, but some descriptions in the article should be more rigorous and objective. The article is very long, and I think it would benefit from streamlining and reducing the number of figures to make it more accessible for non-specialists in the field.

      Here are some concerns:

      Major:

      Fig. 1A shows the domain organization of all kinesin-3 members, but Figure 1B only presents KIF1Bβ, KIF13B and KIF16B as controls. KIF1Bα has the highest sequence similarity to KIF1C in the kinesin-3 family (very high sequence similarity up to aa 992 of KIF1C, which lies within the IDR, so the similar region probably contains IDR2a from Fig. S10A). In addition, both KIF1C and KIF1Bα contain a PLD according to the prediction in this paper (Figure S2C). Although the authors show the phenotype of KIF1Bα in Fig. S9, it might be better to put some description up front, as readers may wonder why the authors did not use KIF1Bα as a control. Actually, I kept thinking about this concern until I got to the discussion.

      Response: We thank the reviewer for this suggestion. We have moved the descriptions of KIF1Bα phenotypes to earlier in the manuscript. We show that KIF1Bα forms puncta in cells but unlike KIF1C, the KIF1Bα puncta do not colocalize with known RNA granules P-bodies or stress granules (now in Fig S5 B,C). We show that, unlike KIF1C, the KIF1Bα puncta do not incorporate GU-rich or polyA RNA (now in Fig S6 B).

      It would be better if the authors could combine Fig. 2B and 2C, since the article does not mention Fig. 2B at all. In addition, Fig. S3 does not help this article much. It would probably be better if the authors took the ΔIDR-mNG data from Fig. S3 and put them into Fig. 2 as a negative control, especially for Fig. 2D and 2F. As for whether the phenotype of the ΔIDR-mNG construct is "similar to a constitutively active KIF1C construct containing only the motor domain (amino acids 1-348) (Fig. S3 C)", I do not think it is important here, since in this part the authors aim to confirm that the IDR is critical for KIF1C phase separation.

      Response: We have combined Figures 2B and 2C as suggested. We prefer to leave Figure S3 intact since, as the reviewer mentioned, the article is already long and these data are not critical for the story.

      The description "the condensate properties can be modulated by adjacent coiled-coil segments" in the abstract and the sentence "However, the coiled-coil segments in the stalk domain appear to facilitate puncta formation as the addition of increasing amounts of coiled coil resulted in increased KIF1C enrichment in puncta as compared to the IDR alone" in the article are not accurate, since there is no direct evidence in this manuscript that shows this. In Fig. 2D, as well as Fig. 6A and Fig. S7, it is apparent at a glance that a large amount of IDR-mNG is localized in the nucleus, which decreases the concentration of this construct in the cytoplasm and in turn may lower its capability to form puncta. This is important, as the results in Fig. 4 show that protein concentration directly affects the formation of phase-separated puncta in cells. In my view, the words "modulate", "tune", etc. usually describe active processes, and these words may be confusing unless there is enough evidence to support direct regulation. But the data presented in this article suggest that it is likely a passive process, such as the coiled-coil region preventing the CC4-IDR construct from entering the nucleus (Fig. 2D, Fig. 6A and Fig. S7). Moreover, CC4 does affect the critical concentration of the IDR in vitro (Fig. 5E), but that could be attributed to the coiled-coil domain increasing its solubility. I like the word "influence" used in a subtitle in the discussion section.

      Response: We have removed this from the text.

      In addition, the in vitro study in Fig. 5 did not show any significant difference in puncta formation between IDR-mNG and CC4-IDR-mNG (diameter: 0.43 ± 0.22 μm (mean ± STD) for IDR-mNG vs 0.48 ± 0.27 μm (mean ± STD) for CC4-IDR-mNG; roundness: no value was shown in the article). So, a stricter assay or a more accurate description is required here to avoid misleading readers.

      Response: We now include p values showing that the differences in diameter and roundness are statistically significant (data moved to Fig S8 B).

      The description for Fig. 5 "At 2 uM protein concentration and 100 mM NaCl, the KIF1C(IDR) droplets were smaller [diameter 0.43 ± 0.22 μm (mean ± STD)] than KIF1C(CC4+IDR) droplets [0.48 ± 0.27 μm (mean ± STD)] (Fig. 5 C)" does not appear accurate either, since there is no significant difference between 0.43 ± 0.22 μm and 0.48 ± 0.27 μm, so the IDR droplets should not be described as "smaller". In addition, the article states that "The KIF1C(IDR) puncta were also less round than those of KIF1C(CC4+IDR) (Fig. 5 C)", but no corresponding quantification shows that KIF1C(IDR) puncta are less round.

      Response: We now include p values showing that the differences in diameter and roundness are statistically significant (data moved to Fig S8 B).

      The description in the sentence "We thus tested whether ... LLPS is mutually exclusive (Fig. S5 A)" may not be accurate. Results in Fig. S5 only show that there is no direct interaction between KIF1C and CLIP-170, or that these two proteins do not colocalize. In my understanding, the words "mutually exclusive" mean that two proteins compete with each other for the same location.

      Response: We have replaced the words "mutually exclusive" with "no colocalization" (line 204).

      In addition, is it necessary to include Fig. S5 in this article? From my side, it does not help much for the whole story. In cells, kinesin motors are autoinhibited in the cytoplasm. For KIF1C, most motors appear autoinhibited as well, even when the IDR is removed, based on Fig. S3C (ΔIDR-mNG vs. MD-mNG). In this case, it is hard to investigate the potential interaction of KIF1C (or its ΔIDR mutant) with microtubules or tubulin, due to the autoinhibition of the constructs used in Fig. S5. It would be better to use other active versions of KIF1C, such as ΔP (Soppina et al., PNAS, 2014) or other mutants (Ren et al., PNAS, 2018; Wang et al., Nat. Commun., 2022), if the authors want to show this part in the article.

      Response: We agree that this data is not essential for the story, however, it may be of interest and benefit to others in the field studying LLPS of microtubule-associated proteins and we prefer to leave Figure S5 (now Figure S4) in the supplementary information.

      The conclusion "This result suggests that the IDR- driven LLPS of KIF1C does not depend on mRNA incorporation, but is strongly affected by it" may not be accurate; there is no direct evidence showing that mRNA incorporation (at least of Rab13 mRNA) strongly affects the IDR-driven LLPS of KIF1C. Perhaps a knockout of Rab13 mRNA would alter the formation of condensates, which would support a direct effect on LLPS.

      Response: We have changed the text (line 306).

      In addition, the sentence "These results also show that the LLPS is resistant to truncations of large portions of IDR" may not be accurate. In my view, apart from IDR2a, the rest of the IDR may not participate in or contribute much to the formation of puncta, but that does not mean LLPS is resistant to truncation of these portions of the IDR; these are different arguments. The quantification in Fig. 7E also shows no statistically significant difference between ST and the truncations other than ΔIDR2 and ΔIDR2a, e.g., ST (21.8 ± 12.0 puncta per cell, 2.06 ± 0.83 μm diameter), ΔPLD (20.1 ± 13.7 puncta per cell, 1.62 ± 0.94 μm diameter), ΔIDR1 (23.1 ± 14.3 puncta per cell, 2.13 ± 1.04 μm diameter), ΔIDR3 (18.4 ± 8.1 puncta per cell, 1.71 ± 0.98 μm diameter).

      Response: We have changed the text (line 307).

      I am not sure I agree with the authors' interpretation of their FRAP data in Fig. 3. It appears to me that there is a large immobile population of molecules, as the bleached areas recover less than 50% of their initial intensity. However, the authors conclude that there is rapid exchange of molecules in the puncta. The authors need to further analyze and discuss both the exchange rate of the mobile population and the fraction of apparently immobile molecules that do not recover in their experiments. These data appear to suggest that a large percentage of the molecules in the KIF1C puncta in fact do not exchange with the cytoplasm, which undermines their argument for a liquid-like phase of the puncta.

      Response: We agree that the data suggest the existence of an immobile pool of KIF1C within the condensates. We have added this information to the main text (lines 178-182). We note that these findings are consistent with recent studies demonstrating membrane-less organelles with at least partially solid-like properties, including nucleoli and stress granules as well as microtubule associated proteins (see references, reviewed in Van Treeck & Parker 2019).

      Minor:

      As mentioned above, Fig. 2F needs a negative control, since the values of FL and IDR are lower than those of other constructs; perhaps the ΔIDR-mNG protein would be a better choice. In addition, in my view, the lower value of the IDR construct does not mean that this construct has a lower capability to form puncta; more likely, most of this protein localizes in the nucleus, thus dramatically lowering its cytoplasmic concentration.

      Response: We have changed the text as suggested (lines 152-154).

      Fig. 6A probably needs a negative control as well; perhaps the same ΔIDR construct used in Fig. S7 would be a better choice.

      Response: We have now included KIF1Bα as a negative control (Fig S6 B).

      I guess the reason for using hTERT-RPE1 cells in the Rab13 mRNA rescue assay (Fig. 6D-G) is probably that KIF1C knockout cells are easier to obtain in this line (if I am correct), but it would be better to briefly explain the choice of hTERT-RPE1 here, since all previous assays in the article used COS-7 cells.

      Response: You are correct and we have added text introducing the use of hTERT-RPE1 cells (line 269).

      Is there any specific reason to use the ST construct in Fig. 7? In Fig. 6, the authors used full-length KIF1C; if the authors want to avoid any effects caused by the motor domain, the CC4-IDR construct could also be a simpler candidate.

      Response: No specific reason other than to be consistent as most experiments that we carried out in cells used the ST construct (e.g. FRAP assay in Fig 3, hypotonic assay in Fig 3, RNaseA injection in Fig 4, RNA incorporation in Fig 4). (Note that Fig 7 is now Fig 6).

      This article is a great case study of motor-cargo interaction, since the RNA binding site of KIF1C is within its tail domain. This left me curious about whether the interaction between KIF1C and the membrane-less RNA granule is sufficient to release the KIF1C motor from autoinhibition. I guess the binding of RNA is not enough to release KIF1C from autoinhibition. From Fig. S3C and Fig. 6D, it seems the motor remains autoinhibited, even when the Rab13 mRNA binding region is removed.

      Response: We believe the question of whether the RNA binding relieves autoinhibition of KIF1C is beyond the scope of this manuscript and we plan to address this in the future with recombinant full-length KIF1C and RAB13 mRNAs.

      There are some grammar mistakes; e.g., there should be an "is" between "IDR" and "critical" in the title "A subregion of the KIF1C IDR critical for enrichment of Rab13mRNA in condensates".

      Response: Thank you. We have corrected this (line 289).

      There should be a definition of the full name of the abbreviation "RBD" mentioned in the article. Although readers may guess that it is an RNA binding domain, it would be better (but not necessary) if the authors could show the corresponding residues or region in the IDR.

      Response: RBD is defined at the beginning of the section "KIF1C condensates display properties of RNA granules" (line 219), but in response to the reviewer's comment, we now include this definition a second time in the Discussion section (line 420).

      In the results (line 126), the authors refer to the KIF1C IDR without first defining this region in the introduction. I would re-word this sentence for clarity by first defining what an IDR is and how it's assessed in the current study.

      Response: The IDR is defined at the end of the Introduction (lines 94-95).

      What is the significance of the roundness measurement in Fig. 5? This should be described for the reader.

      Response: Roundness refers to the shape of the droplet and this is now included in the text (line 323, data moved to Fig S8 B).

      The authors state several times that this is the first kinesin shown to undergo LLPS. However, is this true? What about the recent work showing that the yeast Tea2 kinesin undergoes LLPS with other +TIP components (Maan et al. NCB 2023).

      Response: We thank the reviewer for this comment. The recent work from the Dogterom lab (Maan et al., 2023) demonstrates that the end binding (EB) protein Mal3 forms condensates alone and with the kinesin-7 family member Tea2 and its cargo Tip1 for enrichment at microtubule plus ends. The authors show images of Mal3 droplets and the requirement of the IDR domain and the crowding agent polyethylene glycol for droplet formation. The authors state that "Tea2 and Tip1 formed condensates under similar crowding conditions and concentrations on their own (Extended Data Fig. 5)." However, Extended Data Fig 5 reports on the fluorescence intensity of Mal3-EGFP colocalizing with Tea2 or Tip1. No images of Tea2-only droplets are shown and no information is provided on the Tea2 and/or PEG concentrations required for droplet formation or the liquid nature of Tea2 droplets. Thus, we do not feel comfortable stating that Tea2 on its own undergoes LLPS. We do reference the Maan et al., 2023 work in the Discussion listing microtubule-associated proteins shown to undergo LLPS (line 403) and when comparing the µM concentrations of KIF1C required for LLPS to the µM concentrations of these other microtubule-associated proteins (line 417).

      The authors don't discuss KIF5A, but their analysis reveals it also contains a low complexity region that may undergo LLPS (Fig. S2D). This would fit with recent reports that KIF5A tends to oligomerize more than other KIF5 isoforms, and that mutations in KIF5A that impact the tail domain may lead to aberrant oligomerization. I feel that it would be useful to the field for the authors to discuss these results in light of their own.

      Response: We thank the reviewer for this suggestion. Although it is intriguing that KIF5A is predicted to contain an IDR, there is, however, no data to suggest that KIF5A undergoes LLPS. Rather, the current literature suggests that KIF5A undergoes higher-order oligomerization and accumulation at the cell periphery, especially for the isoform lacking exon 27 (Nakano et al., 2022, Baron et al., 2022, Pant et al., 2023, Soustelle et al., 2023). It thus does not seem prudent for us to speculate on whether or not KIF5A undergoes LLPS.

      Reviewer #3 (Significance (Required)):

      The study is novel and interesting and will be impactful for the cytoskeletal and RNA biology communities. The experiments are of high quality and controls are appropriate. The finding that motor proteins can participate in LLPS will be of high interest for a variety of fields and provides a very interesting advance over current knowledge in the field.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      KIF1C is a member of the kinesin-3 family, which is responsible for fast organelle transport in cells. The cargoes of KIF1C are diverse, including the Golgi apparatus, Rab6 vesicles, the exon junction complex (EJC), integrins, and RNA. Mutations in the KIF1C coding sequence lead to neurodegenerative diseases, such as hereditary spastic paraparesis (HSP). In addition, as an RNA transporter, KIF1C transports various types of mRNAs (e.g., APC-dependent mRNAs and KIF1C's own mRNA) along microtubules and clusters them in cytoplasmic protrusions to fulfill certain biological functions.

      In the current manuscript, Geng et al. investigated the intracellular behaviors of the kinesin-3 member KIF1C. The study revealed that KIF1C can form dynamic condensates both in cells and in vitro via an unstructured domain within the tail of the motor. KIF1C was also found to interact with synthesized RNA and other RNA granules in cells. In addition, the authors show that KIF1C participates in the intracellular transport of an endogenous mRNA, Rab13 mRNA, and identify a 47-aa fragment in the KIF1C IDR that is critical for the KIF1C-Rab13 mRNA interaction. Finally, as with other prion-like proteins, the LLPS of KIF1C is buffered by the non-specific RNA pool in the cytoplasm.

      In summary, this is an interesting work in the field, and reveals novel results about the mechanisms of motor protein transport that will be broadly interesting. The assays are generally well performed, and the results and discussion are well described, but some descriptions in the article should be more rigorous and objective. The article is very long, and I think it would benefit from streamlining and reducing the number of figures to make it more accessible for non-specialists in the field.

      Here are some concerns:

      Major:

      1. Fig. 1A shows the domain organization of all kinesin-3 members, but Figure 1B only represents KIF1Bβ, KIF13B and KIF16B as controls. Generally, the KIF1Bα has the highest sequence similarity with KIF1C in kinesin-3 family (very high sequence similarity before aa 992 in KIF1C, which locates in IDR, probably contains IDR2a from Fig. S10A). In addition, both KIF1C and KIF1Bα contain a PLD from the prediction in this paper (Figure S2C). Although the authors show the phenotype of KIF1Bα in the Fig. S9, it might be better to put some descriptions up front, as readers may consider why the authors did not use KIF1Bα as a control. Actually, I kept thinking about this concern before I got to the discussion.
      2. It would be better if the authors can combine the Fig. 2B and 2C, since the article did not mention Fig. 2B at all. In addition, Fig. S3 does not help this article too much. Probably it would be better if the authors could take the ΔIDR-mNG data from the Fig. S3 and put into the Fig. 2. as a negative control, especially for Fig. 2D an 2F. As for whether the phenotype of the ΔIDR-mNG construct is "similar to a constitutively active KIF1C construct containing only the motor domain (amino acids 1-348) (Fig. S3 C)", I do not think it is important here, since in this part, the authors are aiming to confirm the IDR is critical for KIF1C phase separation.
      3. The description "the condensate properties can be modulated by adjacent coiled-coil segments" in the abstract and the sentence "However, the coiled-coil segments in the stalk domain appear to facilitate puncta formation as the addition of increasing amounts of coiled coil resulted in increased KIF1C enrichment in puncta as compared to the IDR alone" in the article are not accurate, since the manuscript contains no direct evidence for this. In Fig. 2D, as well as Fig. 6A and Fig. S7, it is apparent at a glance that much of the IDR-mNG localizes to the nucleus, which decreases the cytoplasmic concentration of this construct and in turn may lower its capacity to form puncta. This is important, as the results in Fig. 4 show that protein concentration directly affects the formation of phase-separated puncta in cells. In my view, words such as "modulate" and "tune" usually describe active processes, and they may be confusing unless there is sufficient evidence supporting direct regulation. The data presented in this article instead suggest a passive process, such as the coiled-coil region preventing the CC4-IDR construct from entering the nucleus (Fig. 2D, Fig. 6A and Fig. S7). Moreover, CC4 does affect the critical concentration of the IDR in vitro (Fig. 5E), but that could be attributed to the coiled-coil domain increasing its solubility. I like the word "influence" used in a subtitle in the discussion.

      In addition, the in vitro study in Fig. 5 did not show any significant difference in puncta formation between IDR-mNG and CC4-IDR-mNG (diameter: 0.43 ± 0.22 μm (mean ± STD) for IDR-mNG vs. 0.48 ± 0.27 μm (mean ± STD) for CC4-IDR-mNG; roundness: no value was shown in the article). A stricter assay or a more accurate description is therefore required here to avoid misleading readers.

      The description of Fig. 5, "At 2 uM protein concentration and 100 mM NaCl, the KIF1C(IDR) droplets were smaller [diameter 0.43 ± 0.22 μm (mean ± STD)] than KIF1C(CC4+IDR) droplets [0.48 ± 0.27 μm (mean ± STD)] (Fig. 5 C)", also does not appear accurate: there is no significant difference between 0.43 ± 0.22 μm and 0.48 ± 0.27 μm, so the former should not be described as "smaller". In addition, the article states that "The KIF1C(IDR) puncta were also less round than those of KIF1C(CC4+IDR) (Fig. 5 C)", but no corresponding quantification shows that KIF1C(IDR) puncta are less round.

      4. The description in the sentence "We thus tested whether ... LLPS is mutually exclusive (Fig. S5 A)" may not be accurate. The results in Fig. S5 only show that there is no direct interaction between KIF1C and CLIP-170, or that these two proteins do not colocalize. As I understand it, "mutually exclusive" means that the two proteins compete with each other for the same location.

      In addition, is it necessary to include Fig. S5 in this article? In my view, it does not add much to the overall story. In cells, kinesin motors are autoinhibited in the cytoplasm. Most KIF1C motors also appear autoinhibited, even when the authors remove the IDR (Fig. S3C; ΔIDR-mNG vs. MD-mNG). This autoinhibition of the constructs used in Fig. S5 makes it hard to investigate a potential interaction of KIF1C (or its ΔIDR mutant) with microtubules or tubulin. If the authors want to keep this part in the article, it would be better to use active versions of KIF1C, such as ΔP (Soppina et al., PNAS, 2014) or other mutants (Ren et al., PNAS, 2018; Wang et al., Nat. Commun., 2022).

      5. The conclusion "This result suggests that the IDR- driven LLPS of KIF1C does not depend on mRNA incorporation, but is strongly affected by it" may not be accurate: there is no direct evidence that mRNA incorporation, at least of Rab13mRNA, strongly affects IDR-driven LLPS of KIF1C. A knockout of Rab13mRNA that altered condensate formation would support a direct effect on LLPS.

      In addition, the sentence "These results also show that the LLPS is resistant to truncations of large portions of IDR" may not be accurate. In my view, apart from IDR2a, the rest of the IDR may not contribute much to puncta formation, but that does not mean that LLPS is resistant to truncation of these portions of the IDR; these are different claims. The quantification in Fig. 7E also shows no statistically significant difference between ST and the truncations other than ΔIDR2 and ΔIDR2a, e.g., ST (21.8 ± 12.0 puncta per cell, 2.06 ± 0.83 μm diameter), ΔPLD (20.1 ± 13.7 puncta per cell, 1.62 ± 0.94 μm diameter), ΔIDR1 (23.1 ± 14.3 puncta per cell, 2.13 ± 1.04 μm diameter), ΔIDR3 (18.4 ± 8.1 puncta per cell, 1.71 ± 0.98 μm diameter).

      6. I am not sure I agree with the authors' interpretation of their FRAP data in Fig. 3. There appears to be a large immobile population of molecules, as the bleached areas recover less than 50% of their initial intensity. However, the authors conclude that there is rapid exchange of molecules in the puncta. The authors need to analyze and discuss further both the exchange rate of the population of molecules that does exchange and the fraction of apparently immobile molecules that do not recover in their experiments. These data appear to suggest that a large percentage of the molecules in the KIF1C puncta in fact do not exchange with the cytoplasm, which undermines the argument for a liquid-like phase of the puncta.
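      For readers less familiar with FRAP analysis, the two quantities requested in point 6 can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes background- and acquisition-bleaching-corrected intensities and a single-exponential recovery model, and the function names and example values are hypothetical, not taken from the manuscript.

```python
import math


def frap_fractions(f_pre: float, f_post: float, f_plateau: float) -> tuple[float, float]:
    """Return (mobile_fraction, immobile_fraction) for one bleached region.

    f_pre:     intensity in the bleached region before bleaching
    f_post:    intensity immediately after bleaching
    f_plateau: intensity at which the recovery levels off
    Assumes intensities are already background- and acquisition-bleaching-corrected.
    """
    if f_pre <= f_post:
        raise ValueError("pre-bleach intensity must exceed post-bleach intensity")
    mobile = (f_plateau - f_post) / (f_pre - f_post)
    return mobile, 1.0 - mobile


def recovery(t: float, f_post: float, f_plateau: float, k: float) -> float:
    """Single-exponential model: F(t) = F_post + (F_plateau - F_post) * (1 - exp(-k t))."""
    return f_post + (f_plateau - f_post) * (1.0 - math.exp(-k * t))


def half_time(k: float) -> float:
    """Time to reach half of the recoverable signal: t_1/2 = ln(2) / k."""
    return math.log(2) / k


if __name__ == "__main__":
    # Hypothetical region bleached to 20% that recovers to 60% of its pre-bleach
    # intensity: only about half of the signal is mobile.
    mobile, immobile = frap_fractions(f_pre=1.0, f_post=0.2, f_plateau=0.6)
    print(f"mobile={mobile:.2f}, immobile={immobile:.2f}")
    k = 0.05  # hypothetical rate constant, 1/s
    print(f"t_half={half_time(k):.1f} s")
```

      Reporting the immobile fraction alongside k (or t1/2) makes explicit how much of the signal does not exchange on the timescale of the experiment. A single exponential is only one possible model; diffusion-limited recoveries may require a different fit.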

      Minor:

      1. As mentioned above, Fig. 2F needs a negative control, since the values for FL and IDR are lower than those of the other constructs; the ΔIDR-mNG protein may be suitable. In addition, in my view the lower value for the IDR construct does not indicate a lower capacity to form puncta; more likely, most of this protein localizes to the nucleus, dramatically lowering its cytoplasmic concentration.
      2. Fig. 6A probably needs a negative control as well; the ΔIDR construct used in Fig. S7 may be suitable.
      3. I assume hTERT-RPE1 cells were used in the Rab13mRNA rescue assay (Fig. 6D-G) because KIF1C knockout cells are easier to obtain in this line (if I am correct). Still, a brief explanation of this choice would help, since all previous assays in the article used COS-7 cells.
      4. Is there a specific reason for using the ST construct in Fig. 7? In Fig. 6, the authors used full-length KIF1C; if the authors want to avoid any effects caused by the motor domain, the CC4-IDR construct could be a simpler candidate.
      5. This article is a great case study of motor-cargo interaction, since the RNA-binding site of KIF1C is within its tail domain. This left me curious whether the interaction between KIF1C and membrane-less RNA granules is sufficient to release the KIF1C motor from autoinhibition. I suspect that RNA binding is not enough: from Fig. S3C and Fig. 6D, the motor seems to remain autoinhibited even when the Rab13mRNA-binding region is removed.
      6. There are some grammatical mistakes; e.g., there should be an "is" between "IDR" and "critical" in the title "A subregion of the KIF1C IDR critical for enrichment of Rab13mRNA in condensates".
      7. The abbreviation "RBD" should be defined in the article, although readers may guess that it stands for RNA-binding domain. If possible, it would be helpful (but not necessary) to indicate the corresponding residues or region within the IDR.
      8. In the results (line 126), the authors refer to the KIF1C IDR without first defining this region in the introduction. I would re-word this sentence for clarity by first defining what an IDR is and how it's assessed in the current study.
      9. What is the significance of the roundness measurement in Fig. 5? This should be described for the reader.
      10. The authors state several times that this is the first kinesin shown to undergo LLPS. However, is this true? What about the recent work showing that the yeast Tea2 kinesin undergoes LLPS with other +TIP components (Maan et al. NCB 2023).
      11. The authors don't discuss KIF5A, but their analysis reveals it also contains a low complexity region that may undergo LLPS (Fig. S2D). This would fit with recent reports that KIF5A tends to oligomerize more than other KIF5 isoforms, and that mutations in KIF5A that impact the tail domain may lead to aberrant oligomerization. I feel that it would be useful to the field for the authors to discuss these results in light of their own.

      Significance

      The study is novel and interesting and will be impactful for the cytoskeletal and RNA biology communities. The experiments are of high quality and controls are appropriate. The finding that motor proteins can participate in LLPS will be of high interest for a variety of fields and provides a very interesting advance over current knowledge in the field.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript by DeHaro-Arbona et al., the authors wish to understand how a signaling pathway (Notch) is dynamically decoded to elicit a specific transcriptional output. In particular, they investigate the kinetic properties of Notch-responsive nuclear complexes (the DNA binding factor CSL and its co-activator Mastermind (mam) along with several candidate interacting partners). Their experimental model is the polytene chromosome of the Drosophila salivary gland, in which the naturally inactive Notch can be artificially induced through the expression of a constitutively active form of Notch.

      The authors develop a series of CRISPR and transgenic lines enabling the live imaging of these complexes at a specific locus and in various backgrounds (genetic perturbations/drug treatments). This quantitative live imaging data suggests that Notch nuclear complexes form hubs, and the authors characterize their binding dynamics. Interestingly, they elegantly demonstrate that the content of these hubs and their kinetic properties can evolve, even within Notch ON cells. Hence, they propose the existence of distinct hubs, distinguishing an open (CSL), engaged (CSL-Mam), or active (CSL-Mam-Med-PolII) configuration in Notch ON cells and an inactive hub state (in Notch OFF cells that have previously been exposed to Notch), which would explain the surprising transcriptional memory that the authors observe hours after Notch withdrawal.

      We thank the reviewer for this constructive summary of our work.

      Reviewer #2 (Public Review):

      The manuscript from deHaro-Arbona et al, entitled "Dynamic modes of Notch transcription hubs conferring memory and stochastic activation revealed by live imaging the co-activator Mastermind", uses single molecule microscopy imaging in live tissues to understand the dynamics and molecular determinants of transcription factor recruitment to the E(spl)-C locus in Drosophila salivary gland cells under Notch-ON and -OFF conditions. Previous studies have identified the major players that are involved in transcription regulation in the Notch pathway, as well as the importance of general transcriptional coregulators, such as CBP/P300 and the Mediator CDK module, but the detailed steps and dynamics involved in these processes are poorly defined. The authors present a wealth of single molecule data that provides significant insights into Notch pathway activation, including:

      (1) Activation complexes, containing CSL and Mam, have slower dynamics than the repressor complexes, containing CSL and Hairless.

      (2) Contribution of CSL, NICD, and Mam IDRs to recruitment.

      (3) CSL-Mam slow-diffusing complexes are recruited and form a hub of high protein concentrations around the target locus in Notch-ON conditions.

      (4) Mam recruitment is not dependent on transcription initiation or RNA production.

      (5) CBP/P300 or its associated HAT activity is not required for Mam recruitment.

      (6) Mediator CDK module and CDK8 activity are required for Mam recruitment, and vice-versa, but not CSL recruitment.

      (7) Mam is not required for chromatin accessibility but is dependent on CSL and NICD.

      (8) CSL recruitment and increased chromatin accessibility persist after NICD removal and loss of Mam, which confers a memory state that enables rapid re-activation in response to subsequent Notch activation.

      (9) Differences in the proportions of nuclei with Pol II and with Mam enrichment, which result in transcription being probabilistic/stochastic. These data demonstrate that the presence of Mam complexes is not sufficient to drive all the steps required for transcription in every Notch-ON nucleus.

      (10) The switch from more stochastic to robust transcription initiation was elicited when ecdysone was added.

      Overall, the manuscript is well written, concise, and clear, and makes significant contributions to the Notch field, which are also important for a general understanding of transcription factor regulation and behavior in the nucleus. I recommend that the authors address my relatively minor criticisms detailed below.

      We thank the reviewer for their thorough and constructive summary of our work. We are glad that they overall found it insightful and interesting. Below we have addressed the points they have raised.

      Page 7, bottom. The authors speculate, "It is possible therefore that, once recruited, Mam can be retained at target loci independently of CSL by interactions with other factors so that it resides for longer." Is it possible that another interpretation of that data is that Mam is a limiting factor?

      As indicated, our comment is a speculation based on the observations summarized in the paragraph. We are not entirely sure what the reviewer is proposing as an alternative model. However, if it relates to the relative concentrations of the different factors, this would not account for the differences in trajectory durations. Moreover, for most aspects of our analysis, Koff has the most profound influence on the results. Furthermore, differences persist even when CSL levels are considerably reduced (as in conditions with Hairless RNAi).

      Page 9. The authors write, "A very low level of enrichment was evident for... for the CSL C-terminus..". The recruitment of the CSL ct IDR does not appear to be statistically significant, or there is no apparent difference (Figure S2C), suggesting that the CSL ct IDR does not play a role in enrichment.

      We agree with the comments of the reviewer and have adjusted the text on page 9 accordingly.

      Page 9. The authors write, "Notably, MamnIDR::GFP fusion was present in droplets, suggesting it can self-associate when present in a high local concentration (Figure S2B)." Is this result only valid for Mam nIDR or does full-length Mam also localize into droplets, as has been previously observed for full-length mammalian Maml1 in transfected cells?

      We agree that the observed foci of MamL1 that have been detected in mammalian cells are interesting. We have not tried to replicate those data because the large size of Mam has made it challenging to produce a full-length form in over-expression. We note however that another portion of Mam, MamIDR, does not make droplets when over-expressed despite it containing a large section of the disordered region of the Drosophila Mam. We have now included a comment about the mammalian data in the text (page 9) to put our findings in context.

      Previous studies in mammalian cells suggest that Maml1 is a high-confidence target for phosphorylation by CDK8, see Poss et al 2016 Cell Reports https://doi.org/10.1016/j.celrep.2016.03.030. By sequence comparison, does fly Mam have similar potential phosphorylation sites, and might these be critical for Mam/CDK module recruitment?

      We thank the reviewer for highlighting this point. Indeed, we were very excited when we learnt that MamL1 was found to be a high confidence CDK8 target and we looked hard in the Mam sequence for potential phosphorylation sites. Sadly, there is very little conservation between the fly and the mammalian proteins beyond the helical region that contacts CSL and NICD. Furthermore, there are no identifiable putative CDK8 phosphorylation sites based on conventional motifs. It therefore remains to be established whether or not Mam is a direct target of the CDK8 kinase activity. We have added an explanatory comment in the text (page 11).

      Page 11: The authors write, "The differences in the effects on Mam and CSL imply that the CDK module is specifically involved in retaining Mam in the hub, and that in its absence other CSL complexes "win-out", either because the altered conditions favour them and/or because they are the more abundant." Are the "other" complexes the authors are referring to Hairless-containing complexes? With the reagents the authors have in hand, couldn't this be explicitly shown for CSL complexes rather than speculated upon?

      The reviewer is correct that CSL complexes containing Hairless are good candidates to be recruited in these conditions. We have compared the levels of Hairless at E(spl)-C following treatments with Senexin and have not detected a difference. However, it appears that the high proportion of unbound Hairless makes it difficult to detect/quantify the enrichment at E(spl)-C. We have therefore taken a different strategy, which is to measure the recruitment of a mutant form of CSL that is compromised for Hairless binding. Recruitment of the mutant CSL is detected in Notch-ON conditions, but is significantly reduced/absent following Senexin treatment. These data favour the model proposed by the reviewer that in the absence of CDK8 activity, the CSL-Hairless complexes win out. These new data have been added in new Supplementary Figure S3F and S3G (see text page 11).

      Page 12/13: The authors write, "Based on these results we propose that, after Notch activity decays, the locus remains accessible because when Mam-containing complexes are lost they are replaced by other CSL complexes (e.g. co-repressor complexes)." Again, why not actually test this hypothesis rather than speculate? The dynamics of Hairless complexes following the removal of Notch would be very interesting and build upon previously published results from the Bray lab.

      We thank the reviewer for this comment and we agree it’s possible that the proportion of Hairless complexes increases after Notch withdrawal. However, for the reasons outlined above, it is difficult to quantify changes in Hairless, (and our preliminary experiment did not reveal any large-scale effect) and because of the complexity of the genetics we cannot straightforwardly extend the experiment to analyze the behaviour of the mutant CSL as above. Therefore, at present, we cannot say whether the loss of Mam is compensated by an increase in Hairless. We hope in future to investigate the characteristics of the memory in more depth.

      Page 13: The authors write, "As Notch removal leads to a loss of Mam, but not CSL, from the hub, it should recapitulate the effects of MamDN." While the data in Figure 5B seem to support this hypothesis, it's not clear to me that the loss of Mam and MamDN should phenocopy each other, because in the case of MamDN, NICD would still be present.

      We apologise that this sentence was a bit misleading. We have now rewritten it to improve accuracy (page 13) “As Notch removal leads to a loss of Mam, but not CSL, from the hub, we hypothesised it would recapitulate the effects of MamDN on chromatin accessibility and transcription of targets.”

      The temporal dynamics for Mam recruitment using the temperature- and optogenetic-paradigms are quite different. For example, in the optogenetic time course experiments, the preactivated cells are in the dark for 4 hours, while in the temperature-controlled experiments, there is still considerable enrichment of Mam at 4 hours. For the preactivated optogenetic experiments, how sure are the authors that Mam is completely gone from the locus, and alternatively, can the optogenetic experimental results be replicated in the temperature-controlled assays? My concern is whether the putative "memory" observation is just due to incomplete Mam removal from the previous activation event.

      We appreciate the concerns of the reviewer. However, we are confident that the 4-hour optogenetic inactivation is much more effective than the equivalent time for temperature shifts. The temperature-sensitive experiment involves a longer decay, because not only the protein but also the mRNA has to decay to fully remove NICD activity. The optogenetic experiments involve only protein decay and so are more acute. Furthermore, we have tested (and we show in Figure 5H) that Mam is fully depleted after 4 hours “Off” in the optogenetic experiments.

      In order to further strengthen the evidence in favour of the memory hub, we have extended the time-frame further to show that CSL is retained at the locus even after 24 hours “Notch OFF” in both the temperature and the optogenetic paradigm. We have also measured the effects on transcription after a 24hr OFF period using the optogenetic paradigm and seen that robust transcription is initiated in cells that have experienced a previous activation (preactivated) compared to those that have not (naïve). These new data have been added to new Figure 5 C-F and strongly support the memory model.

      Reviewer #3 (Public Review):

      Summary:

      DeHaro-Arbona and colleagues investigate the in vivo dynamics of Notch-dependent transcriptional activation with a focus on the role of the Mastermind (MAM) transcriptional co-activator. They use GFP- and HALO-tagged versions of the CSL DNA-binding protein and MAM to visualize the complex, and Int/ParB to visualize the site of Notch-dependent E(spl)-C transcription. They draw several conclusions. First, MAM accumulates at E(spl)-C when Notch signaling is active, just like CSL. Second, MAM recruits the CDK module of Mediator but does not initiate chromatin accessibility. Third, after signaling is turned off, MAM leaves the site quickly but CSL and chromatin accessibility are retained. Fourth, RNA Pol II recruitment, Mediator recruitment, and active transcription were similar and stochastic. Fifth, ecdysone enhances the probability of transcriptional initiation.

      Strengths:

      The conclusions are well supported by multiple lines of extensive data that are carefully executed and controlled. A major strength is the strategic combination of Drosophila genetics, imaging, and quantitative analyses to conduct compelling and easily interpretable experiments. A second major strength is the focus on MAM to gain insights into the dynamics of transcriptional activation specifically.

      We thank the reviewer for their positive comments about the strengths of our work.

      Weaknesses:

      Weaknesses are minor. There were no p-values reported for data presented in Figure S1D and no indication of how variable measurements were. In addition, the discussion of stochasticity was not integrated optimally with relevant literature.

      We thank the reviewer for noting these points. The statistical tests have now been included for Figure S1D (now Figure S1F). We have amplified the discussion about stochasticity, to include more reference to the literature and to make clear also the distinction with transcription bursting (page 19, 20).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have an elegant series of manipulations that provide strong evidence for their hypotheses and conclusions. Their exploitation of a unique biological system amenable to imaging in the larval salivary gland is well-considered and well-performed. Most of the conclusions are supported by the data. I only have the concerns below.

      (1) One of the main findings is the composition of Notch nuclear complexes and their interactions within a 'hub'. Yet most of the data showing hubs focus on labeling one protein component (+the locus or transcription), but multi-color imaging is rarely used to show how CSL-Mam, Mam-Med... protein signals coalescence to form a hub. Given the powerful tool developed, it would be important to show these multi-state hubs. Related to this, if the authors expect that hubs are formed independently of transcription or Notch pathway activation, do the authors see clustering at other non-specific loci in the nucleus? If not, can the authors comment on why they think that is the case? If so, do they demonstrate consistent residence time profiles with the tracked E(spl) locus?

      We apologise that it was not evident from the data shown that the proteins co-localize. First we stress that all the experiments are multicolor and most rely on very powerful methods to measure co-recruitment at a chromosomal locus- something that is very rarely achieved by others studying hubs. Second, we have in all cases confirmed that the proteins do colocalize. We have modified the diagram of our analysis pipeline to make more clear that this relies on multi-colour imaging, and adjusted all the figure labels to indicate the position of E(spl)-C. We have also added panels to new supplementary Figure S1C with examples of the co-localization between CSL and Mam and a plot confirming their levels of recruitment are correlated across multiple nuclei.

      We would like to clarify that our data show that the hubs do require Notch activation for their establishment. Other regions of enrichment are detected in Notch-ON conditions, but these are less prominent and, with no independent method for identifying them, can’t be compared between nuclei. In SPT experiments, other clusters with consistent residence are detected as reported in our recent paper which expanded on the SPT data (Baloul et al, 2023). We also detect co-localizations and “hubs” in other tissues, but those analyses are ongoing and beyond the scope of this paper.

      (2) The authors convincingly show that Notch hub complexes exhibit a memory. While the data showing rapid hub reformation upon Notch withdrawal are solid and convincing (Figure 5, in particular, F), the claim that this memory fosters rapid transcriptional reactivation is less clear. Yet in order to invoke transcriptional memory, it's necessary to solidify this transcriptional response angle. The authors should consider quantifying the changes in transcription activity (at the TS and not in the cytoplasm as currently shown), as well as the timing of transcriptional reactivation (with the MS2 system or smFISH). Manipulating the duration of the activation and dark recovery periods could help to draw a better correlation between the timing of hub reformation and that of transcriptional response and would also help determine how persistent this phenomenon is.

      We thank the reviewer for these suggestions. We have carried out several new experiments to probe further the persistence of memory and to show the effects on transcription when Notch is inactivated/reactivated. First, we have extended the time period for Notch inactivation by temperature control and show that the CSL hub persists even at 24 hours and that no transcription from the target E(spl)m3 is detected, neither at the transcription start-site nor in the cytoplasm. Second, we have extended the Notch OFF time period to 24 hours using the optogenetic approach and show that transcription is robustly reinitiated in preactivated nuclei when Notch is re-activated with 30 mins light treatment, while little if any E(spl)m3 transcription is detected in naïve nuclei with the same treatment. These new data are included in new Figure 5 C-F (see pages 13-14). Both of these new experiments substantiate the model that the nuclei retain transcriptional memory.

      (3) The manuscript ends with the finding that the presence of a Mam hub does not always correlate with transcription. They conclude that transcription is initially stochastic. The authors find this surprising and even state that this could not be observed without their in vivo live imaging approaches. I don't understand why this result is surprising or unexpected, as we now know that transcription is generally a stochastic process and that most (if not all) loci are transcribed in a bursting manner. The fact that E(spl)-C locus is bursty is already obvious from the smFISH data. The fact that active nascent transcription does not correlate with local TF hubs was already observed in early Drosophila embryos (with Zelda hubs and two MS2 reporters, hb-MS2, sna-MS2). If, in spite of the inherent stochasticity of transcription (bursting), the data are surprising for other reasons, the authors should explain it better.

      We apologise that we had not made clear the reasons why the results were unexpected. We have substantially rewritten this section, and the discussion section, to clarify. We have also moderated the language used to better reflect the overall context of our results. We briefly summarise here. As the reviewer correctly states, it is well known that transcription is inherently bursty. Indeed, the MS2 transcription profiles in “ON” nuclei are bursty, which likely reflects the switching of the promoter. However, in other contexts where we have monitored transcription, although it is bursty, it has nevertheless been initiated synchronously in response to Notch in all nuclei, in a manner that was fully penetrant. What we observe in our current conditions is that some nuclei never initiate transcription over the time-course of our experiments (2-3 hours), and those that are ON rarely switch off. This implies that there is another rate-limiting step. Supplying a second signal can modulate this so that it occurs with much higher frequency/penetrance. We consider this to be a second tier of regulation above the fundamental transcriptional bursting.

      The fact that Mam is recruited in all nuclei, whether or not they are actively transcribing was surprising because recruitment of the activation complex has been considered as the limiting step. This is somewhat different from Zelda, which is thought to be permissive and needed at an early step to prime genes for later activation rather than to be the last step needed to fire transcription. We note also that we are not monitoring the position of the hub with respect to the promoter, as in the Zelda experiments (Zelda hubs may still persist, but they are not overlapping with the nascent RNA), we are monitoring the presence or absence of Mam hub in proximity to a genomic region.

      Minor suggestions:

      (1) The genotypes of the samples should be indicated in the figure legends.

      We thank the reviewer for this suggestion. We have provided a table (new Table S3) where all of the genetic combinations are provided in detail for each figure. We considered that this approach would be preferable because it would be quite cumbersome to have the genotypes in each legend as they would become very long and repetitive.

      (2) While the schematic Fig1A explains how the locus is detected, the presence of ParS/ParB is never indicated in subsequent panels and Figure. I assume that all panels depicting enrichment profiles, use a given radius from the ParS/ParB dot to determine the zero of the x-axis (grey zone). This should be clearly stated in all panels/figure legends concerned.

      We apologise if this was not made explicit. Yes, all panels depicting enrichment profiles use the immunofluorescence signal from ParA/ParB recruitment to determine the zero of the x-axis. We have now marked this more clearly in all figures (grey bar, grey shading or labelled 0). All images where the locus is indicated by an arrowhead, by a coloured bar above the intensity plots or by grey shading in the graphs have been captured with dual colour, and the signal from ParA/B recruitment was used to define its location. This is now clearly stated in the analysis methods and in the legends. We have also modified the diagram in new Supplementary Figure S1B, showing our analysis pipeline, to make this more explicit.

      (3) FRAP/SPT experiments: the author should provide more details. How many traces? Are traces showing bleaching removed?

      P7: does the statement 'The residences are likely an underestimation because bleaching and other technical limitations also affect track durations' imply that traces showing bleaching have not been removed from the analysis?

      The authors could justify the choice of the model for fitting FRAP/SPT experiments and be cautious about their interpretation. For example, interpreting a kinetic behavior as a DNA-specific binding event can be accurate only if backed up with measurements with a mutant version of the DNA binding domain.

      We apologise if some of this information was not evident. The number of trajectories is provided in new Figure S1F, which indicates the number of trajectories analyzed for each condition in Figure 1.

      We have now also added the numbers of trajectories analyzed for the ring experiments.

      The comments on page 7 about bleaching refer to the technical limitations of the SPT approach. However, as bleached particles cannot be distinguished from those that leave the plane of imaging, they have not been filtered or removed. For that reason, we have not sought to make claims about absolute residence times. Rather, the point is to make a comparison between the different molecules. As the same fluorescent ligand and imaging conditions are used in all the experiments, all the samples are equivalently affected by bleaching. We subdivide trajectories according to their properties and infer that those which are essentially stationary are bound to chromatin, as is common practice in the field. We note that we have previously shown that a DNA binding mutant of CSL does not produce a hub at E(spl)-C in Notch-ON conditions and has a markedly more rapid recovery in FRAP experiments (Gomez-Lamarca et al, 2018), consistent with the slow recovery being related to DNA binding. This point has been added to the text (page 8).

      (4) The authors should quantify their RNAi efficiency for Hairless-RNAi, Med13-RNAi, white-RNAi, yellow-RNAi, CBP-RNAi, and CDK8-RNAi.

      We thank the reviewer for this comment. We have made sure that we are using well validated RNAis in all our experiments and have included the references in Table S2 where they have been used. We have now evaluated the knock-down in the precise conditions used in our experiments by quantitative RT-PCR and added those data, which show efficient knock-down is occurring, to new Supplementary Figure S1D and Figure S3J. We note also that the RNAi experiments are complemented by experiments inhibiting the complexes with specific drugs and that these yield similar results.

      (5) Figure 3 A: could the author show that transcription is indeed inhibited upon triptolide treatment with smFISH (with for example m3 probes)? Why not use alpha-amanitin?

      We thank the reviewer for this suggestion. We had omitted the smFISH data from this experiment in error. These data have now been added to new Supplementary Figure S3A and clearly show that transcription is inhibited following 1 hour exposure to triptolide. Triptolide is a very fast acting and very efficient inhibitor of transcription that acts at a very early step in transcription initiation. In our experience it is much more efficient than alpha-amanitin and is now the inhibitor of choice in many transcription studies.

      (6) Figure 4 typo: panel B should be D and vice versa. Accessibility panels are referred to as Figure 4D, D' in the text but presented as panel B in the Figure.

      We thank the reviewer for noting this mistake; it has now been corrected in the main text.

      (7) The authors must add their optogenetic manipulation protocol to their methods section.

      The method is described in detail in a recently published paper that reports its design and use. As requested, we have now also added a section explaining the paradigm to the Methods (page 31).

      (8) Figure 3G needs a Y-axis label.

      Our apologies, this has now been added.

      (9) The authors should note why there was a change of control in Figure 3D compared to 3E and G (yellow RNAi vs white RNAi).

      This is a pragmatic choice that relates to the chromosomal site of the RNAis being tested. Controls were chosen according to the chromosome that carries the UAS-RNAi: for the second chromosome this was yellow RNAi and for the third white RNAi. This is explained in the methods.

      (10) Figure 1 would benefit from a diagram describing the genomic structure of the E(spl) locus and the relative position of the labelled locus within it.

      We thank the reviewer for this suggestion and have added a diagram to Supplementary Figure S1A.

      Reviewer #2 (Recommendations For The Authors):

      Minor criticisms and typos:

      Pet peeve: some of the figure panels are labeled Notch ON or OFF, but others are not, although that information is included in the figure legend. For the ease of the reader/reviewer, would it be possible to label all relevant figure panels either Notch ON or OFF for clarity?

      We thank the reviewer for this suggestion and have modified the figures accordingly.

      Page 7, top. "In comparison to their average distribution across the nucleus, both CSL and Mam trajectories were significantly enriched in a region of approximately 0.5 μm around the target locus in Notch-ON conditions, reflecting robust Notch dependant recruitment to this gene complex." Are the authors referring to Figure 1D here?

      Thank you, this figure call-out has been added in the text.

      Page 9. "...reported to interact with p300 and other factors (Figure S2B)." I believe the authors mean Figure S2C and not S2B.

      Thank you, this has been corrected in the text.

      Page 9. There is no Figure S2D.

      Apologies, this was referring to Figure S1D, and is now corrected in the text.

      Page 11: "...were at very reduced levels in nuclei co-expressing MamDN (Figure 4B).." Should be Figure 4CD.

      Thank you, this has been corrected in the text.

      Page 12: "...which was maintained in the presence of MamDN (Figure 4D, D')." Should be Figure 4B.

      Thank you, this has been corrected in the text.

      Reviewer #3 (Recommendations For The Authors):

      In the Results section on Hub, the paragraph starting with "Third, we reasoned . ." the callout to Figure S2D should be Fig S1D.

      Thank you, this has been corrected in the text.

      Figures: The font size in the Figures is so small that most words and numbers cannot be read on a printout. One has to go to the electronic version and increase the size to read it. This reviewer found that inconvenient and often annoying.

      We apologise for this oversight, the font size has now been adjusted on all the graphs etc.

      Figure legends: the legends are terse and in some cases leave explanations to the imagination (e.g. "px" in Figure 2E). It would be useful to go through them and make sure those who are not a Drosophila Notch person and not a transcription biochemist can make sense of them.

      Our apologies for the lack of clarity in the legends. We have gone over them to make them more accessible and less succinct.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2023-01938R

      Corresponding author(s): Ilan, Davis

      1. General Statements

      We thank all four reviewers for their helpful and constructive comments. We have gone through each and every comment and proposed how we would address each point raised by the reviewers. We are confident our proposed revisions are feasible within a reasonable and expected time frame. Some of the comments regarding minor typo/aesthetics and extra references have already been addressed in the transferred manuscript. The changes are highlighted in yellow in the transferred manuscript.

      2. Description of the planned revisions

      Reviewer #1

      Major points:

      1. The presented work itself (Figures 1-4) does not need significant adjustments prior to publication, in my view, with only a few points to address. However, the work in Figure 5- doesn't really support the claims the authors make on its own, and would require some additional experiments or at the very least discussion of the caveats to its current form.

      We thank the reviewer for these comments and will follow the reviewer’s suggestion by discussing the caveats regarding the interpretation of Figure 5. We will also expand the discussion to suggest future research approaches, beyond the scope of this manuscript, that would address the functional importance of localised mRNA translation. Specifically, we will briefly mention quantification of the mRNA foci, disruption of the mRNA localisation signals to block localised translation, and techniques such as SunTag (Tanenbaum et al, 2014) and FLARIM (Richer et al, 2021) to visualise local translation directly.


      Tanenbaum et al, 2014 DOI: 10.1016/j.cell.2014.09.039

      Richer et al, 2021 DOI: 10.1101/2021.08.13.456301

      Localized glia transcripts: are they "glial/CNS/PNS" significant, or are they similar to other known datasets of protrusion transcriptomes? The authors compared their 4801 "total" localized transcripts to a local transcriptome dataset from the Chekulaeva lab, finding that a significant fraction are localized in both. As the authors note, this is in good agreement with a recent paper from the Taliaferro lab showing conservation of mRNA localization across different cell types. What the authors haven't done here is further test this by looking at other non-neuronal projection transcriptomic datasets (for example Mardakheh et al., Developmental Cell 2015, among others). If the predicted glia-localized transcripts are similar to non-neuronal protrusion transcriptomes, this would further strengthen this claim and rule out some level of CNS/PNS-derived lineage driving the similarities between glial and neuronal localized transcripts.

      This is a good point, and we thank the reviewer for pointing out this interesting cancer dataset. We will do as the reviewer suggests and intersect our data with Mardakheh et al (2015) to test whether localisation in neurons and glia generalises to other cell types. Specifically, we plan to intersect both the glial (this study) and neuronal (von Kuegelgen & Chekulaeva, 2020) datasets with the dataset from protrusive breast cancer cells (Mardakheh et al, 2015).


      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      Mardakheh et al, 2015 DOI: 10.1016/j.devcel.2015.10.005

      The presentation/discussion around Figure 3 is a bit weaker than other parts of the manuscript, and it doesn't really contribute to the story in its current form. Notably, there is no discussion of the significance of glia in neurological disorders until the very end of the manuscript (page 21), meaning that when it is first brought up, it just sits there as a one-off side point. The authors might consider strengthening/tightening up the discussion here if they really want to keep it as a solo main figure rather than integrating it elsewhere or moving it to the supplement. In my view, Figures 2 & 3 should be merged into something a bit more streamlined.

      This is a good point. We plan to strengthen the presentation of Figure 3 and discussion of the significance of glia in neurological disorders by adding a description of the Figure in the Results section and highlighting the significance of glia in nervous system disorders in the Discussion section.

      Why aren't there more examples of different mRNAs in Figure 4? It seems a waste to relegate them all to the supplement.

      We agree that it could be helpful to show different expression patterns in the main figure. To address this point we will add Pdi (Fig. S4D), which shows mRNA expression in both the glia and the surrounding muscle cell. This pattern is in contrast to Gs2, which is highly specific to glial cells. We will also note that although pdi mRNA is present in both the glia and muscle, Pdi protein is only abundant in the glia, suggesting that translation of pdi mRNA to protein is regulated in a cell-specific manner.

      The plasticity experiments, while creative, need to be interpreted far more cautiously, I think. Given that the siRNAs will completely deplete these mRNAs, it really needs to be stressed that any or all of the effects seen could simply result from "defective" or "altered" states in this glial population, which have spill-over effects on plasticity at the NMJ. Without directly visualizing whether these mRNAs are locally translated in these processes, and assessing whether their translation is modulated by the plasticity paradigm, all these experiments can say is that these RNAs are needed in glia to modulate ghost bouton formation in axons. This is the weakest part of the manuscript, and the part that I feel does not actually back up the claims currently being made. Without experiments to (A) quantify how much of these transcripts are localized vs. in the cell body of these glia, and (B) visualize/quantify the translation of these mRNAs at baseline and during plasticity, the authors cannot use these data to claim that localized mRNAs are required for synaptic plasticity.

      We are grateful to the reviewer for pointing out that we were not precise enough in defining our interpretation of the structural plasticity assay. We did not intend to claim that our results show that local translation of these transcripts is necessary for plasticity, only that these transcripts are localized and are required in the glia for plasticity in the adjacent neuron (in which transcript levels are not disrupted in the experiment). Definitively proving that these transcripts are required locally and translated in response to synaptic activity would require genetic/chemical perturbations and imaging assays that would take a year or more to complete, and so is beyond the scope of this manuscript. To address this point, we will clarify that the results do not show that localized transcripts are required, only that the transcripts are required somewhere in the glial cell specifically (without affecting their levels in the neuron), and that we can indeed show in an independent experiment that there are localized transcripts.

      Reviewer #2

      Major points:

      1. The authors analyse the 1700 shortlisted genes for Gene Ontology and associations with autism spectrum disorder, leading to interesting results. However, it is not clear to what extent the enrichments they observe are driven by presumptive localization, or whether the associations are driven to a significant extent by the presence of these genes in the selected cell types in the Fly Cell Atlas. One way to address this would be to perform the GO and SFARI analysis on genes that are expressed in the same cells in the Fly Cell Atlas but were not shortlisted from the mammalian cell datasets; the results could then be compared with those obtained for the 1700 localized transcripts.

      This is a fair point, as genes involved in neurological diseases such as autism spectrum disorder may be enriched in CNS/PNS cell types. We will follow the reviewer’s suggestion and perform GO and SFARI gene enrichment analysis on genes that were not shortlisted for presumptive glial localisation.

      Although the authors attempt to justify its inclusion, I'm not convinced that it was important to use the whole-cell transcriptome of perisynaptic Schwann cells as part of the selection process for localizing transcripts. Including this dataset may reduce the power of the pipeline by including mRNAs that are not localized to protrusions. How many of the 1700 shortlisted genes, and how many of the 11 glial localized mRNAs in Table 5, would be lost if the whole-cell transcriptome were excluded? More generally, what is the distribution of the 11 validated localizing transcripts in each dataset in Table 4? This information might be valuable for determining which dataset(s), if any, has the best predictive power in this context.

      We thank the reviewer for raising this point, which we will address with further analysis and adding to the discussion. We propose to address the criticism by running our analysis pipeline without the inclusion of the dataset using Perisynaptic Schwann Cells (PSCs) and then intersect with the PSCs-expressed genes, since their functional similarity with polarised Drosophila glial cells is highly relevant. We also agree with the reviewer that it would be a useful control for us to assess the ‘predictive power’ of each glial dataset by calculating their contribution to the shortlisted 1,700 glial localised transcripts and to the 11 experimentally validated transcripts via in situ hybridisation. To address this point, we plan to add this information in the revised manuscript.

      Did the authors check whether any of the RNAi constructs reduce levels of the target mRNA or protein? Doing so would significantly strengthen confidence in these important results. In any case, the authors should also mention the caveat of potential off-target effects of RNAi.

      We thank the reviewer for their useful comment and agree that the extent to which the RNAi expression reduces the levels of mRNA is not specifically known. We will add a FISH experiment on lac, pdi and gs2 RNAi showing very strong reduction in mRNA levels. We will also add an explanation of the caveats of the use of the RNAi system to the discussion.

      Methods: what is the justification for assuming that, if the RNAi cross caused embryonic or larval lethality, the 'next most suitable' RNAi line reports a phenotype specific to the gene? If the authors want to claim the effect is associated with different degrees of knockdown, they should show this experimentally. An alternative explanation is that the line used for phenotypic analysis in glia is associated with an off-target effect.

      We thank the reviewer for this comment. We agree that off target effects cannot in principle be completely ruled out without considerable additional experimental analysis beyond the scope of this manuscript. To address the criticism we will remove the expression data of the lines that cause lethality and revise the discussion to explain that the level of knockdown in each line is unknown, and would require further experimental exploration.

      Minor points:

      1. It would be helpful to have in the Introduction (rather than the Results, as is currently the case) an operational definition of mRNA localization in the context of the study. And is it known whether localization in protrusions is the norm in mammalian glia or in Drosophila larval glia? I ask because it may be that almost all mRNAs diffuse into the protrusion, in which case this is not a selective process. One interesting approach to test this idea might be to test whether the 1700 shortlisted transcripts show a significant underrepresentation of 'housekeeping' functions.

      We thank the reviewer for this excellent suggestion. To address the comment, we will move our explanation of the operational definition of mRNA localization to the Introduction. We will also perform enrichment analysis of housekeeping genes within the 1,700 shortlisted transcripts compared with the transcriptome background, as the reviewer suggested.

      Reviewer #3

      Major points:

      1. The authors have pooled data from different studies across different types of glial cells, ranging from in vitro to in vivo. While pooling datasets may reveal common transcripts enriched in processes, this may not be the best approach, considering that these are completely different types of glial cells with distinct functions in neuronal physiology.

      We thank the reviewer for highlighting the need for us to further justify why we pooled datasets. We will revise the manuscript to better emphasise that the overarching goal of our study was to discern a common set of localised transcripts shared between the cells. The problem with analysing and comparing individual datasets is that much of the variation may be due to differences in the methods used and the amount of material, rather than differences in the type of cells used. We will revise the discussion to make this point and explain that our approach corresponds well with a previous publication that pooled localised mRNA datasets in neurons (von Kuegelgen & Chekulaeva, 2020).

      von Kuegelgen & Chekulaeva, 2020 DOI: 10.1002/wrna.1590

      It is important to note the limitations of the study. For example, DESeq2 is biased towards highly expressed transcripts. How robust was the prediction for low-abundance transcripts?

      The presented 1,700 transcripts were shortlisted based on their presence and expression level (TPM) in glial protrusions rather than their relative enrichment. Nevertheless, the reviewer makes a valid criticism of our use of DESeq2, where we compared enriched transcripts in glial and neuronal protrusions in Figure 1D. To address this point we will discuss this caveat in the relevant section.

      The issue raised regarding low abundance transcript prediction raises an important question: does the likelihood of localisation to cell extremities correlate with mRNA abundance? We have already partially addressed this point, since our analysis of the fraction of localised transcripts per expression level quantiles shows only limited correlation. To address this comment, we will add these results in the revised manuscript as a supplementary figure.

      The authors identify 1,700 transcripts that they classify as "predicted to be present" in the projections of the Drosophila PNS glia. This was based on the comparison to all the mammalian glial transcripts. Since the authors have access to a transcriptomic study from Perisynaptic Schwann cells (PSCs), the nonmyelinating glia associated with the NMJ isolated from mice; it would be more convincing to then validate the extent of overlap between Drosophila peripheral glial with the mammalian PSCs. This may reveal conserved features of localized transcripts in the PNS, particularly associated with the NMJ function.

      Thank you for the valuable suggestion. A similar point was also raised by Reviewer #2 (Major point 2), who suggested re-running our pipeline excluding the PSC dataset and intersecting with the PSC transcriptome post hoc. Please see the above section for our detailed response.

      Fig 2: What is the extent of overlap between the translating fractions versus the localized fraction? It will be informative to perform the functional annotation of the translating glial transcripts as identified from Fig 1D.

      This is an interesting question. To address this point, we plan to: (i) compare transcripts that are translated vs. localised in glial protrusions, and (ii) perform functional annotation enrichment analysis on the translated fraction of genes.

      "We conclude predicted group of 1,700 are highly likely to be peripherally localized in Drosophila cytoplasmic glial projections". To validate their predictions, the authors test some of these candidates in only one glial cell type. It might be worthwhile to extend this to other differentially expressed genes localized in another glial type as well.

      The presented in vivo analyses made use of the repo-GAL4 driver, which is active in all glial subtypes, including subperineurial, perineurial and wrapping glia that make distal projections to the larval neuromuscular junction. We agree that subtype-specific analysis would be highly informative, but we believe this is outside the scope of the current work, where we aimed to identify conserved localised transcriptomes across all glial subtypes. Nevertheless, to address the comment, we plan to further clarify our use of the pan-glial repo-GAL4 driver in the Results and Methods sections of the revised manuscript.

      Figure 5: The authors perform KD of candidate transcripts to test the effect on synapse formation. However, these are KD with RNAi that spans the entire cell. To make the claim about the importance of "target" RNA localization in glia stronger, ideally they should disrupt the enrichment specifically in the glial protrusions and test the impact on bouton formation. Do these three RNAs have any putative localization elements?

      We agree with the reviewer that we would ideally test the effect of disrupting mRNA localization (and therefore localised translation). However, we feel these experiments are beyond the scope of the current study, as they would require a long process of defining localisation signals that are small enough to disrupt without affecting other functions. To address this comment, we will revise the Discussion section to mention these difficulties explicitly and clarify the limitations of the approach used in our study for greater transparency.

      Reviewer #4

      Major points:

      1. The authors use FISH to validate the glial expression of their target genes, though these experiments are not quantified, and no controls are shown. The authors should provide a supplemental figure with "no probe" controls, and/or validate the specificity of the probe via glial knockdown of the target gene (see point 2). Furthermore, these data should be quantified (e.g. number of puncta colocalized with NMJ glia membranes).

      Thank you for requesting further information regarding the YFP smFISH probes. We have validated the specificity and sensitivity of the YFP probe in our recent publication (Titlow et al, 2023, Figures 1 and S1). Specifically, we demonstrated the lack of YFP probe signal from wild-type untagged biosamples and showed colocalization of YFP spots with additional probes targeting the endogenous exon of the transcript. Nevertheless, we will address this comment by adding control image panels of smFISH in wild-type (OrR) neuromuscular junction preparations.

      Titlow et al, 2023 DOI: 10.1083/jcb.202205129

      For the most part, the authors only use one RNAi line for their functional studies, and they only show data for one line, even if multiple were used. To rule out potential false negatives, the authors should leverage their FISH probes to show the efficacy of their knockdowns in glia. This would serve the dual purpose of validating the new probes (see point 1).

      Thank you for the suggestion. This point was also raised by [Reviewer #2 - Major point 3]. Please see above for our detailed response.

      In Figure 5 E, given the severe reduction in size in the stimulated Pdi KD animals, the authors should show images of the unstimulated nerve as well. Do the nerve terminals actually shrink in size in these animals following stimulation, rather than expand? The NMJ looks substantially smaller than a normal L3 NMJ, though their quantification of neurite size in F suggests they're normal until stimulation.

      We share the same interpretation of the data with the reviewer that the neurite area is reduced post-potassium stimulation in pdi knockdown animals. We will follow the reviewer’s suggestion and add an image showing unstimulated neuromuscular junctions.

      Minor points:

      The authors claim that there is an enrichment of ASD-related genes in their final list of ~1400 genes that are enriched in glial processes. It is well-appreciated that synaptically-localized mRNAs are generally linked to ASDs. Can the authors comment on whether the transcripts localized to glial processes are even more linked to ASDs and neurological disorders than transcripts known to be localized to neuronal processes?

      This is an interesting point. To address the comment, we will add a comparison of the degree of enrichment of ASD-related genes in neurite vs. glial protrusions in the revised manuscript.


      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1


      1. The use of blue/green or blue/green/magenta is difficult to resolve in some places. Swapping blue for cyan would greatly aid in visualizing their data.

      This comment is much appreciated. We have swapped blue for cyan in Figures 4 and S4. We have also changed Figure S1 to increase contrast and visibility as per reviewer’s comment.

      Make the colouring/formatting of the tables more consistent; it's distracting when it's constantly changing (also, there is no need for a blue background, just use a basic white table).

      This comment is much appreciated. We have applied a consistent colour palette to the Tables without background colourings and made the formatting uniform.


      Reviewer #2


      Introduction: 'Asymmetric mRNA localization is likely to be as important in glia, as it is in neurons,...'. Remove commas

      Thank you for pointing this mistake out. We have made the corresponding edits.


      Reviewer #3

      RNA localization in oligodendrocytes has been well studied and characterized. The authors should cite and discuss those papers (PMID: 18442491; PMID: 9281585).

      We thank the reviewer for this useful suggestion. We have added these references to the paper.


      Reviewer #4


      In Figure 5D, the authors should include a label to indicate that these images are from an unstimulated condition.

      We thank the reviewer for pointing this out. We have added the label as requested.

      The authors are missing a number of key citations for studies that have explored the functional significance of mRNA trafficking in glia, and those that have validated activity-dependent translation:

      - https://pubmed.ncbi.nlm.nih.gov/18490510/
      - https://pubmed.ncbi.nlm.nih.gov/7691830/
      - https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001053
      - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7450274/
      - https://pubmed.ncbi.nlm.nih.gov/36261025/

      We thank the reviewer for the comment. We have added these references to the text.


      4. Description of analyses that authors prefer not to carry out

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank the Reviewers for their helpful and constructive comments. In response to these suggestions we have performed new experiments and amended the manuscript, as we describe in our detailed response below.

      2. Point-by-point description of the revisions

      Reviewer #1

      1. The Reviewer notes that while our analysis of centrosome size was comprehensive, we provided no analysis of centrosomal MTs, pointing out that while centrosome size declines as the embryos enter mitosis, the ability of centrosomes to organise MTs might not. This is a good point, and we now provide an analysis of centrosomal-MT behaviour (Figure 2). We find that there is a dramatic decline in centrosomal MT fluorescence at NEB, although the pattern of centrosomal MT recruitment prior to NEB is surprisingly complex.

      2. The Reviewer questions how PCM client proteins can be recruited in different ways by the same Cdk/Cyclin oscillator. We apologise for not explaining this properly. It is widely accepted that Cdk/Cyclins drive cell cycle progression, in part, by phosphorylating different substrates at different activity thresholds (e.g. Coudreuse and Nurse, Nature, 2010; Swaffer et al., Cell, 2016). Moreover, it is also clear that Cdk/Cyclins can phosphorylate the same protein at different sites at different activity thresholds (e.g. Koivomagi et al., Nature, 2011; Asafa et al., Curr. Biol., 2022; Ord et al., Nat. Struct. Mol. Biol., 2019). Thus, we hypothesise that rising Cdk/Cyclin cell cycle oscillator (CCO) activity phosphorylates multiple proteins at different times and/or at different sites to generate the complicated kinetics of centrosome growth. We now explain this point more clearly throughout the manuscript.

      3. The Reviewer is puzzled as to how we conclude that Cdk/Cyclins phosphorylate Spd-2 and Cnn at all the potential Cdk/Cyclin phosphorylation sites we mutate in our study. The Reviewer is right that we cannot make this conclusion, and we did not intend to make this claim. As we now clarify (p11, para.1), although it is unclear if Cdk/Cyclins phosphorylate Spd-2 or Cnn on all, some, or none of these sites, if either protein can be phosphorylated by Cdk/Cyclins, then these mutants should not be able to be phosphorylated in this way, allowing us to address the potential significance of any such phosphorylation. We now also note that several of these sites have been shown to be phosphorylated in embryos in mass spectrometry screens (Figure S6).

      4. The Reviewer highlights differences in how Spd-2 and Cnn help recruit γ-tubulin to centrosomes (Figure 6). They ask for a more detailed description, and are puzzled as to how this is compatible with direct regulation by a single oscillator. We now explain our thinking on this important point in much more detail. It appears that Spd-2 helps recruit γ-tubulin throughout S-phase, while Cnn has a more prominent role in late S-phase (Figure 6). This is consistent with our overall hypothesis of CCO regulation, as we postulate that low-level CCO activity promotes the Spd-2/γ-tubulin interaction in early S-phase, while higher CCO activity promotes the Cnn/γ-tubulin interaction in late S-phase, potentially explaining the increase in the rate of γ-tubulin (but not γ-TuRC) recruitment we observe at this point (see minor comment #1, below, for an explanation of the various γ-tubulin complexes in flies). This is consistent with recent literature showing that CCO activity promotes γ-tubulin (but not γ-TuRC) recruitment by Cnn/SPD-5 in worms and flies (Ohta et al., 2021; Tovey et al., 2021).

      5. The Reviewer was not convinced by our model (Figure 8, now Figure 9), raising two major concerns. First, they were unsure how a single oscillator could generate different patterns of protein recruitment. We addressed this in points #2 and #4, above, where we explain how different thresholds of CCO activity trigger different events, so there is no expectation that we should observe steady changes in recruitment over time as CCO activity rises. Second, they questioned how modest levels of Cdk/Cyclin activity can promote recruitment, while high levels of activity can inhibit recruitment. In point #2, above, we cite several examples where such positive and negative regulation by different Cdk/Cyclin activity levels has been described. We also now explain throughout the manuscript why this hypothesis provides a plausible explanation for our results: moderate CCO activity promotes Spd-2-dependent PCM-client recruitment in early S-phase; higher CCO activity promotes a decrease in Spd-2 recruitment in mid-to-late S-phase (so centrosomal Spd-2 levels decline); and even higher levels of CCO activity lead to a decrease in the interactions between the client proteins and the Spd-2/Cnn scaffold as the embryos enter mitosis (so the client proteins are rapidly released from the centrosome).

      The Reviewer also raised the important point here that our model does not explain why the mutant forms of Spd-2 and Cnn accumulate to higher levels at the start of S-phase, and not just at the end of S-phase/entry into mitosis. We apologise for not explaining this properly. The accumulation of the mutant proteins (particularly Spd-2, Figure 5C) in early S-phase occurs because the excess mutant protein that accumulates at centrosomes in late S-phase/mitosis is not removed properly from centrosomes during mitosis (presumably because there is insufficient time). Thus, centrosomes still have too much mutant Spd-2 at the start of the next S-phase. We show this in Reviewer Figure 1 (attached to this letter), which tracks Spd-2 behaviour further into mitosis, and now explain this in more detail in the text (p12, para.1).

      6. The Reviewer questions how the CCO can both induce centrosome growth and also switch it off, as it is unclear how an oscillator that only phosphorylates sites to decrease centrosome binding could also promote growth. They ask if we can identify and mutate any Cdk/Cyclin sites in centrosome proteins that promote centrosome recruitment. As we now clarify, we did not intend to claim that the CCO only phosphorylates sites that decrease the centrosome binding of proteins, although we do hypothesise that such phosphorylation is important for switching off centrosome growth in mitosis. In addition, we hypothesise that moderate levels of CCO activity initially promote centrosome growth, and our data suggests that the CCO does this, at least in part, by promoting Polo recruitment (Figure 8). We speculate that the CCO phosphorylates specific Polo-box-binding sites in Ana1 and Spd-2, the main proteins that recruit Polo to centrioles. We agree that identifying these sites is an important next step, but it is complicated as our studies indicate that multiple sites contribute in a complex manner. Importantly, it is well established that the CCO triggers centrosome growth as cells prepare to enter mitosis, so our hypothesis that moderate levels of CCO activity initiate centrosome growth is not new or controversial.

      Minor Comments

      1. The reviewer asks how we explain the different incorporation profiles we observe for the different subunits of the γ-tubulin ring complex. We apologise for not discussing this point. In flies there is a “core” γ-tubulin-small complex (γ-TuSC) and a larger γ-tubulin-ring complex (γ-TuRC) that contains the Grip71, Grip75 and Grip128 subunits we analyse here (Oegema et al., JCB, 1999). The γ-TuSC functions independently of the γ-TuRC so γ-tubulin and γ-TuRC components can behave differently.

      2. The Reviewer questions why we claim an “inverse-linear” relationship between S-phase length and the centrosome growth rate when the relationship is not linear (Figure 3, now Figure S3). We appreciate that this terminology can be confusing but, mathematically, a linear relationship means y is proportional to x, whereas an inverse-linear relationship means y is proportional to 1/x. Thus, an inverse-linear relationship between x and y does not plot as a straight line, but rather as the curves we show on the graphs. We now explain this in the text (p9, para.2).
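      The distinction can be checked numerically. The short Python sketch below is purely illustrative: it assumes growth rate = K / S-phase length, and the constant K and the S-phase lengths are hypothetical values, not data from the study. It compares the slopes between neighbouring points when the rate is plotted against S-phase length and against its reciprocal.

```python
# Hypothetical inverse-linear model: growth rate = K / S-phase length.
# K and the S-phase lengths are illustrative values, not data from the study.
K = 30.0


def growth_rate(s_phase_length: float) -> float:
    """Growth rate under an assumed inverse-linear relationship (rate proportional to 1/length)."""
    return K / s_phase_length


def consecutive_slopes(xs, ys):
    """Slope between each pair of neighbouring points; equal slopes mean a straight line."""
    return [(y2 - y1) / (x2 - x1) for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])]


lengths = [5.0, 10.0, 15.0, 20.0]          # hypothetical S-phase lengths (min)
rates = [growth_rate(x) for x in lengths]  # [6.0, 3.0, 2.0, 1.5]

# Plotted against length itself, the slope keeps changing: the points trace a curve.
print(consecutive_slopes(lengths, rates))
# Plotted against 1/length, every slope equals K: the same points fall on a straight line.
print(consecutive_slopes([1 / x for x in lengths], rates))
```

      In other words, "inverse-linear" describes a straight line only once the x-axis is transformed to 1/x; on an untransformed axis the same relationship appears as a curve.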

      Reviewer #2

      This Reviewer found the manuscript hard to follow, so we are very grateful that they took the time to try to understand it. We agree that the subject matter is complicated, and that our presentation was not always helpful. The Reviewer’s comments have been very useful in helping us to identify (and hopefully improve) areas of particular difficulty.

      Major points:

      1. The Reviewer highlights that the two experimental approaches underpinning our main conclusions are problematic: (1) Experiments with mutants of Spd-2 and Cnn that theoretically cannot be phosphorylated by Cdk/Cyclins are hard to interpret as these mutations may have other effects; (2) It is unclear whether reducing Cyclin B levels reduces peak CDK activity or simply slows the time it takes to reach peak levels. They suggest a more direct test of our model would be to analyse PCM recruitment in embryos arrested in S-phase or mitosis.

      (1) We agree that the mutations designed to prevent Cdk/Cyclin phosphorylation could perturb function in other ways, but this is true for any such mutation, and there are many papers that infer a function for Cdk/Cyclin phosphorylation from such experiments. Importantly, the centrosomal accumulation of the phospho-null mutants actually slightly increases compared to WT (Figure 5C and I), and we now show that the centrosomal accumulation of a phospho-mimicking Spd-2-Cdk20E mutant slightly decreases (Figure S8). We now acknowledge the potential caveat of a non-specific perturbation of protein function, but feel that the reciprocal behaviour of the phospho-null and phospho-mimicking mutants somewhat mitigates this concern (p12, para.2).

      (2) Fortunately, and as we now clarify, it has recently been shown that reducing Cyclin levels does not reduce peak Cdk activity, but rather slows the time it takes to reach peak activity (Figure 2A, Hayden et al., Curr. Biol., 2022). Thus, the cyclin half-dose experiments provide an excellent alternative test of our hypothesis as they show that the WT proteins can exhibit similar behaviour to the mutants if the rate of Cdk/Cyclin activation is slowed. We feel the evidence supporting our hypothesis is strong enough that it warrants serious consideration.
      The suggestion to look at PCM recruitment in embryos arrested in either S-phase or M-phase is a good one, but these experiments produce complicated data. In M-phase arrested embryos, for example, Cnn levels continue to rise (see Figure 1G, Conduit et al., Dev. Cell, 2014), but the other PCM proteins do not (unpublished); in S-phase arrested embryos (arrested by mitotic cyclin depletion) centrosomes continue to duplicate, but now do so asynchronously, greatly complicating the analysis (McCleland and O’Farrell, Curr. Biol., 2008; Aydogan et al., Cell, 2020). The centrosomes that don’t duplicate, however, reach a constant steady-state size (where the rate of centrosome protein addition is balanced by the rate of loss). These observations are consistent with our recent mathematical modelling of mitotic PCM assembly (Wong et al., 2022) if we additionally account for cell cycle regulation (which was not considered in our original model). We believe such analyses are beyond the scope of the current paper and we plan to publish a second paper incorporating our new hypothesis into our mathematical modelling.

      2. The Reviewer questions whether our methods accurately measure centrosomal protein accumulation, pointing out that γ-tubulin and Grip128 occupy different centrosomal areas, which should not be possible if they are part of the same complex. They suspect that our use of different transgenes with different promoters could explain these differences. As we should have described (see point #1 in our response to the minor comments of Reviewer #1), γ-tubulin exists in two complexes in flies, only one of which contains Grip128, so γ-tubulin and Grip128 exhibit different localisations. Moreover, as we now show (Figure S2), using different promoters does not seem to make a difference to overall recruitment kinetics. Thus, we are confident that our methods measure centrosome protein recruitment dynamics accurately.

      3. The Reviewer is concerned that our measurements of centrosome size based on fluorescence intensity (Figure 1) and centrosomal area (Figure S1) do not always match. They suggest a potential reason for this is that proteins are not uniformly distributed within centrosomes, and this may impact our ability to measure protein accumulation based on 2D projections (noting, for example, that Polo and Spd-2 are concentrated at centrioles and in the PCM, potentially explaining the different shape of their growth curves compared to the client proteins). When the centrosome-fluorescence-intensity and centrosome-area recruitment profiles of a protein do not match, the average “centrosome-density” of that protein must be changing over time. In some cases, we understand why density changes. Cnn, for example, stops flaring outwards on the centrosomal MTs during mitosis so its centrosomal area decreases even as its fluorescence intensity increases (leading to an increase in its centrosomal density). We agree (and now discuss, p19, para.3) that the prominent accumulation of Spd-2 and Polo at centrioles could help to explain why Spd-2 and Polo accumulation dynamics differ from those of the client proteins.

      Other points:

      The Reviewer suggests it would be good to know how much Polo at the centrosome is active. We agree, but although commercial antibodies against PLK1 phosphorylated in its activation loop work in cultured fly cells, we cannot get them to work in embryos. Moreover, the recruitment of Polo/PLK1 to its site of action by its Polo-Box Domain is sufficient to partially activate the kinase independently of phosphorylation (Xu et al., NSMB, 2013). Thus, it seems likely that all the Polo/PLK1 recruited to centrosomes will be at least partially activated, even if it is not necessarily phosphorylated on its activation loop.

      The Reviewer asks if it is clear that less Spd-2 and Cnn are recruited to centrosomes in the half gene-dosage embryos. We apologise for not mentioning that this is indeed the case. We showed this previously for Cnn (Conduit et al., Curr. Biol., 2010) and we now state that this is also the case for Spd-2. We do not show the Spd-2 data as we plan to publish a comprehensive dose-response curve of Spd-2 (and Cnn) recruitment in our next modelling paper.

      Would it not be relevant to examine Polo ½ dosage embryos? We do have this data (Reviewer Figure 2), attached to this letter, but it is quite complicated to interpret (as we explain in the legend). We feel it would be more appropriate to include this in our next modelling paper where we can properly explain the behaviours we observe. Publishing this data here would distract from our main message without changing any of our conclusions.

      The Reviewer asks why the non-phosphorylatable Spd-2 protein is also present at higher levels on centrosomes at the start of S-phase (not just the end of S-phase). This was also raised by Reviewer #1 (point #5), so please see the second paragraph of our response there.

      Minor/Discussion Points:

      We thank the Reviewer for highlighting that absolute and relative centrosome size control are different things and we have amended the manuscript accordingly.

      The Reviewer questions whether it is accurate to describe Spd-2 and Polo as scaffold proteins, noting that only Cnn has been shown to have scaffolding properties. There is strong evidence that Spd-2 has Cnn-independent scaffolding properties in flies (e.g. Conduit et al., eLife, 2014), but this is a fair point for Polo. We think it is justified to separate Polo from other client proteins as Polo is essential for scaffold assembly, whereas other client proteins are not. We now define our scaffold/client terminology to avoid confusion (p4, para.3).

      The Reviewer highlights several points related to differences in recruitment kinetics (also touched on in points #2 and #3, above), noting we don’t discuss properly the idea of two different modes of PCM recruitment. These are all good points, largely addressed in our response to points #2 and #3, above. We now discuss much more prominently the two different modes of client protein recruitment throughout the manuscript.

      As we now clarify, in all our experiments we use centrosome separation and nuclear envelope breakdown (NEB) to define the start and end of S-phase, respectively.

      The Reviewer quotes the landmark Woodruff paper (Cell, 2017) as showing that the ability to concentrate client proteins (including ZYG-9, the worm homologue of Msps) is an intrinsic property of the PCM scaffold, so how do we explain that Msps departs prior to NEB while Cnn continues to accumulate? It is indeed a striking observation of our study that all PCM client proteins (not just Msps) start to leave the centrosome prior to NEB, even as Cnn levels continue to accumulate. Our hypothesis is that this ‘leaving’ event is triggered by a threshold level of Cdk/Cyclin activity—explaining why these client proteins all start to leave the PCM at the same time (just prior to NEB) irrespective of nuclear cycle length. This is not incompatible with the Woodruff paper, which did not attempt to reconstitute any potential regulation by Cdk/Cyclins in their in vitro studies.

      The Reviewer questions why Spd-2 that cannot be phosphorylated by Cdk/Cyclins (Spd-2-Cdk20A) accumulates abnormally at centrosomes in late S-phase, yet γ-tubulin (which is recruited by Spd-2) seems to leave centrosomes more slowly in the presence of the mutant protein. As we now explain more clearly, there is no contradiction here. Spd-2-Cdk20A accumulates to abnormally high levels in late S-phase/early mitosis (Figure 5C), and this reduces the γ-tubulin dissociation rate, as we would predict (Figure 7B, right-most graph). It does not, however, “prevent” dissociation (as the Reviewer seems to suggest it should), but this is probably because these experiments have to be performed in the presence of large amounts of the WT Spd-2 (Figure 5A).

      The referencing error has been corrected.

      The Reviewer asks why in Figure 1 not all of the centrosome proteins could be followed for the full time period (as we mention in the legend, but do not explain). There are different reasons for different proteins: (1) Polo cannot be followed in mitosis as it binds to the kinetochores, making it impossible to accurately track centrosomes (so the data for mitosis is missing for Polo); (2) Cnn exhibits extensive flaring at the end of mitosis/early S-phase (Megraw et al., JCS, 1999), so we cannot track individual separating centrosomes labelled with NG-Cnn in early S-phase until they have moved sufficiently far-apart (so the early S-phase time-points are missing for Cnn); (3) In addition, several of the client proteins bind to the mitotic spindle, so although we can still track and measure the centrosomes in late mitosis in the graphs, we don’t show pictures of these late mitosis centrosomes in the montage in Figure 1A as the images look a bit odd. We now explain these reasons in the Materials and Methods.

      We now indicate that nuclear cycle 12 (NC12) is being analysed in Figures 4-8.

      The reviewer questions why we don’t show the decrease rate for γ-tubulin in Figure 6 (the Spd-2 and Cnn half-dose experiments), when we do show it in Figure 7 (the Spd-2 and Cnn Cdk-mutant experiments), suspecting that it is slowed in both cases. The reviewer is correct and we now show this data for both sets of experiments.

      We have corrected the labelling error in Figure S1.

      The Reviewer suggests moving some of the data from the main Figures, and the entirety of Figures 2 and 3, to the Supplemental Information. We understand this point, and agree that the amount of data presented in Figures 1-3 is somewhat overwhelming. We have experimented with the Figures extensively (in particular, trying to show a few examples of the data and moving the rest to the Supplementary Information), but it is hard to pick a “typical” example, and the power of comparing the behaviour of so many different centrosome proteins is somewhat lost. We have tidied up several Figures and, as a compromise, we keep Figure 2 (now Figure 3) in the main text, but have moved Figure 3 to the Supplementary Information (now Figure S5).

      The Reviewer suggests that we should repeat the analysis of Spd-2, Polo and Cnn dynamics that we show here, as we already presented this data in a previous publication (Wong et al., EMBO J., 2022). We understand this point, but feel this would give a less accurate comparison, as essentially all of the data shown in Figure 1 was obtained several years ago during a contiguous ~6-month period. Since then, the lasers and software on our microscope system have been updated, so obtaining new data for only a subset of these proteins would probably give a less fair comparison (and it seems overkill to perform the entire analysis again). We clearly state that this data has been presented previously, so we hope the Reviewer will agree that it is acceptable to present it again here so readers can more easily compare the data.

      Reviewer #3

      This Reviewer is broadly supportive of the manuscript, but to publish in a prestigious journal they think additional experimental evidence will be required to support our hypothesis.

      The Reviewer notes that our only evidence that Cdk/Cyclins directly phosphorylate Spd-2 comes from our analysis of the Spd-2-Cdk20A mutant, as the effect of reducing Cyclin B dosage on WT Spd-2 behaviour is very modest. They request that we analyse the behaviour of a Spd-2-Cdk20E phospho-mimicking mutant. The effect of halving the dose of Cyclin B on Spd-2 behaviour is modest, but this is what we would predict as all we are doing in this experiment is slowing S-phase by ~15%, so Spd-2 should accumulate at centrosomes for a slightly longer time and to a slightly higher level (as we observe, Figure 5E). A great advantage of the early fly embryo system is that we can compare the behaviour of many hundreds of centrosomes, so even subtle differences like this are usually meaningful. To illustrate this point, we have now repeated the Spd-2 analysis in WT and CycB1/2 embryos (but now using a CRISPR/Cas9 Spd-2-NG knock-in line) and we see the same subtle differences (Figure S9). In addition, as requested, we have now analysed the behaviour of a Spd-2-Cdk20E mutant protein using an mRNA injection assay (as it would have taken too long to generate and test new transgenic lines). In this assay we injected embryos with mRNA encoding either WT Spd-2-GFP, Spd-2-Cdk20A-GFP or Spd-2-Cdk20E-GFP. The mRNA is quickly translated, and we computationally measured the fluorescence intensity of the centrosomes in mid-S-phase (i.e. at the Spd-2 peak) (Figure S8). This analysis confirms that Spd-2-Cdk20A accumulates to slightly higher levels, and reveals that Spd-2-Cdk20E accumulates to slightly lower levels, than the WT protein. Together, these new experiments strongly support our original conclusions.

      The Reviewer notes that we propose that the CCO initially promotes centrosome growth by stimulating Polo recruitment to centrosomes, but states that we only provide indirect evidence for this by showing that centrosomal Polo levels are strongly reduced in Cyclin B half-dose embryos. They suggest we determine Spd-2 levels in Polo half-dose embryos, and/or the centrosome levels of mutant forms of Spd-2 that cannot be phosphorylated by Polo. We believe the Cyclin B half-dose experiments provide direct support for our hypothesis that Cdk/Cyclin activity influences Polo recruitment (Figure 8), although, clearly, we have not identified the mechanism. We do, however, suggest a plausible mechanism: Ana1 and Spd-2 are largely responsible for recruiting Polo to centrosomes, and we have previously shown that several of the potential phosphorylation sites in these proteins that help recruit Polo to centrosomes are Cdk/Cyclin or Polo phosphorylation sites (Alvarez-Rodrigo et al., eLife, 2020 and JCS, 2021; Wong et al., EMBO J., 2022). We are currently testing this hypothesis, but progress is slow as it is clear that multiple sites in both proteins can influence this process.

      As the Reviewer requests, we have now also examined how Spd-2 and Cnn behave in Polo half-dose embryos (Reviewer Figure 2, attached to this letter). As we describe in the Figure legend, this data is informative, but is complicated. With relatively minor, but mechanistically important, tweaks to our previous mathematical modelling we can explain these behaviours, but introducing such a significant mathematical modelling element would be beyond the scope of this paper. As described above, these findings will form the basis of a follow-up paper that is more mathematically oriented.

      It is a great idea to look at mutant forms of Spd-2 that cannot be phosphorylated by Polo, but the consensus Polo phosphorylation site (N/D/E-X-S, with the N/D/E at -2 and the S at 0 being preferences, rather than a strict rule) is less well-defined than the consensus Cdk/Cyclin phosphorylation site (where the Pro at +1 is essentially invariant). Thus, we cannot accurately predict which sites would need to be mutated to generate such a mutant.
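      To illustrate why the looser Polo consensus makes site prediction harder, here is a minimal Python sketch. It is not the authors' method: it scans a made-up toy peptide (not Spd-2) using only the simplified motifs stated above, treating the Cdk/Cyclin motif as a strict rule (S/T directly followed by P) and the Polo motif as N/D/E two residues upstream of S/T, with a switch that drops the -2 preference. Allowing T as well as S at the Polo phosphoacceptor position is an assumption of the sketch.

```python
# Illustrative only: simplified consensus motifs, toy peptide (not Spd-2).
#   Cdk/Cyclin: S/T immediately followed by P, treated as a hard rule.
#   Polo: N/D/E two residues upstream of S/T, which is only a preference.


def cdk_sites(seq: str) -> list[int]:
    """0-based positions of S/T residues followed by P (minimal S/T-P motif)."""
    return [i for i in range(len(seq) - 1) if seq[i] in "ST" and seq[i + 1] == "P"]


def polo_sites(seq: str, require_nde_at_minus2: bool = True) -> list[int]:
    """0-based positions of candidate Polo sites.

    With require_nde_at_minus2=True, only S/T with N/D/E two residues upstream count.
    With False, the -2 preference is dropped and every S/T becomes a candidate.
    """
    hits = []
    for i, aa in enumerate(seq):
        if aa not in "ST":
            continue
        if not require_nde_at_minus2 or (i >= 2 and seq[i - 2] in "NDE"):
            hits.append(i)
    return hits


toy = "MKSPLDASGTPRSE"
print("Cdk S/T-P sites:", cdk_sites(toy))                        # [2, 9]
print("Polo sites, N/D/E at -2:", polo_sites(toy))                # [7]
print("Polo sites, preference relaxed:", polo_sites(toy, False))  # [2, 7, 9, 12]
```

      The Cdk/Cyclin scan returns a small, unambiguous set, whereas the Polo candidates range from one site to every S/T in the peptide depending on how strictly the preference is applied, which is the practical obstacle to designing a comprehensive Polo-phospho-null mutant.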

      The Reviewer requests that we analyse the behaviour of TACC in embryos expressing the Spd-2-Cdk20A and Cnn-Cdk6A (as we do in Figure 7 for γ-tubulin). This is a reasonable request, but we prefer not to show this data as we have recently identified an interesting interaction between TACC, Spd-2 and Aurora A that will be the subject of another paper we hope to submit shortly. This data is hard to interpret without explaining these interactions properly, which is beyond the scope of the current manuscript.

      We hope the Reviewers will agree that these changes have improved the manuscript substantially, and that it is now suitable for publication. We would like to thank them again for taking the time to read this rather complicated paper so thoroughly.

      We look forward to hearing from you.

      Yours sincerely,

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We thank the reviewers for their appreciation of the interest, novelty and quality of our study, and their useful feedback to improve its presentation.

      We have revised the manuscript addressing all the points they made, as detailed below, section by section, following the organization in the reviews. The corresponding changes are highlighted in yellow (new text) or crossed out (deleted text) in our revised manuscript.

      In case it is useful for the editor to check how each individual point was addressed, we also have extracted from the reviews each individual reviewer’s comment and our direct response, listed as bullet points at the end of this text.

      2. Point-by-point description of the revisions

      I - General criticisms

      Reviewer #1: My main criticism is unfortunately inherent to the approach: comparative studies are absolutely critical, but they can only provide a very sparse sampling of diversity. Fortunately, thanks to high-throughput sequencing, bioinformatic analyses can now be performed on a large number of species, but experimental validation is typically restricted to two or three species. The consequence of this for the present manuscript is that while the functional conservation of the Gwl site is convincingly shown, the exact mechanisms responsible for the reduced effect of PKA phosphorylation remain relatively vaguely defined. Indeed, in their Discussion the authors list a number of experimental approaches to address this - but I understand that these would all involve substantial efforts to address. In particular, testing chimeric constructs around the consensus PKA site and from multiple species could be very informative.

      We completely agree with the reviewer that comparative approaches are critical to understanding biological mechanisms, and are excited by the increasing possibilities to perform not only sequence and descriptive comparisons but functional studies across a range of emerging model organisms. We hope that more and more researchers in cell and molecular biology will profit from the experimental tools and techniques now available in such species, and will pioneer new ones. Of course, as he/she rightly points out, conclusions are currently limited by the number of species studied, but comparisons between two judiciously chosen species can already be very informative. Thus, in our study, the use of Xenopus and Clytia allowed us to make significant progress towards our main objective of understanding the cAMP-PKA paradox in the control of oocyte maturation, specifically by showing both that PKA phosphorylation of Clytia ARPP19 is lower in efficiency and that the phosphorylated protein has a lower effect on oocyte maturation than the Xenopus protein. As the reviewer points out, unravelling the exact mechanisms underlying these differences will require a large amount of additional work and is beyond the scope of the current study. In fact, we have embarked on several series of experiments to this end using some of the approaches listed in the Discussion. Specifically, we are testing the biochemical and functional properties of chimeric constructs containing the consensus PKA site from various species. This is a substantial undertaking which will require one to two years to complete, but it is already giving some very interesting findings.

      Reviewer #1: The figures and text could be slightly condensed down to about 6 figures.

      We have reduced the number of figure panels but we prefer to maintain the number of figures, because the experimental data presented in them is essential to the interpretation of our results and the overall conclusions of the article. If the journal editor would like us to reduce the number of figures, we could do this by moving Figure 4 and some panels of other figures to supplementary material (and then merging some of the remaining figures), but this would be a pity.

      II - Abstract

      As recommended by Reviewer #2, we have reworked the Abstract to make it more accessible to new readers, attempting to bring out more clearly and simply the main results and conclusions of the study. We correspondingly simplified and shortened the title of the article. Changes: Page 2.

      III - Introduction points

      Reviewer #2: I believe that it would be interesting to include some time-references when introducing the prophase arrest of Clytia and Xenopus oocytes. How long is prophase arrest in Xenopus compared to Clytia or other organisms? How can this affect the prophase arrest mechanisms? It seems that the prophase arrest in Xenopus oocytes is found to be significantly more prolonged compared to Clytia and various other organisms, and also meiotic maturation proceeds much more rapidly in Clytia than in Xenopus. This should be indicated in the introduction with a short introduction of why, and not others, were these species chosen for this study.

      Differences in timing of oocyte prophase arrest and in maturation kinetics across animals are indeed highly relevant in relation to the underlying biochemical mechanisms. Unfortunately, not enough information is currently available concerning the duration of the successive phases of oocyte prophase arrest across species to make any meaningful correlations with PKA regulation of maturation initiation. We have nevertheless expanded the Introduction to cover this issue as follows:

      • We start the introduction by mentioning how the length of the prophase arrest varies across species. Changes: Page 3, lines 5-11.
      • We have added examples of species which likely have similar durations of prophase arrest but show cAMP-stimulated vs cAMP-inhibited release. Changes: Page 4, lines 28-35.
      • We have specified the temporal differences in meiotic maturation in Xenopus (3-7 hrs) and Clytia (10-15 min). Changes: Page 5, lines 32-33.

      Reviewer #2: why, and not others, were these species [Xenopus, Clytia] chosen for this study. A brief justification is included in lines 1-page 5 "..a laboratory model hydrozoan species well suited to oogenesis studies", but it does not explain why this and not other hydrozoan species like Hydra, that has also been used for meiosis studies.

      As requested by Reviewer #2, fuller details are now included about the advantages of Clytia compared to other hydrozoan species, citing several articles and recent reviews here and also in the Discussion. Changes: Page 5, lines 21-32 & 37-39.

      Hydra is a classic cnidarian experimental species and has proved an extremely useful model for regeneration and body patterning, but is not suitable for experimental studies on oocyte maturation because spawning is hard to control and fully-grown oocytes cannot easily be obtained, manipulated or observed. In contrast many hydromedusae (including Clytia, Cytaeis, and Cladonema) have daily dark/light induced spawning and accessible gonads, so provide great material for studying oogenesis and maturation. Of these, Clytia has currently by far the most advanced molecular and experimental tools.

      Reviewer #2: The proteins MAPK is not introduced properly, as it is first mentioned in the results section in line 12. Given the importance of the results provided with it, it should be presented in the introduction prior to the results section.

      As requested by Reviewer #2, the involvement of MAPK activation during Xenopus oocyte meiotic maturation is now introduced, explaining how its phosphorylation serves as a marker of Cdk1 activation. Changes: Page 5, lines 1-5.

      Reviewer #2: *These sentences need a more elaborate explanation: Page 4 Lines 16-17 "... no role for cAMP has been detected in meiotic resumption, which is mediated by distinct signaling pathways" Which pathways? *

      We now give the example of the well-characterized Gβγ-PI3K pathway that initiates oocyte maturation in the starfish. Changes: Page 4, lines 1-15.

      Reviewer #2: Page 4 line 34-39. Introduction indicates that the phosphorylation of ARPP19 on S67 by Gwl is a poorly understood molecular signaling cascade (line 34). However, the positive role of ARPP19 on Cdk1 activation, through the S67 phosphorylation by Gwl, appears to be widespread across all eukaryotic mitotic and meiotic divisions studied (lines 36-37). These two sentences seem a little contradictory. If the general pathway has been identified but the signaling cascade is still not well described, please indicate that in a clearer way.

      We apologise that the wording we used was not clear and implied that the mechanisms of PP2A inhibition by Gwl-phosphorylated ARPP19 were poorly understood. On the contrary, they are very well studied. The part that remains mysterious concerns the upstream mechanisms. We have reworded the paragraph to make this point unambiguous. Changes: Page 5, lines 1-8.

      IV - Results

      Reviewer #2: The text of the results is generally well described; however, all the sections start with a long introductory paragraph. I believe this facilitates the contextualization of the experiments, but please try to summarize when possible. For example, in page 5 lines 12-25, or page 7 lines 30-37, are all introduction information.

      As requested by Reviewer #2, we have shortened or removed the introductory passages of the Results section paragraphs, which were redundant with the information given in the introduction. We did not restrict to the two examples cited by the reviewer, but have shortened all the Results passages that repeat information already provided in the Introduction. Changes: Page 7, lines 3-4 & 14-16 & 36-37 - Page 8, lines 12-15 - Page 8, lines 37-40 & Page 9, lines 1-6.

      Reviewer #2: Page 7, Lines 14-19 present a general conclusion of the findings explained in lines 20-27. I think these results are important and they should be explained better, in my opinion they are slightly poorly described.

      We have followed the reviewer's recommendation. The experiments and results are now explained in more detail, and the paragraph ends with the general conclusion, which came too early in the previous version. Changes: Page 8, lines 22-24 & 32-34.

      Reviewer #2: Page 8, lines 16-17: "It was not possible to increase injection volumes or protein concentrations without inducing high levels of non-specific toxicity". What are the non-specific toxicity effects? How was this addressed? What fundaments this conclusion?

      Clytia oocytes are relatively fragile. Sensitivity of oocytes to injection varies between batches, while in general increasing injection volumes or protein concentrations increases the levels of lysis observed. We do not know exactly what causes this but lysis can happen either immediately following injection or during the natural exaggerated cortical contraction waves that accompany meiotic maturation, suggesting that it relates to mechanical trauma. We have expanded this paragraph and the legend of Fig. 3C to explain these injection experiments more fully in the text and to clarify these issues. Changes: Page 9, lines 16-29 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      Same paragraph: Lines 25-27 of page 8. Text reads, "These results suggest that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia, although we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD.". Please provide evidence if higher concentrations of OA or Gwl were tested to state this conclusion.

      As explained above, we could not increase the concentrations of ARPP19 protein beyond 4 mg/ml. It is important to note that at the same concentration, both Clytia and Xenopus proteins induce activation of Cdk1 and GVBD in the Xenopus oocyte.

      Concerning OA, it is well documented in many systems, including Xenopus, starfish and mouse oocytes as well as mammalian cell cultures, that high concentrations lead to cell lysis/apoptosis as a result of a massive deregulation of protein phosphorylation (Goris et al, 1989; Rime & Ozon, 1990; Alexandre et al, 1991; Boe et al, 1991; Gehringer, 2004; Maton et al, 2005; Kleppe et al, 2015). Specific tests in Xenopus oocytes have shown that injecting 50 nl of 1 or 2 µM OA specifically inhibits PP2A, while injecting 5 µM also targets PP1 and higher OA concentrations inhibit all phosphatases. For these reasons, we did not increase OA concentrations over 2 µM. When injected in Xenopus oocytes at 1 or 2 µM, OA induces Cdk1 activation and GVBD, but the cell then dies because PP2A has multiple substrates essential for cell life. When injected at 2 µM in Clytia oocytes, OA induces neither Cdk1 activation nor GVBD but promotes cell lysis. This supports the conclusion that 2 µM OA is sufficient to inhibit PP2A (and possibly other phosphatases) but that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia.

      We have reworded the relevant text to make these points clearer. The previous statement that “we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD” has been removed because it was unnecessarily cautious in the context of the literature cited above, as now fully explained. Changes: Page 9, lines 31-35 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      References: Alexandre et al, 1991, doi: 10.1242/dev.112.4.971; Boe et al, 1991, doi: 10.1016/0014-4827(91)90523-w; Gehringer, 2004, doi: 10.1016/s0014-5793(03)01447-9; Goris et al, 1989, doi: 10.1016/0014-5793(89)80198-x; Kleppe et al, 2015, doi: 10.3390/md13106505; Maton et al, 2005, doi: 10.1242/jcs.02370; Rime & Ozon, 1990, doi: 10.1016/0012-1606(90)90106-s

      Reviewer #2: Lines 12-13: the sentence "This in vitro assay thus places S81 as the sole residue in ClyARPP19 for phosphorylation by PKA." is overstated. As not all residues had been tested, please indicate that "it is likely that" or "among the residues tested", in contrast to "the sole residue in ClyARPP19".

      We realise that we had not explained clearly enough how the thiophosphorylation assay works. In this assay, γ-S-ATP will be incorporated into any amino acid of ClyARPP19 phosphorylatable by PKA. The observed thiophosphorylation of the wild-type protein demonstrates that one or more residues are phosphorylated by PKA. This thiophosphorylation was completely prevented by mutation of a single residue, S81. This experiment thus shows that S81 is entirely responsible for phosphorylation by PKA in this assay. We have rewritten this section more clearly. Changes: Page 10, lines 18-28.

      V - Figures and text related to the figures

      Figure 1A

      Reviewer #2: *Why is mouse not included in Figure 1A? Although it might be very similar to human, given that mouse is the species that is most commonly use as a mammalian model, I believe it could be included. However, this is optional upon decision by the authors. *

      We have replaced the human sequence in Figure 1A with the mouse sequence as suggested. The ENSA and ARPP19 protein sequences are indeed virtually identical across mammals, including mouse and human. Changes: Fig. 1A.

      Figure 1C

      Reviewer #2: *There should be a better explanation in the text of the results sections for the image included in in Fig1 C. Note that Clytia is not a commonly used species, therefore images should be properly explained for general readers. Please indicate in the text that ClyARPP19 mRNA is expressed in previtellogenic oocytes and not in vitellogenic, plus any additional information needed to understand the image. In addition, the detection of ARPP19 in the nerve rings is intriguing. This is mentioned in the discussion section, any idea of its function there? Please include some additional information or additional references, if they exist. *

      We have expanded the explanations of Fig. 1C in the text and in the figure legend. We have also added cartoons to the figure to help readers understand the organisation of the Clytia jellyfish and gonad. As now explained, ClyARPP19 mRNA is detected in oocytes at all stages, but the signal is much stronger in pre-vitellogenic oocytes because all cytoplasmic components, including mRNAs, are significantly diluted by the large quantity of yolk proteins as the oocytes grow to full size. Changes: page 7, line 40 & page 8, lines 1-9 - Fig. 1C - Legend page 31, lines 19-31.

      Nothing is known about the function of ARPP19 in the Clytia nervous system. The only data linking ARPP19 and the nervous system concerns mammalian ARPP16, an alternatively spliced variant of ARPP19. ARPP16 is highly expressed in medium spiny neurons of the striatum and likely mediates effects of the neurotransmitter dopamine acting on these cells (Andrade et al, 2017; Musante et al, 2017). This point is included in the Discussion in relation to the hypothesis that PKA phosphorylation of ARPP19 proteins in animals first arose in the nervous system and only later was coopted into oocyte maturation initiation. Changes: page 16, lines 12-13 & 17-20 - page 19, lines 6-9.

      Figure 2A

      Reviewer #1: Fig. 2A (and similar plots in subsequent figures): is it really necessary to cut the x axis? Would it be possible to indicate the number of oocytes for each experiment (maybe in the legend in brackets)?

      As requested by reviewer #1, the x-axis is no longer cut. The number of oocytes for each experiment is now provided in the legend of Fig. 2A and in similar plots of Fig. 5A and 5D. Changes: Fig. 2A - Legends page 31, lines 37-38 (Fig. 2A), page 33, line 25 (Fig. 5A) - page 33, line 34 (Fig. 5D).

      Figure 2D-E (as well as Figure 6C-D and Figure 8B-C)

      Reviewer #1: *Fig. 2D (and all similar plots below): I am lacking the discrete data points that were measured. Without these it is impossible to evaluate the fits. The half-times shown in 2E are somewhat redundant, and the information could be combined on a single plot. *

      We added all the data points to the concerned plots: 2D, 6C and 8B. As recommended by reviewer #1, we combined on a single plot the phosphorylation levels and the half-times. 2D-E => 2D, 6C-D => 6C and 8B-C => 8B. Changes: Figs 2D, 6C and 8B - Legends page 32, lines 9-14 (Fig. 2D), page 34, lines 24-30 (Fig. 6C) - page 35, lines 13-18 (Fig. 8B).

      Figure 3A and 3B

      Reviewer #1: Fig. 3: why is the blot for PKA substrates cut into 3 pieces? It would be clearer to show the entire membrane.

      In western blot experiments using Clytia oocytes, the amount of material was limited so the membranes were cut into three parts. The central part was incubated sequentially in distinct antibodies. We finally incubated all three parts of the membrane with the anti-phospho-PKA substrate antibody to reveal the full spectrum of proteins recognized by this antibody. The 3 pieces in Fig. 3A therefore together make up the same original membrane. We had separated them on the figure to make it clear that the membrane had been cut. In the new presentation, the 3 pieces are shown next to each other, making it clear that all the membrane is present, with dotted lines indicating the cut zone as explained in the legend. Changes: Fig. 3A and 3B - Legend page 32, lines 22-25 (Fig. 3A), lines 30-33 (Fig. 3B) - Page 24, lines 3-6 (Methods).

      Figure 3C

      Reviewer #2: Fig. 3C needs a better explanation in the text. The way these graphs are presented is somehow confusing. The meaning of the dots is not self-explanted in the graph, and it seems that each experiment was done independently but then the complete set of results is presented. Legend says that "each dot represents one experiment" but this is difficult to read as in every analysis the figure also indicates the average and the total number of oocytes. If authors wish so, they can keep the figure as it is, but then please explain this graph better in the text, and please include statistical analysis. These results are very robust, but a comparison between the number of oocytes that go through spontaneous GVBD of lysis in the different conditions will benefit their understanding.

      This figure is intended to provide an overview of all the Clytia oocyte injection experiments that we performed, for which full details are given in Supplementary Table 1. Since these experiments were not equivalent in terms of exact timing, types of observation (or films) made, and oocyte sensitivity to injection (as ascertained by buffer injections), it is not justified to make statistical comparisons between groups. We apologise that the presentation was misleading in this respect and hope that the new version is easier to understand. To avoid any misunderstanding of the nature of the data, we removed from the figure the average percentage of maturation across experiments for each condition, and instead represent the values of each experiment independently. We also now explain the data included in the figure fully in the text and figure legend. Changes: Page 9, lines 16-39 - Fig. 3C and Supplementary Table 1 - Legend page 32, lines 34-41 & page 33, lines 1-11.

      Reviewer #2: Also, please provide in the text a plausible explanation for the cause of oocyte lysis for all experimental conditions (Fig 3C). Given that in the control experiments with buffer this effect is also observed in some oocytes, please explain if this is caused by a mechanical disruption of the oocyte during the injection. In contrast, okadaic acid induces the lysis in all the 14/14 oocytes analyzed, is this due also to the mechanical approach? Or is there other reason more related to the PP2A inhibition? Please explain.

      These points are treated above in the response to this reviewer concerning the Results section.

      Figure 5

      Reviewer #2: In Figure 5 D-F, cited in page 9 lines 35-35. Can you provide an explanation of why the time course of meiosis resumption was delayed?

      The binding partners/effectors of XeARPP19-S109D that are involved in maintaining the prophase arrest have not yet been identified. The most probable explanation of the delay in meiotic maturation induced by ClyARPP19-S109D is that Clytia protein recognizes less efficiently these unknown ARPP19 effectors that mediate the prophase arrest. As a result, maturation would be delayed, but not blocked. This explanation was provided in the Discussion (page 17, lines 14-17) and is now mentioned in the Results section. Changes: page 11, lines 16-19.

      VI - Discussion

      Reviewer #2: Although it presents highly interesting suggestions, discussion may border on being overly speculative, especially from line 37 of page 15 till the end.

      We agree and have reduced the speculation in this part of the discussion, in particular regrouping and reformulating ideas about evolutionary scenarios in a single paragraph. Changes: page 17, lines 37-41 - page 18, lines 1-41 - page 19, lines 1-18.

      SUMMARY - Point by point responses to individual reviewers’ comments in their order of appearance.

      Reviewer 1

      • The figures and text could be slightly condensed down to about 6 figures. We have reduced the number of figure panels but we prefer to maintain the number of figures, because the experimental data presented in them is essential to the interpretation of our results and the overall conclusions of the article. If the journal editor would like us to reduce the number of figures, we could do this by displacing Figure 4 and some panels of other figures (to then fuse some of them) to supplementary material, but this would be a pity.

      • The exact mechanisms responsible for the reduced effect of PKA phosphorylation remain relatively vaguely defined. Indeed, in their Discussion the authors list a number of experimental approaches to address this - but I understand that these would all involve substantial efforts to address. In particular, testing chimeric constructs around the consensus PKA site and from multiple species could be very informative. As the reviewer points out, unravelling these exact mechanisms will require a large amount of additional work and is beyond the scope of the current study.

      • Fig. 2A (and similar plots in subsequent figures): is it really necessary to cut the x axis? Would it be possible to indicate the number of oocytes for each experiment (maybe in the legend in brackets)? Fig. 2A has been changed in line with the reviewer's request (as well as similar plots in Fig. 5A and 5D). Changes: Fig. 2A - Legends page 31, lines 37-38 (Fig. 2A), page 33, line 25 (Fig. 5A) - page 33, line 34 (Fig. 5D).

      • Fig. 2D (and all similar plots below): I am lacking the discrete data points that were measured. Without these it is impossible to evaluate the fits. The half-times shown in 2E are somewhat redundant, and the information could be combined on a single plot. Fig. 2D has been changed in line with the reviewer's request (as well as similar plots in Figs 6C-D and 8B-C). Changes: Fig. 2D, 6C and 8B - Legends page 32, lines 9-14 (Fig. 2D), page 34, lines 24-30 (Fig. 6C) - page 35, lines 13-18 (Fig. 8B).

      • Fig. 3: why is the blot for PKA substrates cut into 3 pieces? It would be clearer to show the entire membrane. In western blot experiments using Clytia oocytes, the amount of material was limited so the membranes were cut into three parts. The central part was incubated sequentially in distinct antibodies. We finally incubated all three parts of the membrane with the anti-phospho-PKA substrate antibody to reveal the full spectrum of proteins recognized by this antibody. The 3 pieces in Fig. 3A therefore together make up the same original membrane. In the new presentation, the 3 pieces are shown next to each other, making it clear that all the membrane is present, with dotted lines indicating the cut zone as explained in the legend. Changes: Fig. 3A and 3B - Legend page 32, lines 22-25 (Fig. 3A), lines 30-33 (Fig. 3B) - Page 24, lines 3-6 (Methods).

      Reviewer 2

      • Abstract needs to be simplified if wants to reach a broader range of readers. We have reworked the Abstract to make it more accessible to new readers. Changes: Page 2.

      • It would be interesting to include some time-references when introducing the prophase arrest of Clytia and Xenopus oocytes. This should be indicated in the introduction with a short introduction of why, and not others, were these species chosen for this study. We have expanded the Introduction to cover the issue of time-references. Fuller details are now included about the advantages of Clytia compared to other hydrozoan species. Changes: Page 3, lines 5-11, page 4, lines 28-35, page 5, lines 32-33, page 5, lines 21-32 & 37-39.

      • The proteins MAPK is not introduced properly, as it is first mentioned in the results section. The involvement of MAPK activation during Xenopus oocyte meiotic maturation is now introduced. Changes: Page 5, lines 1-5.

      • Page 4 Lines 16-17 "... no role for cAMP has been detected in meiotic resumption, which is mediated by distinct signaling pathways" Which pathways? We now give the example of the well-characterized Gβγ-PI3K pathway for oocyte maturation in starfish, also mentioning that in many species the pathways are still unknown. Changes: Page 4, lines 1-15.

      • Page 4 line 34-39. Introduction indicates that the phosphorylation of ARPP19 on S67 by Gwl is a poorly understood molecular signaling cascade (line 34). However, the positive role of ARPP19 on Cdk1 activation, through the S67 phosphorylation by Gwl, appears to be widespread across all eukaryotic mitotic and meiotic divisions studied (lines 36-37). These two sentences seem a little contradictory. The mechanisms of PP2A inhibition by Gwl-phosphorylated ARPP19 are very well studied. The part that remains mysterious concerns the upstream mechanisms. We have reworded the paragraph to make this point unambiguous. Changes: Page 5, lines 1-8.

      • Why is mouse not included in Figure 1A? We have replaced the human sequence in Figure 1A with the mouse sequence. Changes: Fig. 1A.

      • Fig. 1C: There should be a better explanation in the text of the results sections for the image included in Fig. 1C. Please indicate in the text that ClyARPP19 mRNA is expressed in previtellogenic oocytes and not in vitellogenic. We have expanded the explanations of Fig. 1C in the text. We have also added cartoons to the figure to help readers understand the organisation of the Clytia jellyfish and gonad. As now explained, ClyARPP19 mRNA is detected in oocytes at all stages, but the signal is much stronger in pre-vitellogenic oocytes because all cytoplasmic components are significantly diluted by the large quantity of yolk proteins. Changes: page 7, line 40 & page 8, lines 1-9 - Fig. 1C - Legend page 31, lines 19-31.

      • In addition, the detection of ARPP19 in the nerve rings is intriguing. Any idea of its function there? The only data linking ARPP19 and the nervous system concerns a mammalian variant of ARPP19 that is highly expressed in the striatum. This point is included in the Discussion. Changes: page 16, lines 12-13 & 17-20 - page 19, lines 6-9.

      • Figure 3C. The way these graphs are presented is somehow confusing. If authors wish so, they can keep the figure as it is, but then please explain this graph better in the text, and please include statistical analysis. Also, please provide in the text a plausible explanation for the cause of oocyte lysis for all experimental conditions. This figure is intended to provide an overview of all the Clytia oocyte injection experiments, for which full details are given in Supplementary Table 1. We have modified the figure and now clarify this fully in the text and figure legend. Clytia oocytes are relatively fragile. Sensitivity of oocytes to injection varies between batches, and in general increasing injection volumes or protein concentrations increases the levels of lysis observed. We do not know exactly what causes this, but it probably relates to mechanical trauma. We now explain these injection experiments more fully in the text. Changes: Page 9, lines 16-39 - Fig. 3C and Supplementary Table 1 - Legend page 32, lines 34-41 & page 33, lines 1-11.

      • In Figure 5 D-F, cited in page 9 lines 35-35. Can you provide an explanation of why the time course of meiosis resumption was delayed? The most probable explanation is that the Clytia protein less efficiently recognizes the unknown ARPP19 effectors that mediate the prophase arrest in Xenopus. This explanation is provided in the Results section. Changes: page 11, lines 16-19.

      • All the sections start with a long introductory paragraph. I believe this facilitates the contextualization of the experiments, but please try to summarize when possible. As requested, we have shortened or removed the introductory passages of the Results section paragraphs, which were redundant with the information given in the introduction. Changes: Page 7, lines 3-4 & 14-16 & 36-37 - Page 8, lines 12-15 - Page 8, lines 37-40 & Page 9, lines 1-6.

      • Page 7, Lines 14-19 present a general conclusion of the findings explained in lines 20-27. I think these results are important and they should be explained better, in my opinion they are slightly poorly described. The experiments and results are now explained in more detail, and the paragraph ends with the general conclusion, which came too early in the previous version. Changes: Page 8, lines 22-24 & 32-34.

      • Page 8, lines 16-17: "It was not possible to increase injection volumes or protein concentrations without inducing high levels of non-specific toxicity". What are the non-specific toxicity effects? How was this addressed? What fundaments this conclusion? As explained above, increasing injection volumes or protein concentrations increases the levels of lysis observed, probably due to mechanical trauma. It is important to note, however, that at the same concentration, both Clytia and Xenopus proteins induce activation of Cdk1 and GVBD in the Xenopus oocyte. Changes: Page 9, lines 16-29 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      • Lines 25-27 of page 8. "These results suggest that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia, although we cannot rule out that the quantity of OA or Gwl thiophosphorylated ARPP proteins delivered was insufficient to trigger GVBD." Please provide evidence if higher concentrations of OA or Gwl were tested to state this conclusion. High OA concentrations lead to cell lysis/apoptosis as a result of a massive deregulation of protein phosphorylation. For these reasons, we cannot increase OA concentrations over 2 µM. When injected in Xenopus oocyte at 1 or 2 µM, OA induces Cdk1 activation, but then the cell dies because PP2A has multiple substrates essential for cell life. When injected at 2 µM in Clytia oocytes, OA does not induce Cdk1 activation but promotes cell lysis. This supports the conclusion that 2 µM OA is sufficient to inhibit PP2A but that PP2A inhibition is not sufficient to induce oocyte maturation in Clytia. We have reworded the relevant text. Changes: Page 9, lines 31-35 - Page 32, lines 34-41 & Page 33, lines 1-11 - Supplementary Table 1.

      • Lines 12-13: the sentence "This in vitro assay thus places S81 as the sole residue in ClyARPP19 for phosphorylation by PKA." is overstated. As not all residues had been tested, please indicate that "it is likely that" or "among the residues tested", in contrast to "the sole residue in ClyARPP19". The observed thiophosphorylation of the wild-type protein demonstrates that one or more residues are phosphorylated by PKA. This thiophosphorylation was completely prevented by mutation of a single residue, S81. This experiment thus shows that S81 is entirely responsible for phosphorylation by PKA in this assay. We have rewritten this section more clearly. Changes: Page 10, lines 18-28.

      • Some parts of the discussion are a bit speculative. We have reduced the speculation in this part of the discussion, in particular regrouping and reformulating ideas about evolutionary scenarios into a single paragraph. Changes: page 17, lines 37-41 - page 18, lines 1-41 - page 19, lines 1-18.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)): *

      *REVIEW COMMENT *

      *The article titled "The tRNA thiolation-mediated translational control is essential for plant immunity" by Zheng et al. highlights the critical role of tRNA thiolation in Arabidopsis plant immunity through comprehensive analysis, including genetics, transcriptional, translational, and proteomic approaches. Through their investigation, the authors identified a cbp mutant, resulting in the knockout of ROL5, and discovered that ROL5 and CTU2 form a complex responsible for catalyzing the mcm5s2U modification, which plays a pivotal role in immune regulation. The findings from this study unveil a novel regulatory mechanism for plant defense. Undoubtedly, this discovery is innovative and holds significant potential impact. However, before considering publication, it is necessary for the authors to address the various questions raised in the manuscript concerning the experiments and analysis to ensure the reliability of the study's conclusions. *

      __Response: Thank you very much for your support and suggestions! __

      *Here is Comments: *

      *Line 64-65: *

      *The author mentioned that 'While NPR1 is a positive regulator of SA signaling, NPR3 and NPR4 are negative regulators.' However, several recent discoveries are suggesting that it may not be a definitive fact that NPR3 and NPR4 are negative regulators. Therefore, I recommend the authors to review this section in light of the findings from recent papers and make necessary edits to reflect the most current understanding. *

      __Response: Thank you for your feedback. Since we mainly focused on NPR1 in this study, we removed this sentence to avoid confusion. We provided additional information about NPR1 in the Introduction section to emphasize the importance of NPR1 (Lines 64-68). __

      *Line 182- & Figure 4: *

      *The author conducted RNA-seq, Ribo-seq, and proteome analysis. Describing the analysis as "transcriptional and translational" using RNA-seq and proteome data seems not entirely accurate. Proteome data compared with RNA-seq not only reflects translational changes but may also encompass post-translational regulations that contribute to the observed differences. To maintain precision, the title of this section should either be modified to "transcriptional and protein analysis" or, alternatively, compare RNA-seq and Ribo-seq data to demonstrate the transcriptional and translational changes more explicitly. *

      __Responses: Thank you for your suggestions. We agree with you and have changed the description accordingly throughout the manuscript. __

      *Line 229-235 and Figure 5C: *

      *The interpretation of Figure 5C's polysome profiling results is inconclusive. There does not seem to be a noticeable difference in polysomal fractions between the cgb mutant and COM. The observed differences in the overlay of multiple polysome fractions between cgb and COM could be primarily influenced by baseline variations rather than a significant decrease in the polysomal fractions in cgb. Therefore, it is necessary to carefully review other relevant papers that discuss polysome fraction data and their analysis. By doing so, the authors can make the appropriate corrections to ensure accurate interpretations. *

      __Responses: Thank you for your comments. We agree that the difference between cgb and COM was not visually dramatic. This is a common feature of polysome profiling assays (e.g. Extended Data Fig. 1f in Nature 545: 487–490; Fig. 1c in Nature Plants, 9: 289–301). In our case, the difference between polysome fractions is unlikely to be due to baseline variation, for two reasons. First, baseline variation would affect monosome and polysome fractions in the same way; however, our results showed that the monosome fraction of cgb is higher than that of COM, whereas the polysome fraction of cgb is lower than that of COM. Second, this result was reproducibly detected. For better visualization, we adjusted the scale of the Y axis in the revised manuscript (Figure 5D).__

      *Line 482 Ion Leakage assay: *

      I could not find the ion leakage assay in this manuscript, so I wonder why it is mentioned.

      __Response: We are sorry for the mistake. The ion leakage data were included in previous versions of the manuscript. We removed the data but forgot to remove the corresponding method from the present version.__

      *Materials and Methods: *

      *To enhance the reproducibility of the study, the authors should provide a more detailed description of the materials and methods, especially for critical experiments like the Yeast-two-hybrid assays. Clear documentation of specific reagents, strains, and protocols used, along with information on controls, will bolster the validity of the results and facilitate future research in this area. *

      __Response: Thank you for your suggestions. We have provided more details in the Methods. For the yeast two-hybrid assays, the vector information is included in the “Vector constructions” section.__

      *Minor Point: *

      Line 61: There is a space between ')' and '.', which needs to be edited.

      Response: The space was deleted.

      *Reviewer #1 (Significance (Required)): *

      *This study holds significant importance within the field of plant immunity research. The authors have made valuable contributions through their comprehensive analysis, encompassing genetics, transcriptional, translational, and proteomic approaches, to elucidate the critical role of tRNA thiolation in plant immunity. One of the major strengths of this study lies in its ability to shed light on a previously unknown regulatory mechanism for plant defense. By identifying the cgb mutant and investigating the role of ROL5 and CTU2 in catalyzing the mcm5s2U modification, the authors have unveiled a novel aspect of plant immune regulation. This innovative discovery provides a deeper understanding of the intricate molecular processes governing immunity in plants. *

      *Moreover, the study's findings are not limited to the immediate field of plant immunity but also have broader implications for the scientific community. By employing diverse methodologies, the authors have demonstrated how tRNA thiolation exerts control over both transcriptional and translational reprogramming, revealing intricate links between these processes. This integrative approach sets a precedent for future research in the field of plant molecular biology and opens up new avenues for investigating other aspects of immune regulation. *

      In terms of its relevance, the study's findings have the potential to captivate researchers across various disciplines, such as plant biology, molecular genetics, and translational research. The insights gained from this study may inspire researchers to further explore the role of tRNA modification in other regulatory processes.

      Response: Thank you very much for your positive comments and support!

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors presented an intriguing and previously unknown mechanism that the tRNA mcm5s2U modification regulates plant immunity through the SA signaling pathway, specifically by controlling NPR1 translation. The manuscript was well-written and logically structured, allowing for a clear understanding of the research. The authors provided strong and persuasive data to support their key claims. However, further improvement is required to strengthen the conclusion that mcm5s2U regulates plant immunity by controlling NPR1 translation.

      __Response: Thank you very much for your positive comments and support! __

      Major comments:

      • NPR1 translation should be examined to verify the Mass Spec (Figure 5B) and polysome profiling data (Figure 5D) by checking the NPR1 protein and mRNA levels using antibodies and qPCR, respectively, in the cgb mutant background, to establish concrete confirmation that CGB regulates NPR1 translation.

      __Response: This is a very constructive suggestion. We performed these experiments and found that the transcript levels of NPR1 were similar between COM and cgb both before and after Psm ES4326 infection (Figure S2), which is consistent with the RNA-seq data. Consistent with the Mass Spec and polysome profiling data, the NPR1 protein level was much higher in COM than in cgb after Psm ES4326 infection (Figure 5C). Together, these data further support our conclusion that translation of NPR1 is impaired in cgb.__

      • The genetic epistasis of CGB and NPR1 should be analyzed to check whether CGB regulates plant immunity through the NPR1-dependent SA signaling pathway. If the authors' claim is valid, I would expect no additive effect on bacterial growth in the cgb npr1 double mutant compared with the single mutants. Because of the broad impact of CGB on plant signaling (Figures 4E and 4F), the SA protection assay, which concentrates on the SA signaling pathway, should be tested in WT, cgb, and npr1 plants as an alternative to the genetic epistasis analysis. I expect that SA-mediated protection is also compromised in the cgb mutant background.

      __Response: Thank you for your suggestions. We did examine the growth of Psm ES4326 in the cgb npr1 double mutant and found that cgb npr1 was significantly more susceptible than npr1 and cgb (Figure below). Although additive effects were observed, this result does not contradict our conclusion, for the following reasons. First, the translation of NPR1 was reduced rather than completely blocked in cgb; in other words, NPR1 retains some function in cgb. In the cgb npr1 double mutant, however, NPR1 function is completely abolished, which explains why cgb npr1 was more susceptible than cgb. Second, in addition to NPR1, several other immune regulators (such as PAD4, EDS5, and SAG101) were also compromised in cgb (Figure 5B), which explains why cgb npr1 was more susceptible than npr1. Because the result of the genetic analysis was not intuitive, we decided not to include it in the manuscript.__

      __SA signaling is known to regulate both basal resistance and systemic acquired resistance (SA-mediated protection). We have shown that cgb is defective in basal resistance, which is sufficient to support our conclusion that tRNA thiolation is essential for plant immunity. We agree that SA-mediated protection is also expected to be compromised in cgb, and we will test this in a future study.__

      • Could the authors comment on why COM, rather than WT, was used as the control in the majority of the experiments?

      __Response: Thank you for your comments. In addition to ROL5, the cgb mutant may carry other mutations compared with WT. COM is a complementation line in the cgb background. Therefore, the genetic backgrounds of COM and cgb are likely more similar than those of WT and cgb.__

      • In Figure 5E, why does ACTIN2 show enhanced translation while NPR1 shows compromised translation in the cgb mutant? How does the mcm5s2U modification distinguish NPR1 and ACTIN2 codons? Does the mcm5s2U modification have both positive and negative roles in regulating protein translation?

      __Response: Thank you for raising this question. As previously reported, loss of the mcm5s2U modification causes ribosome pausing at AAA and CAA codons. Therefore, translation of mRNAs with more GAA/CAA/AAA codons (hereafter called s2 codons) is likely to be affected more dramatically in cgb. We have analyzed the percentage of s2 codons at the whole-genome level (Figure below). The average percentage is 8.5%, whereas NPR1 contains 10.1% s2 codons and actin contains only 4.5%. When fewer ribosomes are used for translation of mRNAs with a high percentage of s2 codons, more ribosomes are available for translation of mRNAs with a low percentage, which may account for the enhanced translation efficiency of ACTIN2. To focus on NPR1 and to avoid confusion, we removed the ACTIN data from the revised manuscript.__

      • Specify the protein amount used for the in vitro pull-down assay and the agrobacteria concentration used for the tobacco Co-IP assay in the protocol section.

      Response: We added this information to the Methods section in the revised manuscript.

      • Delete the SA quantification and ion leakage assays from the protocol, as they are not used in the study.

      __Response: We are sorry for the mistake. The SA quantification and ion leakage data were included in previous versions of the manuscript. We removed the data but forgot to remove the corresponding methods from the present version. We have deleted them in the revised manuscript.__

      • The strain Pst DC3000 avrRPT2 was not used in this study. Please remove it.

      Response: We are sorry for the mistake. The strain Pst DC3000 avrRPT2 was used for the ion leakage assay in previous versions of the manuscript. We have deleted it in the revised manuscript.

      • In Figure 5F, did the 59 genes tested overlap with the 366 attenuated proteins in the cgb mutant? Were the 59 genes translationally regulated?

      __Response: Thank you for your suggestion. Venn diagram analysis revealed that 12 of the 59 genes (about 20%) also encode attenuated proteins, suggesting that the mcm5s2U modification regulates the translation of some SA-responsive genes.__

      Reviewer #2 (Significance (Required)):

      The authors' study is significant as it establishes the first connection between tRNA mcm5s2U modification and plant immunity, specifically by regulating NPR1 protein translation. This research expands our understanding of the biological role of tRNA mcm5s2U modification and highlights the importance of translational control in plant immunity. It is likely to captivate scientists working in this field.

      Response: Thank you very much for your positive comments and support!

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, the authors identified a cgb mutant that carries a mutation in the ROL5 gene. Both the cgb mutant and the newly created rol5-c mutant are susceptible to the bacterial pathogen Psm. The authors showed that ROL5 interacts with CTU2, the Arabidopsis homolog of the yeast tRNA thiolation enzyme NCS2. A ctu2-1 mutant is also susceptible to Psm, suggesting that tRNA thiolation may play a role in plant immunity. Indeed, tRNA mcm5s2U levels are undetectable in rol5-c and ctu2-1 mutants. The authors found that the cgb mutation significantly attenuated basal and Psm-induced transcriptome and proteome changes. Furthermore, the translation efficiency of a group of SA signaling-related proteins, including NPR1, was found to be compromised.

      The manuscript provides solid evidence for the involvement of ROL5 and CTU2 in plant immunity using the rol5 and ctu2 mutants. The authors may consider the following suggestions and comments to improve the manuscript.

      Response: Thank you very much for your support and suggestions!

      • The function of the Elongator complex in tRNA modification/thiolation has been extensively studied. In Arabidopsis Elongator mutants, mcm5s2U levels are very low, similar to the levels in the rol5 and ctu2 mutants (Mehlgarten et al., 2010, Mol Microbiology, 76, 1082-1094; Leitner et al., 2015 Cell Rep). In elp mutants, PIN protein levels are reduced without reduced mRNA levels (Leitner et al., 2015), indicating that Elongator-mediated tRNA modification is involved in translational regulation. The Elongator complex plays an important role in plant immunity, though the reduced mcm5s2U levels in elp mutants were not proposed as the exclusive cause of the immune phenotypes. In fact, it would be difficult to establish a cause-effect relationship between tRNA modification and immunity. These results should be discussed in the manuscript.

      __Response: Thank you very much for your insightful comment on the role of the ELP complex in tRNA modification and plant immunity. We added a paragraph discussing the ELP complex in the revised manuscript (Lines 280-295).__

      __In addition to tRNA modification, the ELP complex has several other distinct activities, including histone acetylation, α-tubulin acetylation, and DNA demethylation. Therefore, it is difficult to dissect which activity of the ELP complex contributes to plant immunity. However, the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation. Considering that the elp, rol5, and ctu2 mutants are all defective in tRNA thiolation, it is likely that the tRNA modification activity of the ELP complex underlies its function in plant immunity.__

      • The interaction between CTU2 and ROL5 in Y2H has previously been reported (Philipp et al., 2014). The same report also showed reduced tRNA thiolation in the ctu2-2 mutant using polyacrylamide gels. These results should be mentioned/discussed in the manuscript.

      __Response: Thank you for pointing them out. We added this information in the revised version (Line 146-147). __

      • tRNA modification is unlikely to play a unique role in plant immunity. It can be inferred that mutations affecting tRNA modification (rol5, ctu2, elp, etc.) would delay signaling induced by both internal and external stimuli, including immune signaling.

      Response: We agree that tRNA modification has roles other than in plant immunity. In the Discussion section, we mention that “it was found that tRNA thiolation is required for heat stress tolerance (Xu et al., 2020). ……It will also be interesting to test whether tRNA thiolation is required for responses to other stresses such as drought, salinity, and cold.” (Lines 276-279).

      • It would be interesting to conduct statistical analyses of codon usage in the CDSs whose translation was attenuated, as described in the manuscript. Do these genes, including NPR1, use AAA, CAA, and GAA codons at higher than average levels? If not, why is their translation impaired?

      __Response: Thank you for your suggestion. We refer to GAA/CAA/AAA codons as s2 codons. We have analyzed the percentage of s2 codons at the whole-genome level (Figure below). NPR1 does contain a higher percentage of s2 codons (10.1%) than the average (8.5%). We are preparing another manuscript that will report the relationship between s2 codons and translation.__
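A minimal sketch of the s2 codon metric discussed in these responses: it computes the percentage of in-frame codons in a coding sequence that are GAA, CAA, or AAA. It assumes the CDS starts at its start codon, that a trailing incomplete codon is ignored, and that a genome-wide value is a simple mean of per-gene percentages. The function names are hypothetical, and this is not necessarily how the whole-genome analysis was actually performed.

```python
# Hypothetical helpers for the s2 codon percentage (not the authors' pipeline).
S2_CODONS = {"GAA", "CAA", "AAA"}  # codons decoded by mcm5s2U-modified tRNAs


def s2_codon_percentage(cds: str) -> float:
    """Return the percentage of in-frame codons in `cds` that are s2 codons.

    Assumes `cds` starts at the start codon; RNA input (U) is accepted and
    a trailing incomplete codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    n_s2 = sum(codon in S2_CODONS for codon in codons)
    return 100.0 * n_s2 / len(codons)


def genome_average(cds_by_gene: dict[str, str]) -> float:
    """Mean of per-gene s2 percentages (one possible genome-wide definition)."""
    values = [s2_codon_percentage(seq) for seq in cds_by_gene.values()]
    return sum(values) / len(values)
```

For example, `s2_codon_percentage("ATGGAACAAAAATAA")` gives 60.0, because three of its five codons are s2 codons. Whether stop codons are counted, or whether the genome-wide value is pooled over all codons instead of averaged per gene, shifts the numbers slightly, so the definition would need to match the original analysis.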

      **Referees cross-commenting**

      It is important to put the current research in the context of available knowledge in the field. The diagram in Figure 3C shows that the Elongator complex functions upstream of ROL5 & CTU2 in modifying tRNA. The function of Elongator in plant immunity has been well established. The similarities and differences should be discussed. Additionally, it may not be a good idea to claim that the results are novel.

      __Response: Thank you for your comments. We added a paragraph discussing the ELP complex in the revised manuscript (Lines 280-295). The ELP complex catalyzes the cm5U modification, which is the precursor of mcm5s2U catalyzed by ROL5 and CTU2. In addition to tRNA modification, the ELP complex has several other distinct activities, including histone acetylation, α-tubulin acetylation, and DNA demethylation. Therefore, it is difficult to dissect which activity of the ELP complex contributes to plant immunity. However, the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation. Considering that the elp, rol5, and ctu2 mutants are all defective in tRNA thiolation, it is likely that the tRNA modification activity of the ELP complex underlies its function in plant immunity. Therefore, our study improves the understanding of the ELP complex in plant immunity. We have deleted the words “new” and “novel” throughout the manuscript.__

      Reviewer #3 (Significance (Required)):

      *The manuscript provides solid evidence for the involvement of ROL5 and CTU2 in plant immunity. However, the authors did not acknowledge the existing results about the Elongator complex that functions in the same pathway in modifying tRNA. The involvement of Elongator in plant immunity has been well established. The cause-effect relationship between tRNA modification and plant immunity is difficult to demonstrate. *

      Response: We think that the cause-effect relationship between the activities of the ELP complex and plant immunity is difficult to demonstrate because the ELP complex has several distinct activities other than tRNA modification. However, since the only known activity of ROL5 and CTU2 is to catalyze tRNA thiolation, the cause-effect relationship between tRNA thiolation and plant immunity is clear, which indicates that the tRNA modification activity of the ELP complex contributes to plant immunity.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Dear Editor and Reviewers,

      *We thank you for the thorough and detailed examination of our preprint and for providing very valuable comments that helped us to present and interpret our data even better. *

      *Thank you in particular for appreciating the extensive set of microscopic techniques that we combined to study, in a unique manner, the characteristics and functionalities of FIT nuclear bodies in living plant cells. *

      We prepared a revised preprint in which we address all reviewer comments. Our revision includes a NEW experiment (in four repetitions) that addresses one reviewer comment regarding the effects of the environmental conditions that induce FIT NBs:

      • NEW Supplemental Figures S6 and S7: Analysis of previously reported intron retention splicing variants of the Fe deficiency genes FIT, BHLH039, IRT1, and FRO2 in new gene expression experiments (four independent repetitions of the experiments with three biological replicates of each sample; white/blue light treatment, sufficient and deficient iron supply).

      In the following, please find our detailed responses to all reviewer comments.

      With these changes, we hope that our peer-reviewed preprint will receive a positive vote.

      We look forward to your response.

      Sincerely,

      Petra Bauer and Ksenia Trofimov on behalf of all authors

      Comments to the reviews:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this paper entitled "FER-LIKE IRON DEFICIENCY-INDUCED TRANSCRIPTION FACTOR (FIT) accumulates in homo- and heterodimeric complexes in dynamic and inducible nuclear condensates associated with speckle components", Trofimov and colleagues describe for the first time the function of FIT in nuclear bodies. Using an impressive set of microscopy techniques, they assess FIT localization in nuclear bodies and its dynamics. Finally, they reveal the importance of these bodies in controlling the iron deficiency pathway. The manuscript is well written and fully understandable. Nonetheless, as it stands, the manuscript presents some weaknesses: co-localization is not quantified and controls are missing, which makes the authors' claims hard to follow. Moreover, to substantially improve the manuscript, the authors need to provide more proof of concept in A. thaliana, as all of the nice molecular and cellular mechanisms are shown only in N. benthamiana. Finally, some key conclusions in the paper are not fully supported by the data. Please see below:

      Main comments:

      1) For colocalization analysis, the authors should provide semi-quantitative data by counting, by eye, how often they observed no, partial, or full co-localization, and indicate how many nuclei they analyzed.

      Authors:

      We have added the information to the Materials and Methods section, lines 731-734:

      In total, 3-4 differently aged leaves of 2 plants were infiltrated and used for imaging. One infiltrated leaf with homogeneous presence of one or two fluorescent proteins was selected, depending on the aim of the experiment, and ca. 30 cells were observed. Images were taken from 3-4 cells; one representative image is shown.

      In all analyzed cases, except in the case of colocalization of FIT and PIF4 fusion proteins, the ca. 30 cells had the same localization and/or colocalization patterns. This information has also been added in the figure legends. Each experiment was repeated at least 2-3 times, or as indicated in the figure legend.

      2) Perform semi-quantitative co-localization analysis, by eye, of FIT NBs with known NB markers in the A. thaliana root. For now, all of the nicely described molecular mechanisms are shown in N. benthamiana, which makes this story a bit weak, since all of the iron transcriptional machinery is localized in the root to activate IRT1.

      Authors:

      Our approach was well suited for this purpose: we were able to screen co-localizing marker proteins in FIT NBs in N. benthamiana to better identify the nature of FIT NBs. This was successful, as we were able to associate FIT NBs with speckles. In contrast to the root hair zone of Arabidopsis, where Fe uptake takes place, the N. benthamiana system allowed optimal microscopic observation of fluorescent proteins and quantification of FIT NB characteristics. FIT is expressed at a low level in roots and also in leaves, so fluorescent protein expression levels are insufficient for the microscopic studies presented here. The tobacco infiltration system is also well established for studying FIT-bHLH039 protein interaction and nuclear body markers. We address this point in the Discussion (lines 489-500).

      3) The authors need to provide data clearly showing that blue light induces NBs in A. thaliana and N. benthamiana.

      Authors:

      *For tobacco, see Figure 1B (t = 0, 5 min) and Supplemental Movies S1. For Arabidopsis, please see Figure 1A (t = 0, 90 and 120 min) and Supplemental Figure S1A. We provide an additional image of pFIT:cFIT-GFP Arabidopsis control plants, showing that NB formation is not detected in plants that were grown in white light and not exposed to blue light before inspection (Supplemental Figure S1B). We state that, upon blue light exposure, plants had FIT NBs in at least 3-10 of 20 examined nuclei in the root epidermis of the root hair zone (in three independent experiments with three independent plants). White-light-treated plants showed no NB formation unless they were additionally exposed to blue light (in three independent experiments, with three independent plants per experiment and 15 examined nuclei per plant). *

      4) Direct conclusion in the manuscript:

      • Line 170: At this point in the paper, the authors cannot claim that the formation of FIT condensates in the nucleus is due to light, as it might be indirectly linked to cell death induced by photodamage from exposing the cell to a 488 nm laser for several minutes. This is especially true for the ELYRA PS, which has strong lasers designed for super-resolution microscopy, and because cell death is now linked to iron homeostasis. The same experiment could be done using a spinning disc microscope, or, if the authors present the data from the blue light experiment mentioned above, this possibility might be ruled out. Alternatively, the authors could use propidium iodide (PI) staining to assess cell viability after several minutes under the 488 nm laser.

      Authors:

      As stated in our response to comment 3, we have now included a white light control to show that FIT NB formation does not occur under normal white light conditions. Since the formation of FIT NBs is a dynamic and reversible process (Figure 1A), this indicates that the cells are still viable and that cell death is not the reason for FIT NB formation.

      • Line 273: I don't agree with the first part of the authors' conclusion, saying that "wild-type FIT had better capacities to localize to NBs than mutant FITmSS271AA, presumably due its IDRSer271/272 at the C-terminus." This is not supported by the data. To make such a claim, the authors need to compare the FA of FIT WT with that of FITmSS271AA by statistical analysis. Nonetheless, the values seem to be identical in the graphs. The main differences that I observed here are: 1) the NP value for FITmSS271AA seems to be lower than for FIT-WT, suggesting that the serines might be important for regulating the partitioning of protein homodimerization between the NP and the NB; 2) something very interesting that the authors did not mention is that the FA of FITmSS271AA in the NB and NP shows high variability. These FA values are widely spread, ranging from 0.30 to 0.13, compared with FIT-WT. According to the results, it seems to me that Ser271/272 are required to stabilize FIT homodimerization. This would explain not only the delay in forming the condensates but also the decreased number and size observed for FITmSS271AA compared with FIT-WT. As homodimerization occurs with high variability in FITmSS271AA, there is less chance that the proteins will meet, therefore decreasing the time to homodimerize and form/aggregate NBs.

      Authors:

      We fully agree. We meant to describe this result in a similar way and thank you for helping us formulate this point even better. We rephrased the text to make it clearer that the IDRSer271/272 is important for proper NB localization, lines 272-278:

      “Also, the FA values did not differ between NBs and NP for the mutant protein and did not show a clear separation in homodimerizing/non-dimerizing regions (Figure 3D) as seen for FIT-GFP (Figure 3C). Both NB and NP regions showed that homodimers occurred very variably in FITmSS271AA-GFP.

      In summary, compared with the FITmSS271AA mutant, wild-type FIT could be partitioned properly between NBs and NP and was more likely to form homodimers, presumably due to its IDRSer271/272 at the C-terminus.”

      • Line 301: Following my previous comment (line 273), here it seems that Ser271/272 are required only for proper partitioning of the FIT/BHLH039 heterodimer between the NP and NB, but not for the stability of heterodimer formation. However, it would be great if the authors could count the number of BHLH039 condensates for both FITmSS271AA and FIT-WT. In my opinion, they would observe fewer BHLH039 condensates because the FITmSS271AA homodimer is less likely to occur due to its instability.

      Authors:

      bHLH039 alone localizes primarily to the cytoplasm and not the nucleus, and the presence of FIT is crucial for bHLH039 nuclear localization (Trofimov et al., 2019). Moreover, bHLH039 interaction with FIT depends on SS271AA (Gratz et al., 2019). We therefore did not consider this experiment for the manuscript and did not acquire such data, as we did not expect it to yield major new information.

      5) To wrap up the story about the requirement of NBs in mediating iron acquisition under different light regimes, provide data on IRT1/FRO2 expression levels in fit mutant plants complemented with FITmSS271AA. I know that this experiment is particularly lengthy, but it would add much to this nice story.

      Authors:

      Data for expression of IRT1 and FRO2 in FITmSS271AA/fit-3 transgenic Arabidopsis plants are provided in Gratz et al. (2019). To address the comment, we performed a NEW experiment here. We provide gene expression data on FIT, BHLH039, IRT1 and FRO2 splicing variants (previously reported intron retention) to explore the possibility of differential splicing under blue light (NEW Supplemental Figures S6 and S7, lines 454-466). Very interestingly, this experiment confirms that blue light affects gene expression differently from white light under the short-term NB-inducing condition and that blue light can enhance the expression of Fe deficiency genes despite the short 1.5 to 2 h treatment. Another interesting aspect was that the published intron retention was also detected. No significant difference in intron retention was observed between iron supply and deficiency or between blue and white light, as transcripts with the respective intron retention sites showed the same expression pattern as total transcripts, which were mostly spliced.

      Minor comments

      In general, I would suggest that the authors avoid abbreviations; they get really confusing, especially short ones such as NB, NP, PB, FA.

      Authors:

      *We would like to keep the abbreviations as they are, since they are used very often in our work and, in our view, facilitate understanding. *

      Line 106: What does IDR mean?

      Authors:

      Explanation of the abbreviation was added to the text, lines 105-108:

      “Intrinsically disordered regions (IDRs) are flexible protein regions that allow conformational changes, and thus various interactions, leading to the required multivalency of a protein for condensate formation (Tarczewska and Greb-Markiewicz, 2019; Emenecker et al., 2020).”

      Line 163-164: provide data or cite a figure properly for blue light induction.

      Authors:

      We have removed this statement from the description, as we provide a white light control now, lines 157-158:

      “When whole seedlings were exposed to 488 nm laser light for several minutes, FIT became re-localized at the subnuclear level.”

      Line 188: Provide Figure ref.

      Authors:

      Figure reference was added to the text, lines 184-185:

      “As in Arabidopsis, FIT-GFP initially localized uniformly to the entire nucleus (t=0) of N. benthamiana leaf epidermis cells (Figure 1B).”

      Line 194: The conclusion is too strong. The authors conclude that the condensates they observed are NBs based on the fact that the same procedure to induce NBs has been used in other studies, which is not convincing. Co-localization analysis with NB markers needs to be done to support such a claim. At this step of the study, the authors may want to refer to nuclear condensates that might correspond to NBs. Please do so in the following paragraphs of the manuscript until co-localization analysis has been provided. Alternatively, provide the co-localization analysis at this step in the paper.

      Authors:

      We agree. We changed the text in two positions.

      Lines 176-178: “Since we had previously established a reliable plant cell assay for studying FIT functionality, we adapted it to study the characteristics of the prospective FIT NBs (Gratz et al., 2019, 2020; Trofimov et al., 2019).”

      Lines 192-193: “We deduced that the spots of FIT-GFP signal were indeed very likely NBs (for this reason hereafter termed FIT NBs).”

      Line 214: To assess photobleaching due to the FRAP experiment, the "recovery" also needs to be quantified in an unbleached area. This might explain why FIT recovers up to 80% in the condensate. Moreover, the authors conclude that the recovery is high; however, this is tricky to assess since no comparison is made with a negative/positive control.

      Authors:

      In the FRAP analysis, an unbleached area is taken into account and used for normalization.

      We reformulated the description of Figure 1F, lines 212-214:

      “According to relative fluorescence intensity, the fluorescence signal recovered rapidly within FIT NBs (Figure 1F), and the calculated mobile fraction of the NB protein was on average 80% (Figure 1G).”
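One commonly used way to normalize FRAP data to an unbleached reference region and derive a mobile fraction is sketched below. It assumes background-subtracted intensities, a known number of pre-bleach frames, and that the last few frames represent the recovery plateau. The function names are hypothetical, and this is not necessarily the exact procedure used for Figures 1F and 1G.

```python
from statistics import mean


def double_normalize(roi: list[float], ref: list[float], n_pre: int) -> list[float]:
    """Normalize a FRAP trace to its pre-bleach level and correct for
    acquisition photobleaching using an unbleached reference region.

    Assumes background-subtracted intensities and that the first `n_pre`
    frames were acquired before the bleach pulse.
    """
    if len(roi) != len(ref):
        raise ValueError("ROI and reference traces must have the same length")
    roi_pre = mean(roi[:n_pre])
    ref_pre = mean(ref[:n_pre])
    # Per frame: (ROI relative to pre-bleach) x (correction for reference signal loss)
    return [(r / roi_pre) * (ref_pre / f) for r, f in zip(roi, ref)]


def mobile_fraction(norm: list[float], n_pre: int, n_plateau: int = 3) -> float:
    """Mobile fraction = (F_plateau - F_post) / (F_pre - F_post).

    F_post is the first post-bleach frame; F_plateau is the mean of the
    last `n_plateau` frames, assumed to have reached the recovery plateau.
    """
    f_pre = mean(norm[:n_pre])
    f_post = norm[n_pre]
    f_plateau = mean(norm[-n_plateau:])
    if f_pre == f_post:
        raise ValueError("no bleaching detected")
    return (f_plateau - f_post) / (f_pre - f_post)
```

Under this definition, a mobile fraction of 0.8 corresponds to the 80% value quoted above. The reference-region correction is what separates genuine recovery in the bleached spot from signal loss caused by repeated imaging.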

      Lines 220-227: The conclusion is too strong; as I mentioned previously, the authors cannot claim that the condensates are NBs at this step of the study. They observed nuclear condensates that behave like NBs with respect to how they are induced, their shape, and their recovery. Please also include a control.

      Authors:

      Please see the reformulated sentences and our response above.

      Lines 176-178: “Since we had previously established a reliable plant cell assay for studying FIT functionality, we adapted it to study the characteristics of the prospective FIT NBs (Gratz et al., 2019, 2020; Trofimov et al., 2019).”

      Lines 192-193: “We deduced that the spots of FIT-GFP signal were indeed very likely NBs (for this reason hereafter termed FIT NBs).”

      Line 239: It is inappropriate to give the conclusion before the evidence.

      Authors:

      Thank you. We removed the conclusion.

      Line 240: In Figure 2A, provide images of FIT-G at 15 min for comparison. In addition, quantification of the number of condensates in the nucleus needs to be provided at 5 and 15 minutes for both FIT-G WT and FIT-mSS271AA-G, especially because the rest of the study depends on these time points.

      Authors:

      This information is provided in Supplemental Movie S1C.

      Line 241: The authors state that condensate formation starts after 5 minutes (line 190), whereas here (line 241) they claim that it starts after 1 minute. Please clarify.

      Authors:

      In line 190 we described that FIT NB formation occurs after the excitation and is fully visible after 5 min. In line 241 we stated that the formation starts in the first minutes after excitation, which describes the same time frame. We rephrased the respective sentences.

      Lines 185-188: “A short duration of 1 min 488 nm laser light excitation induced the formation of FIT-GFP signals in discrete spots inside the nucleus, which became fully visible after only five minutes (t=5; Figure 1B and Supplemental Movie S1A).”

      Lines 239-242: “While FIT-GFP NB formation started in the first minutes after excitation and was fully present after 5 min (Supplemental Movie S1A), FITmSS271AA-GFP NB formation occurred earliest 10 min after excitation and was fully visible after 15 min (Supplemental Movie S1C).”

      Line 254: It is unclear what the authors mean by "not only for interaction but also for FIT NB formation". To me, modeling predicts that the IDR is perturbed when the serines are mutated; therefore, the IDR might be important for forming condensates in the nucleus. Please clarify.

      Authors:

      The formation of nuclear bodies is slower for FITmSS271AA, as seen in Figure 2. Previously, we showed that FITmSS271AA homodimerizes less (Gratz et al., 2019). Therefore, this IDR is important for both processes, NB formation and homodimerization. We have added this information to make the point clear, lines 253-255:

      “This underlined the significance of the Ser271/272 site, not only for interaction (Gratz et al., 2019) but also for FIT NB formation (Figure 2).”

      Line 255: It is not clear why the authors test whether FIT homodimerization is preferentially associated with condensates in the nucleus.

      Authors:

      We test this because both homo- and heterodimerization of bHLH TFs are generally important for the activity of TFs, and we unraveled the connection between protein interaction and NB formation. We state this in lines 228-232.

      Lines 269-272: It is not clear what the authors are referring to.

      Authors:

      We are describing the homodimeric behavior of FIT and FITmSS271AA assessed by homo-FRET measurements that are introduced in the previous paragraph, lines 256-268.

      Line 309: This colocalization part should be presented before line 194.

      Authors:

      We find it more convincing to first examine and characterize the process underlying FIT NB formation and then to study a possible function of NBs. The colocalization analysis is part of a functional analysis of NBs. We thank the reviewer for the hint that colocalization also confirms that the nuclear FIT spots are indeed NBs. We have taken up this point and discuss it in lines 516-522:

      “Additionally, the partial and full colocalization of FIT NBs with various previously reported NB markers confirm that FIT indeed accumulates in and forms NBs. Since several of NB body markers are also behaving in a dynamic manner, this corroborates the formation of dynamic FIT NBs affected by environmental signals.”

      “In conclusion, the properties of liquid condensation and colocalization with NB markers, along with the findings that it occurred irrespective of the fluorescence protein tag preferentially with wild-type FIT, allowed us to coin the term of ‘FIT NBs’.”

      Line 328: Please add the reference to the figure.

      Authors:

      Figure reference was added to the text, lines 330-332:

      “The second type (type II) of NB markers were partially colocalized with FIT-GFP. This included the speckle components ARGININE/SERINE-RICH45-mRFP (SR45) and the serine/arginine-rich matrix protein SRm102-mRFP (Figure 5).”

      Line 334: The SR45 structures seem to have an abnormally large diameter of 4-6 µm, whereas a speckle generally measures about 2-3 µm in diameter. Can the authors make sure that this structure is not due to overexpression in N. benthamiana, and make sure not to oversaturate the image?

      Authors:

      Thank you for this hint. Indeed, there are reports that SR45 is a dynamic component inside cells. It can redistribute depending on environmental conditions and associate into larger speckles depending on the nuclear activity status (Ali et al., 2003). We have included this reference and refer to it in the discussion, lines 557-564:

      “Interestingly, typical FIT NB formation did not occur in the presence of PB markers, indicating that they must have had a strong effect on recruiting FIT. This is interesting because the partially colocalizing SR45, PIF3 and PIF4 are also dynamic NB components. Active transcription processes and environmental stimuli affect the sizes and numbers of SR45 speckles and PB (Ali et al., 2003; Legris et al., 2016; Meyer, 2020). This may indicate that, similarly, environmental signals might have affected the colocalization with FIT and resulting NB structures in our experiments. Another factor of interference might also be the level of expression.”

      Line 335: It seems that the colocalization is only partial after NB induction. The FIT NBs colocalize around SR45, but this is hard to tell because the images are saturated, which creates some false overlapping regions.

      Authors:

      The localization of FIT with SR45 is partial and occurs only after FIT has undergone condensation, see lines 335-338.

      Lines 344-345: It is inappropriate to give the conclusion before the evidence.

      Authors:

      We explain in an earlier paragraph that we will show three different types of colocalization, and we introduce the respective colocalization types in separate paragraphs accordingly, see lines 314-321.

      Line 353: Increase the contrast in the t=5 image for UAP56H2, since it is hard to assess the colocalization.

      Authors:

      This is done as noted in the figure legend of Figure 6.

      Lines 381-382: "In general" does not sound scientific; avoid this kind of wording and describe your findings precisely.

      Authors:

      We rephrased the sentence, lines 387-388:

      “Localization of single expressed PIF3-mCherry remained unchanged at t=0 and t=15 (Supplemental Figure S5A).”

      Lines 384-385: Provide the data and the reference to the figure.

      Authors:

      We apologize for the misunderstanding and rephrased the sentence, lines 389-391:

      “After 488 nm excitation, FIT-GFP accumulated and finally colocalized with the large PIF3-mCherry PB at t=15, while the typical FIT NBs did not appear (Figure 7A).”

      Line 386: The structure in which FIT-G is present in Figure 7A at t=15 is unlike those observed elsewhere in the paper. This could be explained by overexpression in N. benthamiana. Please explain.

      Authors:

      Thank you for the hint. We discuss this in the discussion part, see lines 555-568.

      Line 393: Explain and provide data on why the morphology of PIF4/FIT NBs does not correspond to the normal morphology.

      Authors:

      Thank you for the valuable hints. Several reasons may account for this and we provide explanations in the discussion, see lines 555-568.

      Lines 396-398: The data also suggest that co-expression of PIF4 or PIF3 affects the partitioning of FIT between the NP and the NB.

      Authors:

      We can assume that the residual nucleoplasm is depleted of protein during NB formation. This is likely true for all assessed colocalization experiments. We discuss this in lines 492-494.

      The discussion is particularly lengthy; it would be good to shorten it and focus on the main findings.

      Authors:

      We shortened the discussion.

      **Referees cross-commenting**

      All good for me, I think that the comments/suggestions from Reviewer #2 are valid and fair. If they are addressed they will improve considerably the manuscript.

      Reviewer #1 (Significance (Required)):

      This manuscript describes an unprecedented, very precise cellular and molecular mechanism in nutrition using a large set of microscopy techniques. The formation of nuclear bodies and their role are still largely unexplored in this context. Therefore, this study sheds light on the functional role of this membraneless compartment and will be appreciated by a large audience. However, the fine characterization is performed only with transient expression in N. benthamiana, and only a few proofs of concept are provided in stable A. thaliana lines.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript of Trofimov et al. shows that FIT undergoes light-induced, reversible condensation and localizes to nuclear bodies (NBs), likely via liquid-liquid phase separation, and that light conditions play an important role in FIT activity. Overall, the manuscript is well written, and the authors have done a great job by performing many detailed and in-depth experiments to support their findings and conclusions.

      However, I have a number of questions/comments regarding the data presented and there are still some issues that authors should take into account.

      Major points/comments:

      1) The authors focused only on blue light conditions. Is there any specific reason for selecting only blue light and not others (red light or far-red light)?

      Authors:

      There are two main reasons. First, in a preliminary study (not shown), blue light resulted in the formation of the highest numbers of NBs. Second, iron reductase activity assays and gene expression analysis under different light conditions showed a promoting effect under blue light, but not under red or far-red light (Figure 9). This indicated to us that blue light might activate FIT, and that active FIT may be related to FIT NBs.

      2) Fig. 3C and D: As GFP and GFP-GFP constructs are used as references, why not take the measurements for them at two different time points, for example t=0 and t=5 or t=15?

      Authors:

      Free GFP and GFP-GFP dimers are standard controls for homo-FRET that serve to delimit the range for the measurements.

      3) Line 27-271: According to Figure 3D, the fluorescence anisotropy measured in NBs appears to be lower. Please explain.

      Authors:

      FA in NBs with FITmSS271AA is variable, and the value is lower than that of the whole nucleus but not significantly different from that in the nucleoplasm. We describe the results of Figure 3D in lines 272-275.
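For context on the homo-FRET readout discussed here, steady-state fluorescence anisotropy is conventionally computed from the parallel and perpendicular emission channels, and homo-FRET between identical fluorophores depolarizes the emission, lowering the value. The sketch below assumes that standard definition, with a G factor correcting for unequal channel sensitivity; the function name and intensities are illustrative and are not values from Figure 3D.

```python
def anisotropy(i_par, i_perp, g_factor=1.0):
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    corrected_perp = g_factor * i_perp
    return (i_par - corrected_perp) / (i_par + 2.0 * corrected_perp)


# Illustrative intensities: a more depolarized signal (as with homo-FRET) gives a lower r
monomer_like = anisotropy(3.0, 1.0)  # 2 / 5 = 0.4
dimer_like = anisotropy(2.0, 1.0)    # 1 / 4 = 0.25
print(monomer_like, dimer_like)
```

This is why free GFP and a GFP-GFP fusion are useful as bounds: measured under the same instrument settings, they fix the anisotropy expected without and with homo-FRET.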

      4) Figure 4: For the negative controls, data are shown only at t=0; data should also be shown at t=5 to prove that there is no decrease in fluorescence in these negative controls when they are expressed alone without bHLH039, as there is no acceptor in this case.

      Authors:

      For neither the FIT/bHLH039 nor the FITmSS271AA/bHLH039 pair is there a significant decrease in the fluorescence lifetime values between t=0 and t=5/15. FIT-G is a control to delimit the range. The interesting experiment is to compare the protein pairs of interest between the different nuclear locations at t=5/15.

      5) Lines 300-301: In Figures 4D and 4E, the fluorescence lifetime of G measured at t=0 seems very similar for FIT-G and FITmSS, but at t=0 the value for FIT-G+bHLH039 is greater than 2.5, whereas for FITmSS271AA-G+bHLH039 it is lower, which suggests that more heterodimeric complexes form with FITmSS271AA-G+bHLH039. A similar pattern is observed for NBs and NPs according to Figures 4D and 4E.

      Therefore, heterodimeric complexes accumulated more in the case of FITmSS271AA-G+bHLH039 than of FIT-G+bHLH039 (when comparing the fluorescence lifetime values of G for FITmSS271AA-G+bHLH039 with those for FIT-G+bHLH039).

      Please comment and elaborate about this further.

      Authors:

      These conclusions are not valid as the experiments cannot be conducted in parallel. Since the experiments had to be performed on different days due to the duration of measurements including new calibrations of the system, we cannot compare the absolute fluorescence lifetimes between the two sets.
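The calibration point can be made concrete with the standard FLIM-FRET relation, in which the apparent FRET efficiency is estimated from the donor lifetime with and without acceptor, E = 1 - tau_DA / tau_D. The sketch below assumes that definition; because tau_D is the donor-only lifetime measured in the same session, efficiencies are derived within a measurement day rather than across days. The function name and lifetimes are hypothetical.

```python
def fret_efficiency(tau_da, tau_d):
    """Apparent FRET efficiency E = 1 - tau_DA / tau_D.

    tau_da: donor lifetime in the presence of the acceptor (ns)
    tau_d: donor-only lifetime from the same measurement session (ns)
    """
    if tau_d <= 0:
        raise ValueError("donor-only lifetime must be positive")
    return 1.0 - tau_da / tau_d


# Hypothetical session: donor-only reference 2.5 ns, donor with acceptor 2.2 ns
print(round(fret_efficiency(2.2, 2.5), 2))  # 0.12
```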

      6) Figure 4: For the negative controls, data are shown only at t=0; data should also be shown at t=5 to prove that there is no decrease in fluorescence in these negative controls when they are expressed alone without bHLH039, as there is no acceptor in this case.

      Authors:

      Please see our response to your comment 4).

      7) Line 439-400: Iron uptake genes (FRO2 and IRT1) are more strongly induced in WT under blue light, and FRO2 is less induced under red light. What happens to the Fe content of WT grown under blue or red light compared with WT grown under white light? Perls/Perls-DAB staining of WT roots under different light conditions would add more information.

      Authors:

      We focused on the relatively short-term effects of blue light on signaling of nuclear events that could be related directly to FIT activity, particularly gene expression and iron reductase activity as a consequence of FRO2 expression. These are both rapid changes that occur in the roots and can be measured. We suspect that iron relocalization and Fe uptake also occur; however, in our experience, differences in metal content are not directly significant when applying standard methods such as ICP-MS or Perls staining.

      Minor comments:

      Lines 75-76: Rephrase the sentence.

      Authors:

      We rephrased the sentence, lines 73-74:

      “As sessile organisms, plants adjust to an ever-changing environment and acclimate rapidly. They also control the amount of micronutrients they take up.”

      Line 119: Rephrase the sentence

      Authors:

      We rephrased the sentence, lines 118-119:

      “Various NBs are found. Plants and animals share several of them, e.g. the nucleolus, Cajal bodies, and speckles.”

      Lines 235-236: Rephrase the sentence.

      Authors:

      We rephrased the sentence, lines 232-234:

      “In the work of Gratz et al. (2019), the phospho-mimicking FITmS272E protein did not show significant changes in its behavior compared to wild-type FIT.”

      Line 444: Correct the sentence “Fe deficiency versus sufficiency”

      Authors:

      We corrected that, lines 449-451:

      “In both, the far-red light and darkness situations, FIT was induced under iron deficiency versus sufficiency, while on the other side, BHLH039, FRO2 and IRT1 were not induced at all in these light conditions (Figure 9I-P).”

      **Referees cross-commenting**

      I agree with R1's suggestions/comments, and I think the manuscript quality will be much better if the authors carry out the experiments suggested by R1. I believe this will also strengthen their conclusions.

      Reviewer #2 (Significance (Required)):

      Overall, the manuscript is well written, and the authors have done a nice job by performing several key experiments to support their findings and conclusions. However, the results and manuscript can be improved further by addressing the questions raised here. This study, which unravels the crosstalk of light signaling in nutrient signaling pathways, is interesting for basic scientists.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      In this manuscript, Hoskins et al describe analyses of the effects of sequence variation on RNA levels, protein levels, and ribosome loading for the COMT gene. They use multiple experimental approaches to assay these levels and report on how sequence differences affect expression. Overall, the paper is interesting in that it presents a very deep dive into the effects of sequence variation on gene expression, including in coding sequences. However, there are some issues with the polysome loading assay technique and there are substantial issues with the figure presentation, which is often confusing.

      __Response: __Thanks for the positive assessment of our manuscript and the constructive feedback regarding the issues with the figure presentation. We have addressed all of these below and they have significantly improved the clarity.

      Major comments:

      1) Figures:

      --Fig 1C needs a cartoon description to show where the UTRs are. The y-axis should say "Ribo-seq CPM".

      __Response: __Fig 1C now includes a schematic and the y-axis is updated. Locations of the uORFs are also now included in Fig 1A.

      --Sup Fig 1A is confusing: what is "start", and what is the point of this panel?

      __Response: __We apologize for the confusing labeling of the panels in Sup Fig 1. “Start” refers to the MB-COMT start codon. We removed this annotation as it is irrelevant to the figure. We included Supplementary Figure 1A to show RNA probing data for the entire transcript. Figure 1A and B only show the regions that encompass the variants assayed in our study.

      --Sup Fig 1B: What is "PCBP del"?

      Response: “PCBP del” refers to deletion of PCBP1/PCBP2 RNA binding protein motifs. The legend now specifies this.

      --Sup Fig 1C: What is "uORF B restore"? The description in the figure legend is not interpretable. Draw diagrams of the mutations that tell the reader what was assayed and why it was assayed. Why are there multiplication factors listed (e.g. 1.33X)? The data are depicted on a log scale, which makes it difficult to appreciate the fold-effects of the mutations (e.g. does the uORF A mutation increase expression 1.5-fold?). Please calculate median expression values and report them on a bar graph or something similar so readers can interpret the results.

      Response: “uORF B restore” refers to restoration of the endogenous uORF B frame with a silent variant in the Flag tag of the transgene. The multiplication factors listed were the fold change in median fluorescence between each mutant and the template (wild-type) transgene. We retained the figures as they show the raw distribution of fluorescence in each cell line, but in response to the reviewer’s suggestion we included a new figure displaying the effects as a bar graph (Supplementary Figure 1E).
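The multiplication factors described in this response can be illustrated with a minimal sketch: the median fluorescence of cells expressing a mutant transgene divided by the median of the template (wild-type) line, which is also the quantity a bar graph of median effects would display. The function name and per-cell values below are hypothetical, and gating or background subtraction upstream of this step is not shown.

```python
import numpy as np


def median_fold_change(mutant, template):
    """Fold change in median fluorescence of a mutant line relative to the template line."""
    return float(np.median(mutant) / np.median(template))


# Hypothetical per-cell fluorescence values for one mutant and the template line
template_cells = np.array([900.0, 1000.0, 1100.0, 1200.0, 800.0])
mutant_cells = np.array([1200.0, 1330.0, 1500.0, 1100.0, 1400.0])
fc = median_fold_change(mutant_cells, template_cells)
print(f"{fc:.2f}X, log2 = {np.log2(fc):.2f}")  # 1.33X, log2 = 0.41
```

Reporting the linear ratio next to its log2 value makes the same effect readable on both the bar graph and the log-scaled distributions.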

      --Fig 2A. It's hard to understand the cartoon diagram of the expression reporter construct. Why is +Dox shown here? Does that induce transcription?

      __Response: __The reviewer is correct. “+Dox” indicated addition of Doxycycline to induce transcription before the data collection step. We agree that there may have been too much detail in this diagram and have now removed this for simplicity and indicated this in the Methods section.

      --Fig 2B. What's on the x-axis? Is it Log2(RNA/gDNA) from sequencing? Is it Log2, Log10 or Ln?

      __Response: __Variant effects in each figure were derived from ALDEx2 analysis, which reports effect size as the median standardized difference between groups. The effect size is not directly interpretable as a log fold change; it takes into account the difference between groups as well as the dispersion. This analysis strategy has been previously demonstrated for analysis of SELEX experiments (Fernandes et al. 2014), which are used to select small populations of cells with specific phenotypes.

      ALDEx2 is a robust and principled choice for the analysis of count-compositional datasets, particularly after selection (e.g. sorted cell populations or low-input RNA fractions arising from polysome profiling). While we understand that this choice leads to less easily interpretable effect sizes, the mathematical advantages make ALDEx2 a more appropriate choice for this type of data. In the past, we had used other methods to analyze log frequencies (limma, a frequency based normalization-dependent analysis, as previously employed in Hoskins et al. 2023. Genome Biology) that directly reported fold changes. In our experience, the ALDEx2-derived effect sizes are well-correlated with those estimates (Pearson correlation 0.93 for variants significant at a FDR

      --Fig 2C. What's on the y-axis (same question)? I think it's LogX(mutant/wt) RNA level?

      __Response: __For consistency with other figures, we replaced Figure 2C to report the effect size statistic as described above.

      --Fig 2D. What's on the y-axis now? Fold-difference (not log transformed)?

      __Response: __Please see our response above.

      --Fig 2E. The scale bar is flipped relative to the normal convention. This is also log transformed, but it's not labeled. Please label it as log(whatever) and put the negative values on the left side of the bar (red on the left, blue on the right).

      __Response: __Thanks for the suggestion, we have now updated the scale bar.

      --Fig 2F y-axis should say Ribo-seq CPM.

      __Response: __Done

      --Fig 3A: Please separate the graphs more. Did you sort cells from ROI2 into populations, or just cells from ROI1?

      __Response: __Thanks for the suggestion, we now separate the graphs further. Cells were sorted for both ROI 1 and ROI 2 libraries.

      --Fig 3C-F: What does "effect size" mean on these graphs?

      __Response: __Please see the response above regarding the effect size estimate from ALDEx2.

      --Fig 3D: It looks like the colors have switched for positive/negative "effects" on the heat map compared to Figure 2E. Please define what "median effect" means and be consistent with Figure 2E.

      __Response: __We intentionally inverted colors for Figure 3. The rationale is that a variant causing low protein abundance corresponds to enrichment in P3 compared to gDNA, as opposed to depletion in P3. On the other hand, for effects on RNA abundance and ribosome load, a variant leading to low abundance for these measures is depleted.

      --Figure 4: What does effect size mean, and what is the log-transformed scale (log2, log10, etc.)? Same issues as in earlier figures.

      __Response: __Please see response above.

      --Figure 5: "effect size"

      __Response: __The same definition of effect size was used with the exception that effect sizes are multiplied by -1 so that color schemes are consistent for deleterious effects.

      2) "Codon stability" should always be "Codon Stability Coefficient", maybe abbreviated "CSC". Otherwise it's confusing.

      __Response: __Thanks for the suggestion. This has been updated throughout the manuscript.

      3) Flow cytometry section talks about "RNA fluorescence", which is confusing. You need to explain that it's IRES-driven mCherry as a proxy for the level of RNA first. It would also help to state explicitly that you sorted the cells into four populations, and define them all first before describing the results.

      __Response: __We apologize for the use of imprecise language with respect to this reporter. We revised the text to emphasize that mCherry is a proxy for RNA abundance and described the populations first as suggested.

      4) What are DeMask scores? How are they related to conservation or amino acid properties? If you define these, you can help the reader interpret the result.

      __Response: __Thanks for the suggestion. We now include a conceptual interpretation of the DeMask score in the relevant section. We also include a comparison to a recent large language model for variant effect prediction (ESM1b, Brandes et al. 2023) which is now reported in Supplementary Figure 5C.

      5) There are several issues with the polysome gradient fractionation. The gradients did not separate 40S, 60S, and monosomal fractions, so it's hard to tell how many ribosomes correspond to each peak on the gradient graph in Figure S5. This is probably because the authors used a 20-50% gradient instead of a lower percentage on top. More significantly, variations in the coding region of COMT are likely affecting the polysome association in ways the authors didn't consider. Nonsense codons will simply make the ORF a lot shorter, hence fewer ribosomes. This may have nothing to do with NMD. Silent and missense variants may have unpredictable effects because they may make translation faster (fewer ribosomes) or slower (more ribosomes) on the reporter. This could lead to more ribosomes with less protein or fewer ribosomes with more protein. The reporter RNA also has an IRES loading mCherry on it, which probably helps blunt or dampen the effects of the COMT sequence variants on polysome location distribution. Overall, the design of the polysome assay is probably very limited in power to detect changes in ribosome loading (four fractions, limited separation by the 20-50% gradient, IRES loading, etc.). This is partially addressed in the limitations section, but these issues could be discussed in more detail.

      __Response: __Given high polysomal association of endogenous COMT and our COMT transgene (Supplementary Figure 2B, Supplementary Figure 5B-C), we chose a 20-50% sucrose gradient to better resolve changes in ribosome load among heavy polysomes.

      We thank the reviewer for offering another valid explanation regarding the depletion of nonsense variants. We have now included a sentence in the discussion to indicate that lower ribosome load for nonsense variants may be due to a shorter ORF as opposed to NMD. We further include the potential limitation of the assay due to the presence of the IRES-mCherry.

      We agree that variants may have unpredictable effects due to effects on the dynamics of translation elongation. To address this potential limitation, we attempted to devise a selective ribosome profiling strategy by immunoprecipitating N-terminal Flag tagged peptides to enrich ribosomes translating COMT. However, we were unable to achieve significant enrichment, limiting our ability to measure variant effects on elongation in a high-throughput manner.

      Significance

      The study is novel in that it assays both 5' UTR and a wide range of protein coding sequence variants for effects on RNA and protein levels from a clinically important gene, COMT. The manuscript reports that most protein coding variants have modest effects on RNA levels, and that the minority of variants that do affect RNA levels are not predictable due to their effect on codon usage. The work also determines the distribution of effects of variants on protein levels, finding a variety of effects on expression. Interestingly, the authors found that SNPs affecting ribosome loading generally affect the RNA structure of the COMT coding region, rather than affecting codon usage.

      This should appeal to many different communities of biologists - gene expression experts, geneticists, and clinical neurobiologists who focus on COMT. So there is a potential for fairly broad interest. The main limitations to the work are in a lack of clarity in the figures and perhaps in the underdeveloped nature of the discussion section. The discussion section reports new results (SNP associations that affect expression). These would make more sense in the results section, such that the discussion could do a better job relating the impact of sequence variants on expression levels to prior work to highlight the novelty.

      __Response: __We thank reviewer #1 for their positive assessment of the broad significance of our study. We also thank them for constructive suggestions that led to increased clarity in the presentation. We have moved the analysis of gnomAD variants to the Results section and expanded the discussion.

      Reviewer #2

      Evidence, reproducibility and clarity

      Summary:

      Hoskins and colleagues expressed a reporter containing all silent, missense, and nonsense codons at 58 amino acid positions in the human COMT gene in HEK293T cells and measured levels of DNA, bulk RNA, and pooled polysomal mRNA. They included a C-terminal translational GFP fusion and a downstream transcriptional mCherry fusion in the reporter in order to also bin variants by their relative protein and mRNA levels by flow cytometry. They hypothesized that RNA structure, in part by mediating uORF translation, influences COMT gene expression. The authors conclude by identifying previously uncharacterized COMT variants that, in this reporter system, affect RNA abundance and ribosome load. We generally found the results of this paper convincing and clear. We do not have major comments, but have many minor comments that we hope the authors can address. These comments mostly deal with clarification of analysis metrics and recommendations on data presentation.

      __Response: __Thanks for highlighting the strengths of our study and the constructive suggestions to improve the presentation.

      Minor comments:

      In Figure 2C, the vertical axis reads "Median between-group difference". How was this metric calculated and normalized? We also agree that nonsense mutations having consistently detrimental effects on RNA abundance is reassuring, but recommend more explanation as to why the difference in the effects of silent and missense mutations between regions may be biologically relevant.

      __Response: __Variant effects in each figure derive from ALDEx2 analysis, which reports effect size as the median standardized difference between groups. In particular, to avoid any distributional assumptions for standardization, ALDEx2 uses a permutation based non-parametric estimate of dispersion. The effect size is not directly interpretable as a log fold change; it takes into account the difference between groups as well as the max dispersion of the groups. We have now provided explicit references to the specific R functions that were used to calculate the effect size.

      ALDEx2 is robust for analysis of count-compositional datasets, particularly after selection and bottlenecking (e.g. sorted cell populations or low-input RNA fractions arising from polysome profiling). While we have used other methods to analyze log frequencies (limma, a frequency based normalization-dependent analysis, as previously employed in Hoskins et al. 2023. Genome Biology), we opted for the less-interpretable but more robust ALDEx2 analysis to report variant effects between varying nucleic acid inputs.

      We currently lack a mechanistic interpretation for the difference in RNA abundance effects between ROI 1 and 2. However, we observed consistent results using a different analysis framework, which makes use of variant frequencies (as in Hoskins et al. 2023 Genome Biology) instead of the centered log ratios used in ALDEx2 analysis, further supporting a biological difference between the two.
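The effect-size statistic described in this response can be illustrated with a simplified sketch. It assumes centered log-ratio (clr) values are already available and follows the idea of dividing random between-group differences by the larger of the two within-group differences and taking the median. It omits the Monte Carlo sampling of Dirichlet instances that ALDEx2 performs, and the function names are hypothetical, so it is not a substitute for `aldex.effect` in the ALDEx2 R package.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; rows are variants (features), columns are samples."""
    logged = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=0, keepdims=True)


def effect_size(group_a, group_b, n_draws=1000, seed=0, eps=1e-9):
    """Median of (between-group difference / max within-group difference) for one variant.

    group_a, group_b: clr values of one variant across the replicates of each group.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Random between-group differences (group b minus group a)
    between = rng.choice(b, n_draws) - rng.choice(a, n_draws)
    # Random within-group differences; the larger of the two acts as the dispersion
    within_a = np.abs(rng.choice(a, n_draws) - rng.choice(a, n_draws))
    within_b = np.abs(rng.choice(b, n_draws) - rng.choice(b, n_draws))
    dispersion = np.maximum(within_a, within_b) + eps
    return float(np.median(between / dispersion))


# Hypothetical clr values for one variant in input (gDNA) vs. selected replicates
gdna = np.array([0.10, 0.05, 0.12])
selected = np.array([1.10, 0.90, 1.30])
print(effect_size(gdna, selected))
```

Because the difference is scaled by replicate dispersion, the same shift in clr values yields a smaller effect when replicates are noisier, which is why this statistic is not directly interpretable as a log fold change.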

      In Figure 3, we believe that the authors are claiming that lower RNA abundance causes lower protein abundance in some variants. However, this data only reports on protein abundance relative to transcript abundance, not absolute protein abundance. We think the claim should be revised to (1) clarify that the authors are measuring protein per mRNA, and (2) express that lower mRNA amounts are more likely to co-occur with lower protein amounts, but that this data does not support any causative model.

      __Response: __Thanks for the suggestion. We have now included an explicit description of the experimental design in the results section and noted that we are unable to assign protein abundance effects to underlying RNA abundance effects. In the current setup, we did not sort cells based on the ratio of moxGFP/mCherry fluorescence (protein per mRNA), but rather we defined gates based on the 2D plot of moxGFP versus mCherry. This is explicitly marked in Figure 3A.

      On page 9, the authors claim that their data supports a model that rs4633 increases RNA abundance, leading to higher COMT expression. Can the authors rule out a model whereby rs4633 facilitates translation initiation, as suggested by Tsao et al. 2011, leading to both an increase in mRNA and protein abundance?

      __Response: __Thanks for this question and opportunity to clarify. We have now added a sentence to the Discussion and included the following paragraph in the Supplementary Note:

      “Importantly, our study does not rule out a model where rs4633 facilitates translation initiation. Nevertheless, our data suggest a potential concurrent mechanism where rs4633 leads to higher protein abundance in human cell lines and in an in vitro translation assay (Tsao et al. 2011) by increasing RNA abundance. We note that Tsao et al. did not directly measure RNA abundance in their study. In Supplementary Figure 3A of Nackley et al. 2006, the APS haplotype containing rs4633 C>T showed slightly higher total RNA abundance compared to the LPS haplotype (in our study, the wild-type template). However, this was not statistically significant and was only observed for the S-COMT isoform. It is possible that our observations are compatible with the conclusions in Tsao et al. 2011. For example, increased translation of rs4633 C>T may lead to stabilization of the RNA.”

      The paper references "effect size" at multiple points (e.g. "polysome effect size") but we could not find this term explicitly defined (for example: for the polysome effect size, were RNA counts for each polysome fraction divided by the relative abundance of that RNA in total RNA?)

      __Response: __We apologize for this confusion. Please see our response above. We have also stated the definition of effect size explicitly in the revised manuscript.

      Could you elaborate on how you define "protein abundance" and "effect size" in Figure 5G? How is enrichment in P3 or P1 calculated?

      __Response: __Effect size is defined as described above. Enrichment in P3 or P1 is calculated with respect to the abundance in gDNA (unsorted cells).

      Were 3396 variants considered for all readouts in this paper? How many of these variants were present in each ROI? It may be worth clarifying sample sizes.

      __Response: __Thanks for the suggestion. The reviewer is correct: 3396 variants were present in all biological replicates and all readouts (after excluding polysome metafractions 1 and 2 and flow cytometry population 4). The Methods were updated to include all readouts that were dropped. The number of variants in each ROI is now included in this section of the main text.

      How did Twist generate these mutagenized sequences? We assumed that they used error-prone PCR due to the mention of multiple nucleotide polymorphisms, but couldn't find an explicit answer.

      __Response: __Twist generates these mutagenized inserts using degenerate primers. This allows all alternate codons to be assayed (all silent and missense changes). This is now noted in the Methods.

      https://www.twistbioscience.com/resources/technical-note/solid-phase-dna-synthesis-allows-tight-control-combinatorial-library

      In the methods, it may be worth elaborating on the composition of the HsCD00617865 plasmid. For example: this COMT reporter is under the control of a constitutively-expressed T7 promoter, correct?

      __Response: __The HsCD00617865 plasmid was only used as a template for PCR amplification and generation of the transgene. The transgene is cloned into a vector containing attB sites for recombination into the landing pad cell line (Matreyek et al 2020). Transcription from the landing pad locus is induced by doxycycline. Plasmid maps used for transfection into the landing pad line are now included in the GitHub repository.

      In Supplementary Figures 4 and 5, it would be helpful to explicitly say that you are reporting Pearson correlations between biological replicates.

      __Response: __Thanks for the suggestion. The legends have been updated accordingly.

      "After summarizing biological replicates (N=4) for each readout...": how did the authors summarize biological replicates? Were counts averaged?

      __Response: __Biological replicates were summarized using the median. This is now clarified in the Methods.
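      As an illustration of the two calculations described in these responses (median summarization of replicates, and enrichment of a sorted population such as P3 or P1 relative to gDNA), the sketch below converts per-replicate counts to frequencies, takes the median across replicates, and reports a log2 frequency ratio. This is one plausible reading offered only for intuition: the pseudocount, the order of operations, and the function names are assumptions, and the effect sizes reported in the study come from ALDEx2 rather than from this ratio.

```python
import numpy as np


def frequencies(counts, pseudocount=0.5):
    """Convert a (replicates x variants) count matrix to per-replicate frequencies."""
    c = np.asarray(counts, dtype=float) + pseudocount
    return c / c.sum(axis=1, keepdims=True)


def log2_enrichment(sorted_counts, gdna_counts, pseudocount=0.5):
    """Hypothetical log2 enrichment of each variant in a sorted population vs gDNA.

    Replicates are summarized with the median of their frequencies before the
    ratio is taken; positive values mean the variant is over-represented.
    """
    sorted_freq = np.median(frequencies(sorted_counts, pseudocount), axis=0)
    gdna_freq = np.median(frequencies(gdna_counts, pseudocount), axis=0)
    return np.log2(sorted_freq / gdna_freq)
```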

      The authors used pairwise correlations between flow cytometry fractions, polysome fractions, and total RNA/gDNA as indications of data quality. Do the authors expect these counts to be strongly correlated? We would not necessarily expect to see a strong correlation between ribosome load and RNA/gDNA.

      __Response: __We used replicate correlation as an indicator of data quality. Our readouts of ribosome load reflect the abundance of a variant in a particular polysome fraction. Given that variants that are highly abundant in the RNA pool will on average be more highly represented in polysome fractions, we would expect a correlation between the abundance of a variant in total RNA and in polysome fractions.
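      The replicate-correlation check described above can be sketched as pairwise Pearson correlations between replicates. This is a minimal sketch that assumes the input is a matrix of log frequencies with one row per biological replicate and one column per variant; the function name is hypothetical.

```python
import itertools

import numpy as np


def pairwise_replicate_pearson(log_freqs):
    """Pearson r for every pair of replicates (rows = replicates, columns = variants)."""
    x = np.asarray(log_freqs, dtype=float)
    r = np.corrcoef(x)  # rows are treated as variables by default
    return {
        (i, j): float(r[i, j])
        for i, j in itertools.combinations(range(x.shape[0]), 2)
    }
```

      The same function applied to, for example, total RNA and a polysome fraction would be expected to show some positive correlation for the reason given in the response: variants that are abundant in total RNA tend to be abundant in polysome fractions too.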

      The authors may need to check that their standard deviations on fold changes are properly reported.

      __Response: __In the Figures and the main text, we specified the confidence intervals as calculated by the ALDEx2 method instead of reporting standard deviations on fold changes. Specifically, the confidence intervals were determined by Monte Carlo methods that produce a posterior probability distribution of the observed data given repeated sampling. Variants whose confidence intervals do not cross 0 are considered true discoveries (section 5.4.1 of the ALDEx2 vignette on Bioconductor).

      https://www.bioconductor.org/packages/devel/bioc/vignettes/ALDEx2/inst/doc/ALDEx2_vignette.html#541_The_effect_confidence_interval
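      The decision rule in the response above (an effect interval that does not cross 0) can be sketched as follows. This assumes one already has a vector of Monte Carlo draws of the effect for a variant; in ALDEx2 these arise from its own resampling, whereas here they are simply an input array. The central 95% interval and the function name are illustrative assumptions, not ALDEx2 defaults.

```python
import numpy as np


def interval_excludes_zero(effect_draws, level=0.95):
    """Return (low, high, is_discovery) for a central interval of effect draws."""
    draws = np.asarray(effect_draws, dtype=float)
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(draws, [tail, 1.0 - tail])
    # Flag the variant only when the whole interval lies on one side of zero.
    return float(low), float(high), bool(low > 0 or high < 0)
```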

      We would expect standard deviation bounds to be symmetric for log fold changes, but not on unlogged fold changes - for example see page 8, for the sentence "our point estimate for nonsense variant effects on COMT RNA abundance was approximately a two-fold decrease relative to the gDNA frequency (fold change of 0.43 +/- 0.13; mean +/- standard deviation; Methods)."

      __Response: __Thanks for the suggestion. To avoid any confusion about the symmetry, we replaced the +/- notation, and explicitly noted the mean and standard deviation. To help the reader gain an intuition of the magnitude of variant effects, we conducted a frequency based normalization-dependent analysis using limma (as previously employed in Hoskins et al. 2023. Genome Biology). We now report a fold change (unlogged) for RNA abundance compared to gDNA abundance. The point estimate is the mean and s.d. across all nonsense variants.
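      A short numeric illustration of the reviewer's point: an interval that is symmetric on the log2 scale becomes asymmetric once it is back-transformed to fold changes. The numbers are arbitrary and are not taken from the study.

```python
def back_transform(log2_mean, log2_sd):
    """Map a symmetric log2 interval (mean - sd, mean, mean + sd) to fold changes."""
    return 2 ** (log2_mean - log2_sd), 2 ** log2_mean, 2 ** (log2_mean + log2_sd)


low, point, high = back_transform(-1.0, 0.5)
print(round(low, 3), point, round(high, 3))  # prints 0.354 0.5 0.707
```

      The lower bound sits about 0.15 below the point estimate and the upper bound about 0.21 above it, so a single "+/-" value cannot describe the unlogged interval.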

      On page 10, the authors say that their data suggests that hydrophobicity in the early coding region of COMT may be important for COMT folding. If this is the case, would we expect to see this effect in flow cytometry data (which is affected by protein degradation) and not polysome profiling (which is unaffected by post-translational protein degradation)?

      __Response: __We apologize, as we are uncertain about the reviewer’s intended question. The section on the importance of hydrophobicity indeed refers to the flow cytometry data. While there are specific instances in which the amino acid properties encoded by the mRNA influence translation dynamics, these effects are not universal. Consequently, we did not expect these impacts to be observed at the level of polysome profiling.

      We believe that we would have some trouble replicating the analysis from this paper from the raw data, given that the bulk of the analysis on GitHub is presented as a single R Markdown file, with references to local files to which we do not have access. We recommend that the authors add additional documentation to their repository to facilitate re-analysis.

      __Response: __Thanks for the opportunity to address this issue of critical importance. To facilitate replication, we have now deposited all analysis files to Zenodo and refactored the code to enable replication by simply running a markdown file.

      In Figure 1B, indicating that more signal indicates less structure (in the legend or the figure itself) may assist readers who are unfamiliar with DMS-seq.

      __Response: __Thanks for the suggestion. This is now updated.

      Figure 1C does a great job presenting evidence for the translation of uORFs, but does not seem to flow with the overall argument of the paper, so may fit better in the supplement.

      __Response: __We considered this suggestion, and opted for keeping its placement as it gives evidence that our transgene is translated primarily as the MB-COMT isoform. This ensures that, for variants upstream of the S-COMT isoform, we can assay effects on ribosome load that are tied to mechanisms of translation elongation and codon stability.

      We believe there is a typo in the Figure 1 legend that should read "K562" instead of "H562".

      __Response: __Thank you, this was indeed a typo.

      You also gated to separate into P1-P4, correct? Can you also show the bounds of that gating strategy in Figure 3A?

      __Response: __This has been updated. We also added the gating strategy in response to comments from reviewer #1.

      We find Figure 3F very compelling. Do you have any theories as to why mutating I59-H66 to nonpolar, uncharged residues leads to increased COMT expression?

      __Response: __We do not have any theories for why this may be. However, we noted that with the exception of V63, residues I59-H66 are not evolutionarily constrained (based on DeMask entropy values). This suggests mutational tolerance for nonpolar, uncharged residues in this region (with the exception of V63 and H66; see Figure 3D).

      There appears to be a non-negligible proportion of di- and tri- nucleotide polymorphisms in Supplementary Figure 4. Were these excluded in downstream analyses?

      __Response: __These variants are expected from the Twist mutagenesis strategy and included in analysis. We believe they are at lower frequency compared to SNPs due to less favorable annealing of the degenerate primers.

      A minor typo in the discussion reads "fluoresce".

      __Response: __Done

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This work investigated the regulatory effects of thousands of coding variants in the COMT gene, focusing on two regions with clinical significance, by using high-throughput reporter assays. The results will be useful for clinical scientists interested in understanding the impacts of COMT mutations and provide a useful framework for other systems/computational biologists to understand the impacts of coding mutations across different levels of regulatory function. Functional mutations in protein-coding regions are classically understood to act by interfering with protein function. There are fewer large-scale efforts to understand how coding mutations affect expression, potentially by changing RNA structure or codon optimality; this work contributes toward that frontier.

      Place the work in the context of the existing literature (provide references, where appropriate).

      This is (as far as I am aware) the first paper to integrate high-throughput massively parallel reporter assays of RNA degradation, ribosomal load, and flow cytometry. Previous papers have tended to measure expression regulation along only one dimension (e.g., Greisemer et al. 2023 on RNA degradation, Sample et al. 2019 on ribosomal load, and de Boer et al. 2020 on protein expression).

      __Response: __Thanks for highlighting the novelty of our approach compared to existing strategies in the literature.

      State what audience might be interested in and influenced by the reported findings.

      Clinicians/researchers interested in COMT, computational biologists, geneticists and potentially structural biologists interested in understanding the consequences of amino acid mutations on RNA/protein expression

      __Response: __Thanks for noting the broad significance of our study.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Genomics, Massively parallel reporter assays, High-throughput regulatory screens.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *This manuscript reports on transcript sequence variants that affect expression of the gene COMT. Targeted analysis of SNPs identifies 5' UTR variants that affect COMT, leading to the identification of translated uORFs. Common coding sequence SNPs do not affect COMT expression, however. Massively parallel analyses of mRNA abundance, protein abundance, and translation are combined to look more broadly at coding sequence variants. These analyses focus on regions of predicted structure in the COMT transcript. Both silent and missense mutations that increase mRNA abundance are identified. Protein abundance is then measured and many missense mutations are found to change protein levels. To address translation directly, analysis of polysome loading is performed and significant differences are identified, although technical challenges limit data quality in these experiments. These different experiments are then analyzed jointly to classify mutation effects and identify a class of silent mutations with expression effects, leading to a proposal that these act through structure. *

      *The joint, integrative analysis of COMT variants through a range of methods allows clearer insights into interconnected post-transcriptional effects. The massively parallel experiments generate high-quality data, although targeted validation of key results would strengthen the work. The findings advance our understanding of silent variant effects, which remains an open question, and technical innovations could find broader applications. *

      __Response: __Thanks for the positive assessment of the quality of the data generated and the potential for the broader application of the technical innovations.

      *I do have concerns with the present version of this work. *

        • There is no validation presented for the high-throughput experimental data. I would say that validating the effects of the M152T and V63V variants from Figure 2B would substantially strengthen the work and support key conclusions. *

      __Response: __Our experiments collectively enabled nearly 10,000 measurements of variant effect (summed over three layers of gene expression). The goal of our study was to identify broad mechanisms of variant effect. While we are excited about the specific variants uncovered, targeted methods for validating changes in RNA abundance, such as RT-qPCR, are unlikely to be sufficiently sensitive. For example, RNA abundance effects in our study had a median effect size of 1.47 for variants up in RNA and 0.4 for variants down in RNA. This likely corresponds to less than one Ct difference between the variant and the reference allele. Indeed, previous studies, such as Findlay et al., 2018 (Nature), reported similar effect sizes (FGF7 and FOS, respectively; Figure 4B).

      Thus, given time and cost constraints, we respectfully suggest that targeted experiments involving V63V and M152T are beyond the scope of our study. Nevertheless, to further strengthen our conclusions, we have computationally confirmed our findings using a different analysis framework: 75/76 of the variants significant by ALDEx2 analysis were also significant by limma analysis (a frequency-based, normalization-dependent analysis, as previously employed in Hoskins et al. 2023, Genome Biology) at the same FDR (0.1).

      • In the fluorescent reporter scheme, it seems that variants reducing mRNA abundance should be enriched in the "P2" gate region relative to "P1", as they would have lower mRNA abundance and correspondingly lower protein abundance. However, this analysis is not performed, and instead P1 and P3 are compared (Figure 3G), which would seem to focus on protein-level effects. *

      __Response: __Our initial hesitation in comparing P2 to P1 was that the P2 population may be enriched for cells that underwent inefficient transcriptional induction with doxycycline. Hence, technical factors, rather than the effects of the variants, may dominate this comparison. In response to the reviewer’s comments, we carried out the suggested analysis (new Supplementary Figure 5B). We found that variants that are down in RNA are enriched in P2 relative to P1, as expected. This is now noted in the Results section.

      • In general, the work classifies variants in several different ways, and it would help to be a little clearer in naming these classes. For instance, in describing the FACS-based analysis of variant expression it is written, "protein fluorescence conditioned on RNA fluorescence", which is confusing at best; it is a fluorescence-based measurement that is used indirectly to measure COMT reporter abundance. *

      __Response: __Thanks for the suggestion. We agree that our initial word-choice was imprecise. We rewrote this section to indicate mCherry fluorescence is an indirect proxy for RNA abundance.

      • Likewise, the populations with a shifted GFP/mCherry ratio in this assay are described as "uncorrelated" populations, which is opaque and somewhat inaccurate; there seems to be a correlation in this group, but at a different ratio. *

      __Response: __We have revised the language in the manuscript. We opted for “low or high RNA/protein abundance” to indicate the relationship between GFP and mCherry fluorescence in populations P3 and P4.

      • In the same way, "deleterious variants" is used to describe protein abundance changes, but this term implies a fitness effect and is not very specific. *

      __Response: __We apologize for the confusing word choice. We did away with this term in favor of “variants with low protein abundance”.

      • In discussing the effects of missense COMT variants on protein levels, there is an implicit assumption that degradation of mis-folded protein (or perhaps properly-folded protein with excess hydrophobic exposure?) explains these effects. This is plausible, but it would help to lay out this reasoning more clearly. *

      __Response: __Thanks for the suggestion. We have added a sentence at the end of the section that specifies this assumption and cites a recent study reporting that rare missense variants in COMT may be misfolded and degraded by the proteasome (Larsen et al. 2023).

      • It is written that,"In line with codon stability as a predictor of translational efficiency (Presnyak et al., 2015), variants with low codon optimality were depleted from polysomes compared to variants with optimal codons". However, this mis-states the conclusions of the cited study, which notes, "Importantly, under normal conditions the ribosome occupancy of the HIS3 opt and non-opt constructs was determined to be similar (Fig. 6B)". *

      __Response: __We apologize for mis-stating the conclusions of Presnyak et al. 2015. We have revisited the relevant literature to place our conclusions more accurately in context. Presnyak et al. and several other studies (Bazzini et al., 2016; Mauger et al., 2019) have clearly established the association between codon choice and mRNA stability. We now also reference Mauger et al. 2019, who used elegant experiments to demonstrate that mRNA secondary structure drives increased protein production and synergizes with codon optimality (Figure 5B). Their results further support a role for codon optimality in RNA stability while providing evidence of an additive impact on translation efficiency.

      • It is written that, "One intriguing possibility is to develop multiplexed assays of variant effect on RNA folding, using mutational profiling RNA probing methods (Weng et al., 2020; Zubradt et al., 2017)." How would this differ from the "Mutate and Map" approach in doi://10.1038/nchem.1176 and subsequent work from the same group? *

      __Response: __Thanks for pointing out the more recent work following the initial papers in 2010-2011. We had missed the work from the Das lab that extended the Mutate and Map approach to utilize mutational profiling (Cheng and Kladwang et al., 2017). We updated our Discussion to indicate that the proposed assay has been pioneered and is a viable approach for high-throughput determination of variant effects on RNA folding.

      Because mutational profiling methods leverage reverse transcriptase readthrough and mismatch incorporation, they enable deeper and more uniform coverage of sequencing reads, particularly for longer transcripts. A key design principle of the proposed assay is to mutagenize only certain types of variants in the library such that they do not overlap RT mismatch signatures arising from the RNA probing reagent/RT enzyme. For example, readthrough of DMS base adducts largely generates A>N or C>N mismatches, so a variant library would be designed to only contain variants at G or T bases. This ensures variants in the library can be differentiated from signals of the RNA probing method.
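      The design principle in the preceding paragraph can be sketched as a simple filter over the reference sequence. This is an illustrative assumption that takes the stated DMS readthrough signature (mismatches at A and C positions) at face value and restricts designed substitutions to G and T positions; the function names are hypothetical, and a real design would also need to account for the specific RT enzyme and probing reagent used.

```python
def allowed_variant_positions(reference):
    """0-based positions where a designed variant would not overlap the
    A>N / C>N mismatch signature expected from DMS readthrough."""
    return [i for i, base in enumerate(reference.upper()) if base in "GT"]


def single_nucleotide_variants(reference):
    """Enumerate (position, ref, alt) substitutions at G/T positions only."""
    seq = reference.upper()
    return [
        (i, seq[i], alt)
        for i in allowed_variant_positions(seq)
        for alt in "ACGT"
        if alt != seq[i]
    ]
```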

      *Referees cross-commenting*

      *I generally agree with the other reviewers and found that many small points on the figures were confusing, and in some cases the values being computed and displayed were under-specified. *

      *I agree with Reviewer 1 that the polysome fractionation probably has limited power due to experimental design, and that the interpretation of changed ribosome loading is subtle. *

      __Response: __In response to these helpful comments, we have clarified the points highlighted by the reviewers and expanded the limitations section related to the ribosome loading assay. Thanks for these constructive suggestions to strengthen our study.

      *Reviewer #3 (Significance (Required)): *

      *The joint, integrative analysis of COMT variants through a range of methods allows clearer insights into interconnected post-transcriptional effects. The massively parallel experiments generate high-quality data, although targeted validation of key results would strengthen the work. The findings advance our understanding of silent variant effects, which remains an open question, and technical innovations could find broader applications. *

      __Response: __Thanks for pointing out the high-quality of the generated data and the broad significance of our study. The goal of our study was to identify broad mechanisms of variant effect instead of focusing on differential expression for any specific variants.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Hoskins and colleagues expressed a reporter containing all silent, missense, and nonsense codons at 58 amino acid positions in the human COMT gene in HEK293T cells and measured levels of DNA, bulk RNA, and pooled polysomal mRNA. They included a C-terminal translational GFP fusion and a downstream transcriptional mCherry fusion in the reporter in order to also bin variants by their relative protein and mRNA levels by flow cytometry. They hypothesized that RNA structure, in part by mediating uORF translation, influences COMT gene expression. The authors conclude by identifying previously uncharacterized COMT variants that, in this reporter system, affect RNA abundance and ribosome load.

      We generally found the results of this paper convincing and clear. We do not have major comments, but have many minor comments that we hope the authors can address. These comments mostly deal with clarification on analysis metrics and giving recommendations on data presentation.

      Minor comments:

      In Figure 2C, the vertical axis reads "Median between-group difference". How was this metric calculated and normalized? We also agree that nonsense mutations having consistently detrimental effects on RNA abundance is reassuring, but recommend more explanation as to why the difference in the effects of silent and missense mutations between regions may be biologically relevant.

      In Figure 3, we believe that the authors are claiming that lower RNA abundance causes lower protein abundance in some variants. However, this data only reports on protein abundance relative to transcript abundance, not absolute protein abundance. We think the claim should be revised to (1) clarify that the authors are measuring protein per mRNA, and (2) express that lower mRNA amounts are more likely to co-occur with lower protein amounts, but that this data does not support any causative model.

      On page 9, the authors claim that their data supports a model that rs4633 increases RNA abundance, leading to higher COMT expression. Can the authors rule out a model whereby rs4633 facilitates translation initiation, as suggested by Tsao et al. 2011, leading to both an increase in mRNA and protein abundance?

      The paper references "effect size" at multiple points (e.g. "polysome effect size") but we could not find this term explicitly defined (for example: for the polysome effect size, were RNA counts for each polysome fraction divided by the relative abundance of that RNA in total RNA?)

      Could you elaborate on how you define "protein abundance" and "effect size" in Figure 5G? How is enrichment in P3 or P1 calculated?

      Were 3396 variants considered for all readouts in this paper? How many of these variants were present in each ROI? It may be worth clarifying sample sizes.

      How did Twist generate these mutagenized sequences? We assumed that they used error-prone PCR due to the mention of multiple nucleotide polymorphisms, but couldn't find an explicit answer.

      In the methods, it may be worth elaborating on the composition of the HsCD00617865 plasmid. For example: this COMT reporter is under the control of a constitutively-expressed T7 promoter, correct?

      In Supplementary Figures 4 and 5, it would be helpful to explicitly say that you are reporting Pearson correlations between biological replicates.

      "After summarizing biological replicates (N=4) for each readout...": how did the authors summarize biological replicates? Were counts averaged?

      The authors used pairwise correlations between flow cytometry fractions, polysome fractions, and total RNA/gDNA as indications of data quality. Do the authors expect these counts to be strongly correlated? We would not necessarily expect to see a strong correlation between ribosome load and RNA/gDNA.

      The authors may need to check that their standard deviations on fold changes are properly reported. We would expect standard deviation bounds to be symmetric for log fold changes, but not on unlogged fold changes - for example see page 8, for the sentence "our point estimate for nonsense variant effects on COMT RNA abundance was approximately a two-fold decrease relative to the gDNA frequency (fold change of 0.43 +/- 0.13; mean +/- standard deviation; Methods)."

      On page 10, the authors say that their data suggests that hydrophobicity in the early coding region of COMT may be important for COMT folding. If this is the case, would we expect to see this effect in flow cytometry data (which is affected by protein degradation) and not polysome profiling (which is unaffected by post-translational protein degradation)?

      We believe that we would have some trouble replicating the analysis from this paper from the raw data, given that the bulk of the analysis on GitHub is presented as a single R Markdown file, with references to local files to which we do not have access. We recommend that the authors add additional documentation to their repository to facilitate re-analysis.

      In Figure 1B, indicating that more signal indicates less structure (in the legend or the figure itself) may assist readers who are unfamiliar with DMS-seq.

      Figure 1C does a great job presenting evidence for the translation of uORFs, but does not seem to flow with the overall argument of the paper, so may fit better in the supplement.

      We believe there is a typo in the Figure 1 legend that should read "K562" instead of "H562".

      You also gated to separate into P1-P4, correct? Can you also show the bounds of that gating strategy in Figure 3A?

      We find Figure 3F very compelling. Do you have any theories as to why mutating I59-H66 to nonpolar, uncharged residues leads to increased COMT expression?

      There appears to be a non-negligible proportion of di- and tri- nucleotide polymorphisms in Supplementary Figure 4. Were these excluded in downstream analyses?

      A minor typo in the discussion reads "fluoresce".

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This work investigated the regulatory effects of thousands of coding variants in the COMT gene, focusing on two regions with clinical significance, by using high-throughput reporter assays. The results will be useful for clinical scientists interested in understanding the impacts of COMT mutations and provide a useful framework for other systems/computational biologists to understand the impacts of coding mutations across different levels of regulatory function. Functional mutations in protein-coding regions are classically understood to act by interfering with protein function. There are fewer large-scale efforts to understand how coding mutations affect expression, potentially by changing RNA structure or codon optimality; this work contributes toward that frontier.

      Place the work in the context of the existing literature (provide references, where appropriate).

      This is (as far as I am aware) the first paper to integrate high-throughput massively parallel reporter assays of RNA degradation, ribosomal load, and flow cytometry. Previous papers have tended to measure expression regulation along only one dimension (e.g., Greisemer et al. 2023 on RNA degradation, Sample et al. 2019 on ribosomal load, and de Boer et al. 2020 on protein expression).

      State what audience might be interested in and influenced by the reported findings.

      Clinicians/researchers interested in COMT, computational biologists, geneticists and potentially structural biologists interested in understanding the consequences of amino acid mutations on RNA/protein expression

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Genomics, Massively parallel reporter assays, High-throughput regulatory screens.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Through Review Commons, we received some highly favorable and constructive feedback from reviewers who are clearly knowledgeable about phylogenomics and/or the field of bacterial anti-phage immunity. We have responded to all suggestions made by the reviewers, which we feel have substantially improved and clarified the manuscript. We thank all three reviewers for their thoughtfulness and time.

      Reviewer #1

      Evidence, reproducibility and clarity

      Culbertson and Levin present an elegant computational analysis of the evolutionary history of several families of immune proteins conserved in bacteria and metazoan cells. The authors' work is impressive, revealing interesting insight into previously known connections and identifying exciting new connections that further link bacterial anti-phage defense and animal innate immunity. The results are overall well-presented and will have an important impact on multiple related fields. I have a few comments for the authors to help explain some of the new connections observed in their findings and clarify the results for a general audience.

      We thank the reviewer for their kind appraisal of our manuscript as well as their helpful comments. We found their comments to be very useful in strengthening our work and increasing the clarity of the writing.

      Comments: 1) The authors adeptly navigate difficult and changing nomenclature around cGAS-STING signaling, but there may be room for clarifying terminology. Although historically the term "CD-NTase" has been used to describe both bacterial and animal enzymes (including in this reviewer's own older work), the field has now settled on consistent use of the name "CD-NTase" to describe bacterial cGAS/DncV-like enzymes and the use of the names "cGAS" and "cGLR" to describe animal cGAS-like receptor proteins. Nearly all papers describing bacterial signaling use the term CD-NTase, and since 2021 most papers describing divergent cGAS-like enzymes in animal signaling use the term "cGLR" (for recent examples see primary papers Holleufer et al 2021 PMID 34261128; Slavik et al 2021 PMID 34261127; Li et al 2023 PMID 37379839; Cai et al 2023 PMID 37659413; and review articles Cai et al 2022 PMID 35149240; Slavik et al 2023 PMID 37380187; Fan et al 2021 PMID 34697297; West et al 2021 PMID 34373639; Unterholzner Cell 2023 PMID 37478819). Kingdom-specific uses of CD-NTase and cGLR may help add clarity to the manuscript, especially as each group of enzymes is quite divergent and many protein members synthesize signaling molecules that are distinct from cyclic GMP-AMP (i.e. not cGAS).

      Related to this point, the term "SMODS" is useful for describing the protein family domain originally identified in the elegant work of Burroughs and Aravind (Burroughs et al 2015 PMID 26590262), but this term is rarely used in papers focused on the biology of these systems. "eSMODS" is a good name, but the authors may want to consider a different description to better fit with current terminology.

      We appreciate the reviewer’s suggestion and have updated the text to be clearer (e.g. using cGLR as a more specific term whenever possible). However, as OAS is distinctly not a cGLR, strict kingdom-specific use of the terms CD-NTase and cGLR is not possible. We have renamed the Mab21 superfamily the cGLR superfamily, as the two terms appear to be synonymous in recent literature. At this time we are choosing to keep the eSMODS terminology, as it remains to be shown that these eukaryotic proteins have a CD-NTase-like biochemical function.

      An example of how we have tried to navigate this naming issue is:

      “The cGLR superfamily passed all four of these HGT thresholds, as did another eukaryotic clade of CD-NTases that were all previously undescribed. We name this clade the eukaryotic SMODS (eSMODS) superfamily, because the top scoring domain from hmmscan for each sequence in this superfamily was the SMODS domain (PF18144), which is typically found only in bacterial CD-NTases (Supplementary Data).”

      2) The authors state that proteins were identified using an iterative HMM-based search until they "began finding proteins outside of the family of interest" (Line 86). Could the authors please explain in more detail what this means? A key part of the analysis pipeline is knowing when to stop, especially as some proteins like CD-NTases and cGLRs share homology with other major enzyme groups like pol-beta NTases, while other proteins like STING and viperin are more distinct.

      We have updated the text to better explain how we determined whether a given protein sequence should be excluded:

      “After using this approach to create pan-eukaryotic HMMs for each protein family, we then added in bacterial homologs to generate universal HMMs (Fig. 1A and Supp. Fig. 1), continuing our iterative searches until we either failed to find any new protein sequences or began finding proteins outside of the family of interest (Supp. Fig. 1). To define the boundaries that separated our proteins of interest from neighboring gene families, we focused on including homologs that shared protein domains that defined that family (see Materials and Methods for domain designations) and were closer to in-group sequences than the outgroup sequences on a phylogenetic tree (outgroup sequences are noted in the Materials and Methods).”
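      The two stopping rules quoted above (no new sequences, or hits from outside the family) can be illustrated as a loop. This is a minimal sketch under stated assumptions, not the authors' pipeline: `search` is a hypothetical stand-in for one round of building an HMM and running hmmsearch, and `is_in_family` is a hypothetical stand-in for the domain- and outgroup-based check described in the text.

```python
from typing import Callable, Iterable, Set, Tuple


def iterate_until_saturation(
    seeds: Set[str],
    search: Callable[[Set[str]], Iterable[str]],
    is_in_family: Callable[[str], bool],
    max_rounds: int = 10,
) -> Tuple[Set[str], int, str]:
    """Illustrative iterative profile search (hypothetical helper).

    `search` stands in for "build an HMM from the current members and
    search the database", returning hit IDs. `is_in_family` stands in
    for the family-membership check (shared defining domains, closer to
    in-group than outgroup sequences).
    """
    members = set(seeds)
    for round_no in range(1, max_rounds + 1):
        new_hits = set(search(members)) - members
        if not new_hits:
            # Rule 1: saturation, no new sequences were recovered.
            return members, round_no, "saturated"
        outside = {h for h in new_hits if not is_in_family(h)}
        # Keep only the in-family hits from this round.
        members |= new_hits - outside
        if outside:
            # Rule 2: the search has started pulling in other families.
            return members, round_no, "outgroup reached"
    return members, max_rounds, "max rounds reached"
```

      Keeping the in-family hits from the final round is a design assumption of this sketch; a stricter variant could discard that whole round.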

      We also added a section to the Methods specifically defining our outgroups:

      “As outgroup sequences, we used Poly(A) RNA polymerase (PAP) sequences for the CD-NTases, and molybdenum cofactor biosynthetic enzyme (MoaA) for viperin. We did not have a suitable outgroup for STING domains, nor did any diverged outgroups come up in our searches.”

      3) The authors comment on several controls to guard against potential contaminating bacterial sequences present in metazoan genome sequencing datasets (Lines 174-182). It may be helpful to include this very important part of the analysis as part of the stepwise schematic in Figure 1a. Additionally, have the authors used other eukaryotic features like the presence of introns or kingdom specific translation elements (e.g. Shine-Dalgarno- vs. Kozak-like sequences) as part of the analysis?

      We agree that it will be very interesting to look for these eukaryotic gene features, both to rule out contamination and to discern how eukaryotes have acquired and domesticated bacteria-like immune proteins. However, one limitation when working with the data in EukProt is that many species are represented by de novo transcriptome datasets, and therefore information about the local gene environment, introns, or promoters is unavailable.

      4) A particularly surprising result of the analysis is a proposed connection between oligoadenylate synthase-like (OAS-like) enzymes and bacterial Clade C CD-NTases. A concern with these results is that previous structural analysis has demonstrated that bacterial CD-NTase enzymes and animal cGLRs are more closely related to each other than they are to OAS (Slavik et al 2021 PMID 34261127). Can the authors provide further support for a connection between OAS and Clade C CD-NTases? The C-terminal alpha-helix bundle of OAS is known to be distinct (Lohöfener et al 2015 PMID 25892109) and perhaps AlphaFold2 modeling of bacterial Clade C CD-NTases and additional OAS sequences may provide further bioinformatic evidence to support the authors' conclusions.

      We were also surprised by this finding, as it seems to be in opposition to structural comparisons in studies such as Whiteley et al. 2019 (PMID 30787435). As the reviewer suggested, we used AlphaFold to predict the structures of two CD-NTases, those of Bacteroides uniformis (Clade C016) and Escherichia coli (Clade C018), as well as a previously uncharacterized OAS-like protein (Tripos fusus P058904), and compared those structural predictions to the structures of cGAS (PDB: 6CTA), OAS1 (PDB: 4RWO), and DncV (PDB: 4TY0). We used the DALI server to make these all-vs-all comparisons.

      We have not included these analyses in the manuscript, as the results were largely inconclusive. The average pairwise z-score between any two of these structures was around 20, with a narrow range of scores between 16 (e.g. OAS vs. DncV) and 22 (e.g. DncV vs. the Clade C CD-NTases). For reference, the z-score of a given protein compared to itself was ~50, and a z-score of 20 is a general DALI benchmark used to determine whether structures are homologous (z-scores between 8 and 20 are in a gray area, and scores above 20 are generally considered homologous).
      

      In our view, these pairwise structural comparisons suffer from essentially the same problem that is evident in phylogenetic trees containing only animal and bacterial homologs. Namely, all structures/sequences under consideration are extremely different from each other, on very long branches that are difficult to place with confidence when few homologs are being considered. The benefit of our approach is that we have the ideal species diversity to break up the long branches (particularly with respect to the OAS superfamily), allowing us to place those sequences confidently on the phylogeny.

      That said, while we have strong support for the topology of OAS within the CD-NTase tree, the interpretation of the relationships relies partly on the inferred root of the tree. In our analyses, we opted not to include a distant outgroup such as pol-beta for rooting purposes, as these sequences aligned poorly with the CD-NTases, resulting in a substantial decrease in alignment and tree quality. Instead, in Fig. 2 we present a tree that is arbitrarily rooted within the bacterial CD-NTases, as this root allows for clade C to be phylogenetically coherent. Our data are also consistent with an alternative rooting, placing OAS as an outgroup. If so, this would yield a tree that implies that OAS-like sequences could have given rise to all other CD-NTases and that, within the non-OAS sequences, all bacterial CD-NTases emerged from within Clade C. We thought it slightly more likely that the root of CD-NTases was solidly within bacteria, hence the display we chose. However, we were not intending to rule out an OAS-outgroup model here. As this response to reviewers will be publicly available alongside the final manuscript, we hope this clarifies our claims about the placement of OAS.

      5) One of the most exciting results in the paper is identification of a family of putative CD-NTase enzymes conserved in metazoans. Although full description may be beyond the scope of this paper, if possible, some more analysis would be interesting here: a. Are these CD-NTase enzymes in a conserved gene neighborhood within the metazoan genomes (i.e. located next to a potential cyclic nucleotide receptor?) b. Do these metazoan genomes encode other known receptors for cyclic nucleotide signaling (PFAM searches for CARF or SAVED domains for instance). c. Similar to points 3 and 4, is it possible to add further evidence for support of these proteins as true metazoan sequences that have predicted structural homology to bacterial CD-NTase enzymes?

      We agree that point (a) is an exciting avenue to pursue. However, as mentioned above, the EukProt dataset often does not provide the information needed for the proposed analyses. Therefore, we feel that answering questions about the genomic region of these proteins is beyond the scope of the current manuscript. In particular, all 6 of the eSMODS species are represented only by transcriptomes, making these analyses impossible.

      For point (b), we searched EukProt with HMMs for SAVED domains (PF18145), finding 24 total SAVED-containing proteins in EukProt. (We did not find a CARF HMM in Pfam, Tigrfam or other databases, and so could not easily carry out these searches.) Five of the 24 SAVED-containing sequences came from species encoding an eSMODS gene. This represented 3 species out of the total 20 species where we detected a SAVED domain. While this is a potentially intriguing overlap, we cannot make a strong claim about whether these SAVED sequences derive from eukaryotes vs. bacterial contamination without performing, for SAVED domains, the extensive searching and phylogenetic tree construction that we have performed for our three families of interest. We expect this will be an interesting line of inquiry for a future study.

      For point c, we agree that additional evidence to support the finding that the eSMODS are eukaryotic rather than bacterial sequences would be helpful. To us, the strongest pieces of evidence would be: 1) presence of eukaryotic gene architecture, 2) adjacency to clearly eukaryotic genes in the contig, and/or 3) fluorescence in situ hybridization experiments in these species to localize where the genes are encoded. Unfortunately, the transcriptome data available does not provide this level of information. We hope that other groups will follow up on these genes and species to decide the matter more definitively. In the meantime, we feel that our filters for HGT vs. contamination have done as much as possible with the existing dataset. We have modified the text in this region to leave open potential scenarios that could be fooling us, such as the presence of unusual, long-term, eukaryote-associated symbionts in the taxa where we detect eSMODS:

      “For species represented only by transcriptomes, these criteria may still have difficulty distinguishing eukaryote-bacteria HGT from certain specific scenarios such as the long-term presence of dedicated, eukaryote-associated, bacterial symbionts. However, because these criteria allow us to focus on relatively old HGT events, they give us higher confidence these events are likely to be real.”

      6) The authors state that obvious CD-NTase/cGLR enzymes are not present in organisms that encode the group of divergent eukaryotic "blSTINGs". Have the authors analyzed the protein-coding genes encoded immediately upstream and downstream of the blSTING proteins with AlphaFold2 and FoldSeek? It would be very exciting if putative cyclic nucleotide generating enzymes are predicted to be encoded within the nearby gene neighborhood.

      Similar to the eSMODS, the majority of the species with blSTINGs were represented by transcriptomes (22/26). We do agree that this type of analysis would be very interesting. However, we feel that this is beyond the scope of this manuscript.

      7) Line 144 appears to reference the incorrect supplementary figure. SI Figure 4 may be the correct reference?

      We agree and have made this change. We thank the reviewer for catching this error.

      I hope the authors will find my comments useful, thank you for the opportunity to read this exciting manuscript.

      Significance

      Culbertson and Levin present an elegant computational analysis of the evolutionary history of several families of immune proteins conserved in bacteria and metazoan cells. The authors' work is impressive, revealing interesting insight into previously known connections and identifying exciting new connections that further link bacterial anti-phage defense and animal innate immunity. The results are overall well-presented and will have an important impact on multiple related fields. I have a few comments for the authors to help explain some of the new connections observed in their findings and clarify the results for a general audience.

      Reviewer #2

      Evidence, reproducibility and clarity

      Describe your expertise? Molecular Evolution, Mechanisms of Protein evolution, Phylogenomics, Adaptation.

      Summary: This manuscript broadly aims to improve our understanding of the evolutionary relationships between eukaryote and bacterial protein families where members of those families have immune roles. The study focuses on three such families and samples deeply across the eukaryotic tree. The approaches taken include a nice application of the EukProt database and the use of homology detection approaches that are sensitive to the issues of assigning homology through deep time. The main findings show the heterogeneity in means by which these families have arisen: some of the families originated at least as far back as the LCA of eukaryotes, whereas the widespread yet patchy distribution of other families is the result of repeated independent HGT events and/or convergent domain shuffling.

      We thank the reviewer for this excellent review and their helpful comments and suggestions. We firmly believe that these comments will strengthen and clarify our work.

      Major Comments: 1. Overall the level of detail provided throughout the manuscript is lacking, perhaps the authors were constrained by a word limit for initial submission, if so then this limit needs to be extended to include the detail necessary. In addition, there are some structural issues throughout, e.g. some of the very brief intro (see later comment) reads a little more like methods (paragraph 2) and abstract (paragraph 3). The results section is lacking detail of the supporting evidence from the clever analyses that were clearly performed and the statistics underpinning conclusions are not included.

      Good suggestion; we have updated the paper to include more details and statistics on the analyses that were performed. We have also expanded on some of the most interesting findings about these bacterial innate immune proteins in the introduction (see Comment 2 below for our changes), and moved the methods-like paragraph mentioned (paragraph 2) later in the paper. For paragraph 3, we have slimmed this down to include fewer details, but have kept the final paragraph of the Introduction as a brief synopsis to prime the reader for the rest of the paper.

      2. The intro and discussion both include statements about some recent discoveries that bacteria and mammals share mechanisms of innate immunity, but there is no further detail on what would appear to be important work leading to this study. This context needs to be provided in more detail; therefore I would encourage the authors to expand the intro to include specific detail on these significant prior studies. In addition, more background information on the gene families investigated here would be useful, e.g. how the encoded proteins influence immunity should be a feature of the intro. A clear and concise rationale for why these 3 particular gene families (out of all the possible innate immune genes known) were selected for analysis should also be provided.

      We have added in additional background about some of the most exciting discoveries made in the past few years. We also included specific rationale as to why we chose to look at cGAS, STING, and Viperin.

      Specifically, we have added the following to the introduction:

      “For example, bacterial cGAS-DncV-like nucleotidyltransferases (CD-NTases), which generate cyclic nucleotide messengers (similar to cGAS), are massively diverse, with over 6,000 CD-NTase proteins discovered to date. Beyond the cyclic GMP-AMP signals produced by animal cGAS proteins, bacterial CD-NTases are capable of producing a wide array of nucleotide signals including cyclic dinucleotides, cyclic trinucleotides, and linear oligonucleotides [11,14]. Many of these bacterial CD-NTase products are critical for bacterial defense against viral infection[8]. Interestingly, these discoveries with the CD-NTases mirror what has been discovered with bacterial viperins. In mammals, viperin proteins restrict viral replication by generating 3’-deoxy-3’,4’-didehydro- (ddh) nucleotides[4,15–17], which block RNA synthesis and thereby inhibit viral replication[15,18]. Mammalian viperin generates ddhCTP molecules, while bacterial viperins can generate ddhCTP, ddhUTP, and ddhGTP. In some cases, a single bacterial protein is capable of synthesizing two or three of these ddh derivatives[4]. These discoveries have been surprising and exciting, as they imply that some cellular defenses have deep commonalities spanning across the entire Tree of Life, with additional new mechanisms of immunity waiting to be discovered within diverse microbial lineages. But despite significant homology, these bacterial and animal immune proteins are often distinct in their molecular functions and operate within dramatically different signaling pathways (reviewed here[5]). How, then, have animals and other eukaryotes acquired these immune proteins?”

      Regarding why we chose to investigate CD-NTases, STING, and Viperin specifically, we have added the following to the third paragraph of the introduction:

      “We chose to focus on cGAS, STING, and Viperin for a number of reasons. First, in metazoans cGAS and STING are part of the same signaling pathway, whereas bacterial CD-NTases often act independently of bacterial STINGs[21], raising interesting questions about how eukaryotic immune proteins have gained their signaling partners. Also, given the vast breadth of bacterial CD-NTase diversity, we were curious as to whether any eukaryotes had acquired CD-NTases distinct from cGAS. For similar reasons, we investigated Viperin, which also has wide diversity in bacteria but a much narrower described function in eukaryotes.”

      3. Context: Genome quality is always a concern, and confirming the absence of an element/protein in a genome is challenging given the variation in quality of available genomes. Low BUSCO scores make gene loss difficult to evaluate (but we are not provided with said scores). Query: the results section states that the BUSCO completeness scores (which need to be provided) etc. were insufficient to explain the pattern of gene loss. I would like to know how the authors reached this conclusion: what statistical analyses (ANOVA? other?) were performed to support this statement? Please include the associated P values etc. Similarly, throughout the paper, including in the discussion section, this point is brushed over. If a statistical test shows that some of the disparity in gene presence is explained by BUSCO score, most of your findings are still valid; it would just be difficult to draw conclusions about gene loss.

      We have rewritten this section to be more clear about what we feel we can and cannot say about gene loss and BUSCO scores. This section now reads:

      “However, outside of Metazoa, these homologs were sparsely distributed, such that for most species in our dataset (711/993), we did not recover proteins from any of the three immune families examined (white space, lack of colored bars, Fig. 1B). While some of these absences may be due to technical errors or dataset incompleteness (Supp. Fig. 2), we interpret this pattern as a reflection of ongoing, repeated gene losses across eukaryotes, as has been found for other innate immune proteins[27–29] and other types of gene families surveyed across eukaryotes[28,30–32]. Indeed, many of the species that lacked any of the immune homologs were represented by high-quality datasets (Ex: Metazoa, Chloroplastida, and Fungi). Thus, although it is always possible that our approach has missed some homologs, we believe the resulting data represent a fair assessment of the diversity across eukaryotes, at least for those species currently included within EukProt.”

      In addition, we direct readers to EukProt v3, where the BUSCO scores are publicly available.

      “BUSCO scores can also be viewed on EukProt v3 (https://evocellbio.com/SAGdb/images/EukProtv3.busco.output.txt).”

      4. In terms of the homolog search strategy (line 394), can you please state what an "outgroup gene family" means in this context? It is unclear but very important to the downstream interpretation of results.

      We have updated the materials and methods to specifically name our outgroups:

      “As outgroup sequences, we used Poly(A) RNA polymerase (PAP) sequences for the CD-NTases, and molybdenum cofactor biosynthetic enzyme (MoaA) for viperin. We did not have a suitable outgroup for STING domains, nor did any diverged outgroups come up in our searches.”

      5. For reproducibility, the materials and methods section needs to provide sufficient detail to reproduce these results. E.g., the section describing phase 1 of the eukaryotic searches repeats what is in the results section for the crystal structure work, but doesn't give me any information on what method was used to "align the crystal structures", what scoring scheme was used, and how the scoring scheme identifies "the core". What specific parameters are used throughout? Why is MAFFT the method of choice for some of the analyses, whereas in other cases both MAFFT and MUSCLE are employed? What are the specific settings used for the MAFFT alignments throughout: is it default (which must be stated if that is the case), or is it MAFFT L-INS-i with default settings, etc.?

      We have updated the text to include the specific settings used each time a particular software package was deployed. We also have included information for STING as to how we aligned 3 published crystal structures to determine the boundaries of homology.

      Here is how we now discuss identifying the “core” STING domain:

      “For STING, where the Pfam profile includes regions of the protein outside of the STING domain, we generated a new HMM for the initial search. First, we aligned crystal structures of HsSTING (6NT5), Flavobacteriaceae sp. STING (6WT4) and Crassostrea gigas STING (6WT7) with the RCSB PDB “Pairwise Structure Alignment” tool with a jFATCAT (rigid) option[73,74]. We defined a core “STING” domain as the ungapped region of 6NT5 that aligned with 6WT7 and 6WT4 (residues G152-V329 of 6NT5). Then we aligned 15 eukaryotic sequences from PF15009 (all 15 of the “Reviewed” sequences on InterPro) with MAFFT (v7.4.71)[75] with default parameters and manually trimmed the sequences down to the boundaries defined by our crystal alignment (residues 145-353 of 6NT5). We then trimmed the alignment with trimAl (v1.2)[76] with the option -gt 0.2. The trimmed MSA was then used to generate an HMM profile with hmmbuild from the hmmer (v3.2.1) package (hmmer.org) using default settings.”
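      To make the `-gt 0.2` trimming step concrete, the following is a minimal Python sketch of a gap-threshold column filter. It assumes trimAl's documented meaning of `-gt` (the minimum fraction of sequences that must carry a residue rather than a gap for a column to be kept) and assumes gaps are written as '-' or '.'; it is an approximation for illustration, not a reimplementation of trimAl.

```python
from typing import List

# Assumption: gap characters used in the aligned FASTA.
GAP_CHARS = set("-.")


def gap_threshold_trim(alignment: List[str], gt: float = 0.2) -> List[str]:
    """Keep only columns in which at least a fraction `gt` of the sequences
    have a residue, i.e. at most a fraction (1 - gt) are gaps."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(row[col] not in GAP_CHARS for row in alignment) / n >= gt
    ]
    # Rebuild each row from the retained columns only.
    return ["".join(row[col] for col in keep) for row in alignment]
```

      With `gt = 0.2`, only columns that are gaps in more than 80% of the sequences are removed, so the filter is permissive and mainly strips near-empty insertion columns before the profile is built.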

      We employed three alignment programs at specific points throughout our analyses. MAFFT was used as our default aligner for most of the analysis. Hmmalign (part of the hmmer package) was used to make the alignments prior to hmmbuild. The overall goal of this work was to reconstruct the evolutionary history of these proteins via a phylogenetic tree. To ensure that this tree topology was as robust as possible, we also employed MUSCLE, a more computationally intensive but more accurate aligner. We have updated the text in the methods section to be clearer as to why we used each program.

      We have updated the methods section to read:

      “MUSCLE was deployed in parallel with MAFFT to generate these final alignments to ensure that the final tree topology would be as robust as possible. MUSCLE is a slightly more accurate but more computationally intensive alignment software[79].”

      1. The justification for the number of HMM searches needs to be included. The choice of starting points for the HMMs was cryptic - please provide details. It is likely that you ran the search until no more sequences were found or until sequences were added from a different gene family, and that these happened to be between 3 and 5 searches, but it reads like you wanted to run it 3 or 5 times and that corresponds to the above condition. Something like this would be clearer: "The profile was [...] until no more sequences were found or until sequences from other gene families were found which was between 3 and 5 times in all cases" - the same is true of figure 1.

      We agree that this could have been worded better. We have updated the text to make it clear that we searched until saturation, which happened to occur after 3-5 searches, and not that we arbitrarily chose to run 3-5 searches.

      We have updated the text, which now reads:

      “After using this approach to create pan-eukaryotic HMMs for each protein family, we then added in bacterial homologs to generate universal HMMs (Fig. 1A and Supp. Fig. 1), continuing our iterative searches until we either failed to find any new protein sequences or began finding proteins outside of the family of interest (Supp. Fig. 1). To define the boundaries that separated our proteins of interest from neighboring gene families, we focused on including homologs that shared protein domains that defined that family (see Materials and Methods for domain designations) and were closer to in-group sequences than the outgroup sequences on a phylogenetic tree (outgroup sequences are noted in the Materials and Methods).”

      We also updated the figure legend to Fig. 1. It now reads:

      “Each set of searches was repeated until few or no additional eukaryotic sequences were recovered, which took between 3 and 5 rounds in all cases.”

      7. Why do you limit hits to 10 per species? Might this lead to misleading findings about gene family diversity? Information on and justification for this approach are required (lines 411-412).

      We limited the hits to 10 per species to limit the influence of any one species on our alignments and subsequent phylogenetic trees. This 10-per-species cap was never reached with any search for STING or Viperin, but was used to throttle the number of Metazoan hits when searching for CD-NTases. Because of this, we have probably missed some amount of the diversity of Metazoan Mab21-like/OAS-like sequences, although this was not a focus of our manuscript. We have updated the text to be clearer about why we have included this limit and when the limit was invoked.

      We have updated the text, which now reads:

      “HMM profiles were used to search EukProt via hmmsearch (also from hmmer v3.2.1) with a statistical cutoff value of 1e-3 and -hit parameter set to 10 (i.e. the contribution of a single species to the output list is capped at 10 sequences). It was necessary to cap the output list, as EukProt v3 includes de novo transcriptome assemblies with multiple splice isoforms of the same gene, and we wanted to limit the overall influence a single species had on the overall tree. We never reached the 10-sequence-per-species cap in any search for STING or viperin homologs; only for the CD-NTases within Metazoa did this search cap limit hits.”
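      The per-species cap described above can be illustrated as a post-filter over search hits. This is a hypothetical sketch, not the authors' code: it assumes each hit has already been parsed (for example from hmmsearch tabular output) into a species label, a sequence ID and an E-value, and that the best-scoring hits are the ones retained. Whether the `-hit` option mentioned in the quoted text selects hits in exactly this way is not stated, so treat the selection rule as an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    species: str  # hypothetical: species label parsed from the EukProt sequence ID
    seq_id: str
    evalue: float


def cap_hits_per_species(
    hits: Iterable[Hit], evalue_cutoff: float = 1e-3, per_species: int = 10
) -> Dict[str, List[Hit]]:
    """Drop hits above the E-value cutoff, then keep at most `per_species`
    hits per species, preferring the lowest E-values."""
    by_species: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        if hit.evalue <= evalue_cutoff:
            by_species[hit.species].append(hit)
    # Sort each species' hits best-first and truncate to the cap.
    return {
        sp: sorted(sp_hits, key=lambda h: h.evalue)[:per_species]
        for sp, sp_hits in by_species.items()
    }
```

      Capping after sorting means that, for a transcriptome with many near-identical isoforms, only the strongest matches enter the alignment, which is the stated goal of limiting any single species' influence on the tree.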

      8. The information in Supplementary Figure 3 is quite difficult to assess visually, but I think that is expected from that figure. However, this is an important underpinning element of the work and should be assessed quantitatively, using a tree-comparison metric with defined thresholds; there are many available, even a simple Robinson-Foulds test perhaps. Essentially, comparing the panels in Supplementary Figure 3 by eye is unreliable, and in this case impossible given that there are no labels. It would also be important to provide the full set of phylogenies generated, with the associated RF/other scores, as a supplementary file.

      We agree that this Supplementary Figure is difficult to assess by eye, however we feel that it is vital to show this data. Visually, we do feel like this figure conveys the idea that while individual branches may move around, the major clades/areas of interest are stable across the different alignments and tree builders. To increase robustness, we have included the weighted Robinson-Foulds test results into a new panel of this figure (Supplementary Fig. 3B).

      We have added a section to the methods on how this weighted Robinson-Foulds test was conducted:

      “Weighted Robinson-Foulds distances for Supp. Fig. 3B were calculated with Visual TreeCmp (settings: -RFWeighted -Prune trees -include summary -zero weights allowed)[83].”

      We added the weighted Robinson-Foulds data to Supplemental Fig. 3 and have updated the figure legend to reflect this new data. The new legend for Supp. Fig. 3B reads:

      “(B) The average weighted Robinson-Foulds distances for all pairwise comparisons between the four tree types (MAFFT or MUSCLE alignments, each built with IQ-TREE or RAxML-ng). Although the distances were higher for the CD-NTase tree (as expected for this highly diverse gene family), all of the key nodes defining the cGLR, OAS, and eSMODS superfamilies, as well as their nearest bacterial relatives, were well supported (>70 ultrafast bootstrap value).”
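
      For readers less familiar with the metric, the unweighted Robinson-Foulds distance counts the non-trivial bipartitions (splits) present in one unrooted tree but not the other. The weighted variant computed by Visual TreeCmp additionally takes branch lengths into account, so the sketch below, which compares topology only, is a simplified illustration rather than a reimplementation of that tool. Writing trees as nested tuples of leaf names is an assumption made for brevity, and all function names are hypothetical.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def leaves(tree: Tree) -> frozenset[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> set[frozenset[str]]:
    """Non-trivial bipartitions, each stored as the side that excludes a fixed reference leaf."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)  # normalizing on one leaf makes the result independent of rooting
    result: set[frozenset[str]] = set()

    def visit(node: Tree) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(visit(child) for child in node))
        side = cluster if ref not in cluster else all_leaves - cluster
        if 1 < len(side) < len(all_leaves) - 1:  # skip trivial splits and the full set
            result.add(side)
        return cluster

    visit(tree)
    return result


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same leaf set")
    return len(splits(t1) ^ splits(t2))  # splits found in exactly one of the trees
```

      Because splits are normalized against a reference leaf, two different rootings of the same unrooted topology have a distance of 0, which matches how the metric is applied to unrooted maximum likelihood trees.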

      1. Does domain shuffling mean that phylogenetic reconstruction is less valid? How was the alignment performed in these cases to account for this.

      Thank you for raising this point, which we have now clarified in the text. Our searches, alignments, and trees are all of single protein domains, as typically only conservation within domains is retained across the vast distances between bacteria and eukaryotes. As such, domain shuffling should have no impact on the validity of the phylogenetic reconstruction. We have updated our wording throughout the manuscript to make the scope of the alignments and searches clearer. One specific example is:

      “Using maximum likelihood phylogenetic reconstruction on the STING domain alone, we identified STING-like sequences from 26 diverse microeukaryotes whose STING domains clustered in between bacterial and metazoan sequences, breaking up the long branch.”

      Minor Comments:

      10. I am not sure about the use of the term "truly ancestral" or variants thereof, same issues with "significant homology" and "inherited since LECA and possibly longer" .. these are awkwardly phrased. E.g. I think perhaps "homologous across the whole length" might be clearer, and elsewhere "present in LECA and possibly earlier" may be more fitting.

      We have updated the text for these phrases throughout the manuscript and have replaced them with more specific language.
      
      1. Line 75 - "Detecting" rather than discovering?

      We appreciate the suggestion. However, because many of these gene families have never been described in the eukaryotic lineages considered here, we think ‘discovering’ is more appropriate. Indeed, the eSMODS lineage demonstrates that our search approach has the power to find not just new homologs but to discover totally new subfamilies of these eukaryotic proteins.

      1. 132-133 - more justification is needed for the choice of bacterial genes.

      We have clarified that our selection of bacterial CD-NTases included every known CD-NTase at the time of our analysis. The text now reads:

      “As representative bacterial CD-NTases, we used 6,132 bacterial sequences, representing a wide swath of CD-NTase diversity[43]. To our knowledge, this dataset included every known bacterial CD-NTase at the time of our analysis.”

      1. For the downsizing from 6000 to 500 what were the criteria and thresholds.

      We have updated the text to include the PDA software options for downsampling. The text now reads:

      “We downsampled the CD-NTase bacterial sequences from ~6000 to 500 using PDA software (options -k 500) on a FastTree (default settings) tree built from a MAFFT (default parameters) alignment, to keep computation times for alignment and tree construction manageable.”

      1. How are you rooting your trees e.g. figure 2? Information is provided for Viperin but not others.

      We have updated the text to ensure that the root of every tree is specifically stated.

      1. In the results section on CD-NTases I think it would be best to place the second paragraph detailing the role of cGAS earlier in this section, perhaps after the first sentence.

      We have moved the second paragraph, which introduces cGAS, OAS, and the other CD-NTases, to the beginning of the CD-NTase section. The first paragraph of the CD-NTase section of the results now reads:

      “We next studied the evolution of the innate immune proteins, beginning with cGAS and its broader family of CD-NTase enzymes. Following infections or cellular damage, cGAS binds cytosolic DNA and generates cyclic GMP-AMP (cGAMP)[32–35], which then activates downstream immune responses via STING [34,36–38]. Another eukaryotic CD-NTase, 2’5’-Oligoadenylate Synthetase 1 (OAS1), synthesizes 2',5'-oligoadenylates which bind and activate Ribonuclease L (RNase L)[39]. Activated RNase L is a potent endoribonuclease that degrades both host and viral RNA species, reducing viral replication (reviewed in [40,41]). Some bacterial CD-NTases such as DncV behave similarly to animal cGAS; they are activated by phage infection and produce cGAMP[8,42,43]. These CD-NTases are commonly found within cyclic oligonucleotide-based anti-phage signaling systems (CBASS) across many bacterial phyla and archaea[8,27,43].”

      1. Is FASTtree really necessary to include as it will underperform in all instances? Removing that method and comparing the remaining two (i.e. IQTREE and RAXML) - what level of disagreement do you find between the 2 alignment and 2 tree building methods? The cases that disagree should also be detailed.

      We agree that FastTree underperforms relative to IQ-TREE and RAxML and have removed those trees from the supplement. We initially included FastTree because it still appears to be widely used in phylogenetic analyses in recent papers on bacterial immune homologs, but we agree with the reviewer and have removed it. In addition, we have calculated the average weighted Robinson-Foulds distances and added them to Supplemental Figure 3. Our manuscript focuses on features of the phylogenetic trees that were consistent across all the replicate methods. However, given the large number of sequences and the high degree of divergence involved, there were many cases where individual branches shifted between methods, e.g. individual CD-NTases within bacterial clade G swapping positions with one another. None of the differences we observed between the trees affected our overall conclusions.

      1. Again a structural point - the start to paragraph "To understand the evolutionary history of CD-NTases we used the Pfam domain PF03281 as a starting point", I don't know at this point why or how you have done this. The sentence seems a little premature. I would therefore suggest that you start that paragraph with your motivation, "In order to..." and then finish that paragraph with your sentence in quotes above which actually summarizes the paragraph.

      We have updated the text to clarify this paragraph (in addition to making other structural changes in the CD-NTase section). The paragraph describing how we started the HMM searches for the CD-NTases now reads:

      “To begin our sequence searches for eukaryotic CD-NTases, we used the Pfam domain PF03281, representing the main catalytic domain of cGAS, as a starting point. As representative bacterial CD-NTases, we used 6,132 bacterial sequences, representing a wide swath of CD-NTase diversity[21]. Following our iterative HMM searches, we recovered 313 sequences from 109 eukaryotes, of which 34 were metazoans (Supplemental Data and Fig. 1B). Within the phylogenetic trees, most eukaryotic sequences clustered into one of two distinct superfamilies: the cGLR superfamily (defined by clade and containing a Mab21 Pfam domain: PF03281) or the OAS superfamily (OAS1-C: PF10421) (Fig. 2A). Bacterial CD-NTases typically had sequences matching the HMM for the Second Messenger Oligonucleotide or Dinucleotide Synthetase domain (SMODS: PF18144).”

      1. Line 148 - "within" change to "before"?

      We have updated the text with this suggestion.

      1. Unclear from text as is whether you found any STING homologs in arthropods (~line 157). Please update the text for clarity. Would also suggest that "agreeing" should be replaced with "aligning".

      We found several STING homologs in arthropods and have updated the text to note this specifically. We have also replaced “agreeing” with “aligning”, as suggested. The text now reads:

      “Almost half of these species (10/19) were arthropods, aligning with prior findings of STING sparseness among arthropods (Wu et al. 2014). We did find STING homologs in 8/19 arthropod species in EukProt v3, including the previously identified STINGs of Drosophila melanogaster, Apis mellifera and Tribolium castaneum (Wu et al. 2014; Margolis, Wilson, and Vance 2017).”

      1. Line 169 - If clade D is not a clade, maybe it should be called something different.

      Yes, unfortunate naming, isn’t it? Clade D is not a coherent clade in our results nor when it was first described, but we feel that for consistency with the rest of the field, it is best if we adhere to previously published nomenclature.

      1. Line 188-190 - In principle, max likelihood should be able to infer the right tree even with high divergence.

      Yes, we agree that maximum likelihood methods should be able to infer the correct tree. However, we are not sure what change the reviewer is suggesting here.

      1. Paragraph starting at 199 - eSMODS - always unknown function or mostly - could be important.

      To our knowledge, the two bacterial CD-NTases closest to the eSMODS group have unknown functions.

      1. For calling HGT you state that one of the criteria is that the euk and bac sequences branched near one another, what is "near" in this scenario?

      “Near” in this case refers to being adjacent on the phylogenetic tree. We have updated the text for clarity. The text now reads:

      “To minimize such false positive HGT calls, we took a conservative approach in our analyses, considering potential bacteria-eukaryote HGT events to be trustworthy only if: 1) eukaryotic and bacterial sequences branched adjacent to one another with strong support (bootstrap values >70); 2) the eukaryotic sequences formed a distinct subclade, represented by at least 2 species from the same eukaryotic supergroup; 3) the eukaryotic sequences were produced by at least 2 different studies; and 4) the position of the horizontally transferred sequences was robust across all alignment and phylogenetic reconstruction methods used (Supp. Fig. 3A).”
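
      The four criteria above amount to a conjunctive filter: a candidate is kept only if every criterion holds. The sketch below encodes them as a checklist to make that logic explicit. It is an illustration only; the `CandidateHGT` record and its field names are hypothetical, and in practice each field would be read off the phylogenetic trees by inspection rather than computed.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateHGT:
    sister_to_bacteria: bool   # eukaryotic subclade branches adjacent to bacterial sequences
    sister_bootstrap: float    # bootstrap support for that adjacency
    distinct_subclade: bool    # eukaryotic sequences form their own subclade
    species_by_supergroup: dict[str, set[str]] = field(default_factory=dict)
    source_studies: set[str] = field(default_factory=set)
    stable_across_methods: bool = False  # same placement for every alignment/tree-builder pair


def is_trusted_hgt(c: CandidateHGT, min_bootstrap: float = 70.0) -> bool:
    # 1) adjacent to bacterial sequences with strong support
    supported = c.sister_to_bacteria and c.sister_bootstrap > min_bootstrap
    # 2) a distinct subclade with at least 2 species from the same eukaryotic supergroup
    grouped = c.distinct_subclade and any(
        len(species) >= 2 for species in c.species_by_supergroup.values())
    # 3) sequences produced by at least 2 different studies
    independent = len(c.source_studies) >= 2
    # 4) placement robust across all alignment and reconstruction methods
    return supported and grouped and independent and c.stable_across_methods
```

      Requiring two species from the same supergroup, rather than two species anywhere, guards against a subclade assembled from single contaminant sequences in unrelated lineages.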

      1. In legends be specific about what type of support value, e.g. bootstrap or jack-knife.. I think it is always bootstrap but would be good to have that precision.

      Our phylogenetic trees use only bootstrap values for support, and we have updated the figure legends and methods to state this. We apologize for the lack of clarity.

      1. Throughout the text if stating e.g. "clustered robustly and with high support" please provide the appropriate values.

      We have updated the text to provide bootstrap values when invoking statements about support. An example of this is:

      “There are two clades of Chloroplastida (a group within Archaeplastida) sequences that branch robustly (>80 ultrafast bootstrap value) within the bacteria clade.”

      1. It is unclear from the text how the animal origin of the TIR domain is supported (~line 274). Please provide necessary details to support your statements in the results section.

      Our phylogenetic tree of TIR domains (Supp. Fig. 7) shows that the TIR domain of the C. gigas STING protein clusters with high support next to other metazoan TIR domains.

      We have updated the STING section to include these lines:

      “We also investigated the possibility that C. gigas acquired the TIR domain of its TIR-STING protein via HGT from bacteria; however, this analysis also suggested an animal origin for the TIR domain (Supp. Fig. 7), as the C. gigas TIR domain clustered with other metazoan TIR domains such as Homo sapiens TICAM1 and 2 (ultrafast bootstrap value of 75). Eukaryotic TIR-STINGs are also rare, further supporting the hypothesis that this protein resulted from recent convergence, where animals independently fused STING and TIR domains to make a protein resembling bacterial TIR-STINGs, consistent with previous reports[19].”

      1. Replace similar with -> similar "to"

      We have accepted the suggestion and replaced “with” with “to”.

      1. Line 266: It was previously shown .. or it is known but not "it was previously known"

      We have rephrased the sentence to be clearer: “Some eukaryotes like C. gigas…”.

      1. The last sentence in paragraph ~line 277: "Our work also identified a number of non-metazoan STINGS...." Please expand on this and provide some of the details on this finding in the text or point to the figure that supports the statement and provide a little more detail here.

      The intent of the words on line 277 was a summary of what we had previously discussed in the STING section. For clarity we updated the text, which now reads:

      “Interestingly, the non-metazoan blSTINGs (Fig. 3C) found in the Stramenopiles, Haptista, Rhizaria, Choanoflagellates and Amoebozoa have a TM-STING domain architecture similar to animal STINGs but a STING domain more similar to bacterial STINGs.”

      blSTINGs are discussed in more detail earlier in the STING section (specifically paragraph 3) where we say:

      “Using maximum likelihood phylogenetic reconstruction on the STING domain alone, we identified STING-like sequences from 26 diverse microeukaryotes whose STING domains clustered in between bacterial and metazoan sequences, breaking up the long branch. We name these sequences the bacteria-like STINGs (blSTINGs) because they were the only eukaryotic group of STINGs with a bacteria-like Prok_STING domain (PF20300) and because of the short branch length (0.86 vs. 1.8) separating them from bacterial STINGs on the tree (Fig. 3C). While a previous study reported STING domains in two eukaryotic species (one in Stramenopiles and one in Haptista) [19], we were able to expand this set to additional species and also recover blSTINGs from Amoebozoa, Rhizaria and choanoflagellates. This diversity allowed us to place the sequences on the tree with high confidence (bootstrap value >70), recovering a substantially different tree than previous work[19]. As for CD-NTases, the tree topology we recovered was robust across multiple different alignment and phylogenetic tree construction algorithms (Supp. Fig. 3A).”

      1. Line 294: it is unclear which are the orphan taxa -we are directed to figure 1 but there is no notation for orphan taxa here perhaps add something to the figure to make obvious which these are.

      We have updated the text to mention these orphan taxa specifically by name.

      The text now reads:

      “The 194 viperin-like proteins we recovered came from 158 species spanning the full range of eukaryotic diversity, including organisms from all of the major eukaryotic supergroups, as well as some orphan taxa whose taxonomy remains open to debate (Fig. 1, Ancyromonadida, Hemimastigophora, Malawimonadida).”

      1. Lines 340-341 - some redundant use of eukaryotic/eukaryotes

      We have updated the text to reduce redundancy.

      1. Lines 475-480 - some further detail needed - how were sequences trimmed to the TIR domain? - what were your starting sequences? etc.

      We have updated the text to detail how we acquired a set of proteins from InterPro and how we used hmmscan to determine the coordinates of the TIR domains in those proteins. We then isolated the TIR domains (using the coordinates defined by hmmscan) and aligned those sequences.

      The text now reads:

      “We used hmmscan to identify the coordinates of TIR domains in a list of 203 TIR domain-containing sequences from InterPro (all 203 proteins in the curated “Reviewed” set of IPR000157, Toll/interleukin-1 receptor homology (TIR) domain, as of 2023-04-04) and 104 bacterial TIR-STING proteins (the same TIR-STING proteins used in Fig. 3)[3]. Next, we trimmed the sequences down to the hmmscan-identified TIR coordinates and aligned the TIR domains with MUSCLE (-super5). We trimmed the alignments with TrimAL and built a phylogenetic tree with IQ-TREE (-s, -bb 1000, -m TEST, -nt AUTO).”
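
      The trimming step can be sketched as follows, as an illustration rather than the script used in the manuscript. It assumes hmmscan was run with `--domtblout` and that the envelope coordinates (columns 20 and 21 of the domain table) define the TIR boundaries; the text above does not say whether envelope or alignment coordinates were used. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def parse_domtblout(lines: Iterable[str]) -> Iterator[tuple[str, int, int]]:
    """Yield (sequence_id, env_from, env_to) for each domain row of hmmscan --domtblout output."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        seq_id = fields[3]  # in hmmscan tables the query is the protein, the target is the profile
        env_from, env_to = int(fields[19]), int(fields[20])
        yield seq_id, env_from, env_to


def trim_to_domains(sequences: dict[str, str],
                    hits: Iterable[tuple[str, int, int]]) -> dict[str, str]:
    """Cut each sequence down to its domain; HMMER coordinates are 1-based and inclusive."""
    trimmed = {}
    for seq_id, start, end in hits:
        # Encoding the coordinates in the key keeps multiple domains per protein distinct.
        trimmed[f"{seq_id}/{start}-{end}"] = sequences[seq_id][start - 1:end]
    return trimmed
```

      Only the first 22 whitespace-separated columns of the domain table are fixed, so the free-text description at the end of each row does not affect the indices used here.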

      1. Check that the colour schemes for branches etc are detailed in the legends of supplementary as well as main.

      We have updated the figure legends to state clearly that we use the same color scheme throughout the manuscript. This involved ensuring that the following statement (or an equivalent) was present in the legends of Figures 2, 3, 4, S2, S3, S4, S5, S6, and S7:

      “Eukaryotic sequences are colored according to eukaryotic group as in Fig. 1B.”

      1. The threshold set for gaps is very strict at 0.2. This seems quite strict given the sequences are potentially quite highly divergent. What length are the alignments that you are using after trimming - these details need to be included and considered.

      We have updated the text to specifically detail how long our alignments were after trimming and how that post-trimming length compares to the length of the alignment for each PFAM group.

      Specifically, the text now reads:

      “The lengths of these final alignments were 232, 175, and 346 amino acids for CD-NTases, STING, and viperin, respectively. Each alignment spans ≥75% of the length of its respective Pfam domain (PF03281 (Mab-21 protein nucleotidyltransferase domain) for CD-NTases, PF20300 (Prokaryotic STING domain) for STING, and PF04055 (Radical SAM family) for viperin).”

      1. How were sequences downsampled with PDA? Line 424.

      We have updated the text to include the PDA settings that were used to downsample sequences. The text now reads:

      “To ensure the combined HMM did not have an overrepresentation of either bacterial or eukaryotic sequences, we downsampled the bacterial sequences and eukaryotic sequences to obtain 50 phylogenetically diverse sequences of each, and then combined the two downsampled lists. To do this, eukaryotic and bacterial sequences were each separately aligned with MAFFT (default parameters), phylogenetic trees were built with FastTree (v2.1.10)[77], and the Phylogenetic Diversity Analyzer (pda/1.0.3)[78] software with options -k 50 or -k 500 and otherwise default parameters was run on the FastTree trees to downsample the sequences while maximizing the remaining sequence diversity.”
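
      PDA's downsampling retains the k leaves with maximal phylogenetic diversity (PD), i.e. the largest total branch length spanned by the kept sequences. The sketch below illustrates that criterion with a greedy selection on an unrooted tree: seed with the two most distant leaves, then repeatedly add the leaf that contributes the most new branch length. For a single unrooted tree, greedy selection of this kind is known to reach an optimal PD subset, but this is an illustration, not PDA's source code; the adjacency-dict tree representation and all function names are hypothetical.

```python
import itertools
from collections import defaultdict

Graph = dict[str, dict[str, float]]


def add_edge(g: Graph, a: str, b: str, length: float) -> None:
    g[a][b] = length
    g[b][a] = length


def path_to(g: Graph, start: str, targets: set[str]) -> tuple[float, list[str]]:
    """Length and node list of the path from `start` to the connected subtree `targets`."""
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if node in targets:
            return dist, path  # in a tree, the first target reached is the attachment point
        for nbr, length in g[node].items():
            if nbr != parent:
                stack.append((nbr, node, dist + length, path + [nbr]))
    raise ValueError("targets not reachable from start")


def greedy_pd_subset(g: Graph, leaves: list[str], k: int) -> tuple[list[str], float]:
    """Greedily pick k leaves that maximize phylogenetic diversity (total spanned branch length)."""
    if not 2 <= k <= len(leaves):
        raise ValueError("k must be between 2 and the number of leaves")
    # Seed with the two most distant leaves.
    a, b = max(itertools.combinations(leaves, 2),
               key=lambda pair: path_to(g, pair[0], {pair[1]})[0])
    total, path = path_to(g, a, {b})
    chosen, spanned = [a, b], set(path)
    while len(chosen) < k:
        # Add the leaf whose path to the current subtree contributes the most new branch length.
        best_leaf, best_gain, best_path = None, -1.0, []
        for leaf in leaves:
            if leaf in spanned:
                continue
            gain, leaf_path = path_to(g, leaf, spanned)
            if gain > best_gain:
                best_leaf, best_gain, best_path = leaf, gain, leaf_path
        chosen.append(best_leaf)
        spanned.update(best_path)
        total += best_gain
    return chosen, total
```

      For example, on a five-leaf tree built with `add_edge` calls for the edges A-u (1.0), B-u (2.0), u-w (0.5), C-w (3.0), w-v (1.0), D-v (0.2) and E-v (4.0), choosing k = 3 keeps C, E and B, which together span 10.5 units of branch length; the short-branched leaf D is dropped first, which is the intended behavior when removing near-redundant sequences.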

      1. Please provide adequate descriptions for the materials in the supplementary files for the manuscript, they currently lack description. They are useful and we fully support their inclusion with sufficient information.

      We have expanded the descriptions of the provided supplementary files.

      1. The starting sequences, hmm pipeline and scripts would be great to include, apologies if we have missed them.

      We have added the starting bacterial sequences to the supplementary data, as well as the final HMMs, and the one script that we used in our analysis. All other software (including the included script) is freely and publicly available.

      Significance

      This study provides us with examples of instances where a medley of different mechanisms have resulted in the emergence of innate immune proteins across eukaryotes. The study is entirely bioinformatic in nature and provides some nice cases for future study. The thorough search strategies are to be commended. The limitations of the work are that we don't know whether the functions have also been conserved across deep time and/or in the independent events described. Nevertheless, this work contributes to a growing body of evidence on the complex, and sometimes shared, nature of the evolution of animal and bacterial immunity. I would classify this nice study as a conceptual advance of our understanding of the evolution of protein families through deep time and would imagine it is of interest to a broad audience of biologists from immunologists to evolutionary biologists and structural biologists.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Culbertson and Levin takes a bioinformatic approach to investigate the evolutionary origins/trajectories of three different proteins domains involved in innate immunity in both bacteria and eukaryotes: cGAS/CD-NTases, STING, and Viperins. To perform this analysis, the authors apply an iterative homology search model to the EukProt database of eukaryotic genomes. Their analysis finds that eukaryotic CD-NTases arose from multiple horizontal gene transfer events between bacteria and eukaryotes. They also fill in an important gap in understanding how STING from bacteria evolved into modern human STING by identifying blSTINGs in diverse eukaryotes. Finally, they determine that Viperins are an ancient protein family that likely existed in LECA, but found two more recent HGT events for proteins related to Viperin.

      Major comments

      1. The hypothesis for the origin of STING via convergent domain shuffling could be handled with a little more care in the text. The authors show that homologs of STING from animals can also be found in the genomes of diverse eukaryotes outside the metazoa, demonstrating (1) STING and cGAS have had different histories, and (2) that these sequences are more bacteria-like than metazoan STING. However, in multiple places (the title, line 275, elsewhere) the term "convergence" could be misleading. "Convergence" leaves the reader with the impression that there is no common ancestor between the STING domain from bacteria and eukaryotes. I understand that the authors are using "convergent domain shuffling" to draw this distinction, but I'm unsure if a naïve reader will glean the distinction between domain shuffling and STING itself converging. I would argue that we simply cannot place eukaryotic STING and blSTING proteins on the tree of bSTING sequences. i.e. blSTING are no more related to bacterial TM-STING than bacterial TIR-STING (likely the missing bSTING sequences are simply extinct?). Can the authors curate their language to state more simply that STING likely arose through horizontal gene transfer, but it is unlikely that bacterial TM-STING is the unequivocal progenitor?

      We thank the reviewer for this comment, and we absolutely agree that we should be clearer about the distinction between convergence and convergent domain shuffling. We have changed the title and edited the text to increase clarity. In addition, we have clarified what our data do and do not say about the evolutionary history of STING. We feel that our STING tree (Fig. 3C), owing to a general sparseness of eukaryotic and bacterial sequences, is insufficient to determine confidently whether eukaryotes acquired STING by HGT or whether STING was present in the LECA.

      We have added the following to clear up this issue:

      “Overall, the phylogenetic tree we constructed (Fig. 3C) suggests that there is domain-level homology between bacterial and eukaryotic STINGs, but due to sparseness and lack of a suitable outgroup, this tree does not definitively explain the eukaryotic origin of the STING domain. However, the data does clearly support a model in which convergent domain shuffling in eukaryotes and bacteria generated similar TM-STING and TIR-STING proteins independently.”

      Minor Comments

      1. Spelling error in Figure 3B and 3C: "cannoical"

      Thanks, we have corrected this error.

      1. Figure 5 could be improved to more clearly articulate the findings of the manuscript. In A, it's unclear how OAS relates to Mab21 and a reader not paying close attention might think that OAS was part of the gene duplications after Mab21 was acquired. The LECA origins of OAS are also not presented (albeit, these are still defined in the legend). In B, this panel would suggest that there was not horizontal transfer of STING from bacteria to eukaryotes but rather both domains of life received STING from a separate source. My understanding is STING did likely arise in bacteria, however, the assumption that extant TM-STING in bacteria is the predecessor of TM-STING in eukaryotes is not well supported. Similarly for the TIR domain.

      We have updated Fig. 5 to show more clearly that OAS was likely present in the LECA and that eSMODS and cGLRs were horizontally transferred from bacteria to other eukaryotic lineages. For STING, it was not our intent to imply that the extant TM-STING in bacteria is the predecessor of TM-STING in eukaryotes, and we agree with the reviewer that this is unlikely. Although we do not have sufficient data to speak to the origin of the STING domain itself, we do feel confident in our evidence of domain shuffling. Our illustration in Fig 5B was meant to correspond to the following statement: “Drawing on a shared ancient repertoire of protein domains that includes STING, TIR, and transmembrane (TM) domains, bacteria and eukaryotes have convergently evolved similar STING proteins through domain shuffling.” We believe this inference is valid and best describes our results for STING.

      1. Line 119: While the role of Mab21L1-2 are established for development, I'm unaware of a role for MB21D2 in development (or any other phenotype).

      We agree with the reviewer that MB21D2 has not been shown to have any phenotype and have corrected the wording to clarify this point.

      The line now reads “However, the immune functions of Mab21L1 and MB21D2 remain unclear, although Mab21L1 has been shown to be important for development[29–31].”

      1. Line 210: "Gamma" should be "genes"

      We have corrected this error and replaced the word.

      Reviewer #3 (Significance (Required)):

      This work is of high quality, is timely, and will have a large impact on shaping the field. The origins and evolution of antiviral immunity from bacteria to eukaryotes have been investigated from multiple angles. While the phylogeny and evolutionary trajectory of these genes have been traced in bacteria, there have been relatively fewer analyses across diverse (non-metazoan) eukaryotes. For this reason, I am confident that this manuscript will help future researchers select homologs for investigation and guide similar analyses of other bacterial defense systems.

      A particular challenge of this work is accounting for gene loss across taxa and weighing that possibility against horizontal gene transfer. The authors are conservative in their conclusions and well-reasoned. The comments I have can be addressed with changes to the writing and emphasis of certain points.

      I expect these findings to be of interest to a broad audience of evolutionary biologists, microbiologists, and immunologists.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Describe your expertise? Molecular Evolution, Mechanisms of Protein evolution, Phylogenomics, Adaptation.

      Summary: This manuscript broadly aims to improve our understanding of the evolutionary relationships between eukaryote and bacterial protein families where members of those families have immune roles. The study focusses on three such families and samples deeply across the eukaryotic tree. The approaches taken include a nice application of the EukProt database and the use of homology detection approaches that are sensitive to the issues of assigning homology through deep time. The main findings show the heterogeneity in means by which these families have arisen, with some of the families originating at least as far back as the LCA of eukaryotes; in contrast, the widespread yet patchy distribution of other families is the result of repeated independent HGT events and/or convergent domain shuffling.

      Major Comments:

      1. Overall the level of detail provided throughout the manuscript is lacking, perhaps the authors were constrained by a word limit for initial submission, if so then this limit needs to be extended to include the detail necessary. In addition, there are some structural issues throughout, e.g. some of the very brief intro (see later comment) reads a little more like methods (paragraph 2) and abstract (paragraph 3). The results section is lacking detail of the supporting evidence from the clever analyses that were clearly performed and the statistics underpinning conclusions are not included.
      2. The intro and discussion both include statements about some recent discoveries that bacteria and mammals share mechanisms of innate immunity - but there is no further detail into what would appear to be important work leading to this study. This context needs to be provided in more detail therefore I would encourage the authors to expand on the intro to include specific detail on these significant prior studies. In addition, more background information on the gene families investigated in detail here would be useful e.g. how the proteins produced influence immunity etc should be a feature of the intro. A clear and concise rationale for why these 3 particular gene families (out of all the possible innate immune genes known) were selected for analysis.
      3. Context: Genome quality is always a concern, and confirming the absence of an element or protein in a genome is challenging given the variation in quality of available genomes. Low BUSCO scores make gene loss difficult to evaluate (but the scores are not provided). Query: the Results section states that the BUSCO completeness scores (which need to be provided) were insufficient to explain the pattern of gene loss. I would like to know how the authors reached this conclusion: what statistical analyses (ANOVA? other?) were performed to support this statement? Please include the associated P values. Similarly, this point is brushed over throughout the paper, including in the Discussion. If a statistical test shows that some of the disparity in gene presence is explained by BUSCO score, most of your findings are still valid; it would just be difficult to draw conclusions about gene loss.
      4. Regarding the homolog search strategy (line 394), please state what an "outgroup gene family" means in this context. This is unclear but very important for the downstream interpretation of the results.
      5. For reproducibility, the Materials and Methods section needs to provide sufficient detail to reproduce these results. For example, the section describing phase 1 of the eukaryotic searches repeats what is in the Results section for the crystal structure work but gives no information on which method was used to "align the crystal structures", what scoring scheme was used, or how that scoring scheme identifies "the core". What specific parameters are used throughout? Why is MAFFT the method of choice for some of the analyses, whereas in other cases both MAFFT and MUSCLE are employed? What specific settings are used for the MAFFT alignments throughout: default (which must be stated if so), MAFFT L-INS-I with default settings, or something else?
      6. The justification for the number of HMM searches needs to be included. The choice of starting points for the HMMs was cryptic; please provide details. It is likely that you ran the search until no more sequences were found or until sequences from a different gene family were added, and that this happened to take between 3 and 5 searches, but it reads as though you intended to run it 3 or 5 times and this coincided with the above condition. Something like this would be clearer: "The profile was [...] until no more sequences were found or until sequences from other gene families were found, which was between 3 and 5 times in all cases." The same applies to Figure 1.
      7. Why do you limit hits to 10 per species? Might this lead to misleading findings about gene family diversity? Information on, and justification for, this approach is required (lines 411-412).
      8. The information in Supplementary Figure 3 is quite difficult to assess visually, though I think that is to be expected from this type of figure. However, this is an important underpinning element of the work and should be assessed quantitatively, using a tree-comparison metric with defined thresholds; there are many available, perhaps even a simple Robinson-Foulds distance. Comparing the panels in Supplementary Figure 3 by eye is unreliable, and in this case impossible given that there are no labels. It would also be important to provide the full set of phylogenies generated, with the associated RF or other scores, as a supplementary file.
      9. Does domain shuffling make phylogenetic reconstruction less valid? How was the alignment performed in these cases to account for this?
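      The Robinson-Foulds comparison suggested in comment 8 can be illustrated with a minimal sketch. It assumes, purely for brevity, that each tree is written as nested Python tuples of taxon names (a hypothetical representation; real analyses would parse Newick files, and established phylogenetics libraries such as ETE or DendroPy provide equivalent functions). The RF distance counts the non-trivial bipartitions present in one unrooted tree but not the other, and for fully resolved unrooted trees on n taxa its maximum is 2(n - 3), which gives a normalised score between 0 and 1.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions of the tree, treating it as unrooted."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)  # each split is stored as the side NOT containing this taxon
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = all_taxa - below if ref in below else below
        # Keep only splits where both sides hold at least two taxa.
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(side)
        return below

    walk(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of bipartitions found in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must contain the same taxa")
    return len(splits(t1) ^ splits(t2))


def normalised_rf(t1: Tree, t2: Tree) -> float:
    """Scale RF to [0, 1], assuming both trees are fully resolved with more than 3 taxa."""
    n = len(leaves(t1))
    return robinson_foulds(t1, t2) / (2 * (n - 3))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))  # B and C swapped
t3 = (("A", "B"), ("C", ("D", "E")))  # same unrooted topology as t1, rooted elsewhere

print(robinson_foulds(t1, t2))  # 2
print(normalised_rf(t1, t2))    # 0.5
print(robinson_foulds(t1, t3))  # 0
```

      Storing each split as the side that excludes a fixed reference taxon makes the comparison independent of where each tree happens to be rooted, which is why t1 and t3 score 0.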

      Minor Comments:

      1. I am not sure about the use of the term "truly ancestral" or variants thereof, same issues with "significant homology" and "inherited since LECA and possibly longer" .. these are awkwardly phrased. E.g. I think perhaps "homologous across the whole length" might be clearer, and elsewhere "present in LECA and possibly earlier" may be more fitting.
      2. Line 75 - "Detecting" rather than discovering?
      3. 132-133 - more justification is needed for the choice of bacterial genes.
      4. For the downsizing from 6000 to 500, what were the criteria and thresholds?
      5. How are you rooting your trees e.g. figure 2? Information is provided for Viperin but not others.
      6. In the results section on CD-NTases I think it would be best to place the second paragraph detailing the role of cGAS earlier in this section, perhaps after the first sentence.
      7. Is FastTree really necessary to include, given that it will underperform in all instances? Removing that method and comparing the remaining two (i.e. IQ-TREE and RAxML), what level of disagreement do you find between the two alignment and two tree-building methods? The cases that disagree should also be detailed.
      8. Again a structural point - the start to paragraph "To understand the evolutionary history of CD-NTases we used the Pfam domain PF03281 as a starting point", I don't know at this point why or how you have done this. The sentence seems a little premature. I would therefore suggest that you start that paragraph with your motivation, "In order to..." and then finish that paragraph with your sentence in quotes above which actually summarises the paragraph.
      9. Line 148 - "within" change to "before"?
      10. Unclear from text as is whether you found any STING homologs in arthropods (~line 157). Please update the text for clarity. Would also suggest that "agreeing" should be replaced with "aligning".
      11. Line 169 - If clade D is not a clade, maybe it should be called something different.
      12. Line 188-190 - In principle, max likelihood should be able to infer the right tree even with high divergence.
      13. Paragraph starting at line 199: are eSMODS always of unknown function, or only mostly? This could be important.
      14. For calling HGT you state that one of the criteria is that the euk and bac sequences branched near one another, what is "near" in this scenario?
      15. In the legends, be specific about the type of support value (e.g. bootstrap or jackknife). I think it is always bootstrap, but it would be good to have that precision.
      16. Throughout the text if stating e.g. "clustered robustly and with high support" please provide the appropriate values.
      17. It is unclear from the text how the animal origin of the TIR domain is supported (~line 274). Please provide necessary details to support your statements in the results section.
      18. Replace "similar with" with "similar to".
      19. Line 266: It was previously shown .. or it is known but not "it was previously known"
      20. The last sentence in paragraph ~line 277: "Our work also identified a number of non-metazoan STINGS...." Please expand on this and provide some of the details on this finding in the text or point to the figure that supports the statement and provide a little more detail here.
      21. Line 294: it is unclear which are the orphan taxa. We are directed to Figure 1, but there is no notation for orphan taxa there; perhaps add something to the figure to make it obvious which these are.
      22. Lines 340-341 - some redundant use of eukaryotic/eukaryotes
      23. Lines 475-480 - some further detail needed - how were sequences trimmed to the TIR domain? - what were your starting sequences? etc.
      24. Check that the colour schemes for branches etc are detailed in the legends of supplementary as well as main.
      25. The gap threshold of 0.2 seems very strict, given that the sequences are potentially highly divergent. What length are the alignments after trimming? These details need to be included and considered.
      26. How were sequences downsampled with PDA? Line 424.
      27. Please provide adequate descriptions for the materials in the supplementary files for the manuscript, they currently lack description. They are useful and we fully support their inclusion with sufficient information.
      28. The starting sequences, hmm pipeline and scripts would be great to include, apologies if we have missed them.

      Significance

      This study provides us with examples of instances where a medley of different mechanisms have resulted in the emergence of innate immune proteins across eukaryotes. The study is entirely bioinformatic in nature and provides some nice cases for future study. The thorough search strategies are to be commended. The limitations of the work are that we don't know whether the functions have also been conserved across deep time and/or in the independent events described. Nevertheless, this work contributes to a growing body of evidence on the complex, and sometimes shared, nature of the evolution of animal and bacterial immunity. I would classify this nice study as a conceptual advance of our understanding of the evolution of protein families through deep time and would imagine it is of interest to a broad audience of biologists from immunologists to evolutionary biologists and structural biologists.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their critique of our report; our responses to all of their comments are given below.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Toxoplasma gondii is an obligate intracellular parasite. Intracellular survival critically depends on secretory vesicles named dense granules. These vesicles are predicted to contain >100 different proteins that are released into the PV, the PV membrane and the host cell to control the parasite's intracellular environment and host cell gene expression and immune response. How and where these vesicles are released from the parasite is a long-standing question in the field, because T. gondii and other apicomplexan parasites contain a complex pellicular cytoskeletal structure called the IMC, which limits dense granule access to the plasma membrane. In this manuscript, Chelaghma, Ke and colleagues demonstrate for the first time that dense granules are secreted from the parasite at pore structures called the apical annuli. The authors used their previously generated HyperLOPIT data set and identified a plasma membrane protein that is specifically enriched at the apical annuli. Using BioID, the authors then identified three SNARE proteins that also localize to the apical annuli. The localization of these proteins is determined using excellent super-resolution structured illumination microscopy. Conditional protein knockdowns were created for all four proteins, and both proteomics and microscopy were used to demonstrate a reduction in dense granule secretion in the absence of these proteins. Collectively, these data make new and substantial contributions to our understanding of the mechanisms of dense granule secretion.

      Major comments: Overall, these data are convincing and well described. The text is clear and well written. There are a few instances (see below) where the authors do not adequately describe the data or overstate the strength of the results. These issues could all be addressed editorially or by processing existing data.

      Comment 1.1

      The authors use proteomics and IFA to show that there is a reduction in (rather than an inhibition of) dense granule secretion. However, from the phase images in Figure 5, the vacuoles of KD parasites look normal and do not have the phenotypes that one would expect after a significant reduction in dense granule secretion, such as the "bubble" phenotype described for GRA17 and GRA23 knockouts (Gold et al 2015; PMID: 25974303). The authors should describe their findings in the context of the expected phenotypes based on the published literature. The statement on lines 369-371 is too strong and should imply a reduction rather than an inhibition of dense granule secretion.

      Authors’ response: It is difficult to compare our results to individual dense granule protein mutants described in the literature, because such phenotypes are the result of the loss of only a single protein being exported to the host, whereas we are observing the effects of reduced secretion of up to 120+ different proteins. Furthermore, we agree with this reviewer that none of the protein knockdowns appear to completely prevent dense granule secretion, which we implied by ‘inhibition’, and this could be due either to incomplete knockdown of each of these proteins with some residual function, or to some redundancy where other proteins can contribute to secretion. We have changed the statement flagged by this reviewer to ‘Depletion of all four of these proteins affects dense granule secretion’ to avoid the interpretation of complete loss of function. We now further state that residual secretion may still occur and consider this in the light of possible reasons for it (Discussion, paragraph 4). In any case, none of these considerations change our conclusion that these proteins, at the site of the apical annuli, are implicated in dense granule secretion.

      Comment 1.2

      The more severe phenotype observed in the AAQa iKD and the additional localizations of AAQa and AAQc suggest an additional role for these proteins in protein trafficking, which is supported by the authors' data. In both AAQa and AAQc there appears to be an accumulation of GRA1 in a post-Golgi compartment that is less vesicular in appearance than the phenotype observed in the AAQb iKD parasites. Additionally, I disagree with the authors' assessment that KD of these proteins does not affect microneme localization. In both AAQa and AAQc there appears to be an increased number of micronemes at the basal end of the parasites compared with controls. Although this is not a direct focus of the authors' paper, a description of these findings should be included in the Results and Discussion sections.

      Authors’ response: We have included a more complete discussion that considers the differences in phenotypes of the four mutants, including additional locations of two SNAREs, all of which is consistent with known SNARE biology (Discussion, fourth paragraph). These considerations, however, have no impact on our conclusions where all four proteins, including two that are exclusive to the apical annuli, have equivalent effects on dense granule exocytosis.

      Concerning the effects of the different knockdowns on micronemes and rhoptries, we have modified and limited our interpretation to overall IFA staining strength and organelle protein abundance by proteomics, where we see no differences. This addresses whether there is a major post-Golgi trafficking defect that could affect biogenesis of all of the micronemes, rhoptries and dense granules, for which we see no evidence. Whether there are subtle differences in the location of these organelles, which are known to show some variability, is beyond the scope of, and not relevant to, our central questions. Given that growth phenotypes are seen for all mutants, it is quite possible that secondary effects in growth-retarded cells might present as some disorder within the cell, although we saw nothing conspicuous of this nature in many hundreds of examples observed.

      Comment 1.3

      Presentation of the data in Figure 5. This figure contains images where the fluorescent dense granule signal is overlaid on phase images. However, in some cases (AAQb, AAQc, AAQa, GRA1 KD) the merged image looks like a straight merge of the two images, whereas in the rest of the images it looks like a thresholded fluorescent image is merged with phase. The authors need to process the images in a consistent manner and provide a description of the image processing in the figure legend and Materials and Methods.

      Authors’ response: Thank you for this suggestion, we have now processed all of these merges the same way (ImageJ -> merge channels -> Composite Sum). While the merges are only intended to aid in aligning the fluorescence signal with the phase image, we agree that it is better to present them the same way.

      Minor comments:

      Comment 1.4

      The discussion is overly long and could be shortened in some places. Lines 373 and 388 in particular do not seem directly relevant to the manuscript.

      Authors’ response: The paragraph identified by this reviewer considers the LMBD protein, which is the first, and currently only, trans-plasma membrane protein specific to the apical annuli and implies that this structure is exposed to the exterior of the cell. It is, therefore, of considerable significance to how we interpret the function and behaviour of these annular structures. We believe that it is very relevant to our study to consider what else is known about these relatively mysterious, but widely conserved, eukaryotic proteins, which is the subject of this paragraph. The other reviewers highlight the relevance of LMBD3 to the interpretation of this structure. This reviewer has not identified any further superfluous discussion elements, and we believe that the current length is not excessive and is justified.

      Comment 1.5

      Line 184 - Remove question mark from this sentence

      Authors’ response: The question mark has been removed.

      Comment 1.6

      Line 321. Should read Figure 7A, not figure 6A.

      Authors’ response: Thank you, corrected.

      Comment 1.7

      Line 139 - should read Figure 1B instead of 2C

      Authors’ response: Thank you, corrected (although to 1C, which is in fact correct).

      Comment 1.8

      Figure 3: Column labels for early, mid, or late endodyogeny would help with the clarity of this figure, especially for readers who are unfamiliar with the field.

      Authors’ response: We have labelled the figure as suggested.

      Comment 1.9

      Figure S2 - the letter n is missing from knockdown labels. And the number 3 from LMBD 3 is covering the word knockdown in the last panel.

      Authors’ response: Thank you, corrected.

      Reviewer #1 (Significance (Required)):

      The manuscript provides, for the first time, insight into the mechanism of dense granule secretion in Toxoplasma and identifies the sites on the parasite pellicle where these vesicles can traverse the IMC to reach the plasma membrane. This is a significant conceptual advance in our understanding of this vital cellular process, one that is required for T. gondii intracellular survival. This paper would be of broad interest to other research groups studying parasitology, secretion and protein trafficking.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript reports on characterizing the function of the long-known apical annuli, which are pores embedded in the membrane skeleton of Toxoplasma gondii. Since their function has remained long elusive, this manuscript is a major breakthrough.

      Comment 2.1

      It is of note, however, that this breakthrough, using the same three SNAREs, was recently and in parallel also reported by Fu et al in PLoS Pathogens (PMID 36972314), work that is cited here. The additional novelty here is the finding of LMBD3 in the plasma membrane at the site of the annuli. This is a widely conserved protein for which little function is known except roles in signaling. The connection between LMBD3 and the SNAREs is through BioID, but they are preys quite far down the list. Furthermore, the function of LMBD3 is not explored here. As such, the additional advance compared to the Fu et al report is limited. The function of the SNAREs in dense granule exocytosis is much more robustly demonstrated here through the proteomics data displaying an accumulation of DG proteins.

      Authors’ response: While it is true that the discovery of the three SNAREs at the apical annuli was made and reported in parallel by Fu et al (2023), a major difference in their conclusions is that they suggest that dense granules are not secreted at this site (this reviewer has mistakenly thought that this was their conclusion: “In our experiments, none of the SNAREs were shown to be related to the exocytosis of GRAs. Therefore, the mechanism that mediates exocytosis of GRAs at the plasma membrane remains to be elucidated.” Fu et al (2023)). The failure of Fu et al to detect this was almost certainly because they only tested for dense granule secretion defects by inducing depletion of the apical annuli SNAREs after the parasites had invaded the host cells. It is known that dense granule protein secretion happens rapidly in the initial moments after invasion, so apical annuli perturbation in their assay would only have occurred after these secretion events. We directly discuss this experimental difference in our revised Discussion and how it accounts for their different conclusions (Discussion, fourth paragraph). We independently tested for this effect by quantitative proteomics, which further supported our conclusions.

      As this reviewer indicates, we additionally discovered that a protein (LMBD3) also spans the plasma membrane at these structures, and this implicates signalling or events at the cell surface. We show that this protein is also required for normal dense granule secretion. While we have not identified an explicit mechanistic role for LMBD3 in this process, such insight is also lacking for all LMBD proteins, including those in humans, where they are implicated in disease. While we continue to pursue this interesting question of LMBD3 function, we are by no means alone in cell biology in having such answers still outstanding.

      Comment 2.2

      The presentation of the data is very clean and convincing, and the broader evolutionary context is well-presented as well. The discussion on whether maintaining the IMC during cell division is an innovation or ancestral is an open debate where the authors seem to come down on the side of innovation, but the evidence could go either way, so I would caution a bit more.

      Authors’ response: We are puzzled by this reviewer’s comment because we do not refer to the maintenance of the IMC during cell division in this evolutionary context, as either ancestral or a recent innovation. We describe the case of Toxoplasma and its close relatives maintaining the maternal IMC during division as ‘unusual’, not ancestral (second sentence of the last paragraph of the Discussion), and this is the only statement that we think might have elicited this query from the reviewer. But this does not imply what the ancestral state might have been, which is not a subject of any of our considerations here.

      Major comments:

      • Are the key conclusions convincing?

      Comment 2.3

      The identification of the three SNARE proteins through BioID is not very convincingly represented in Table S1. These SNAREs did not show significant changes, were not detected universally across the three bio-reps, and were also present in the controls. Although this does not diminish the message of the work, it appears to be quite cherry-picked, while other top hits in the BioID were overlooked; e.g. Nd6 and Nd2, which have a demonstrated role in rhoptry exocytosis, are right in the top ten. This certainly piqued my interest, but is not even discussed.

      Authors’ response: We have used BioID as a protein discovery strategy, not to directly measure protein proximity, for which it is an imperfect measure for many technical reasons. Accordingly, discovered ‘candidates’ for proteins that might occur at the annuli were all independently verified by protein reporter tagging. We focused our efforts on discovering apical annuli plasma membrane-tethered proteins and, therefore, parsed our BioID data for those shown previously to be in the plasma membrane by LOPIT spatial proteomics (Barylyuk et al, 2020). It is true that the SNARE proteins were not favoured over many other proteins in the BioID signal, but their verified location at these sites justified our pursuit of them as new apical annuli proteins.

      Other proteins, including the previously identified apical proteins Nd6 and Nd2 that are implicated in rhoptry secretion, similarly piqued our interest! But when we reporter-tagged them they were revealed as BioID false positives, consistent with published work on these proteins, and other ‘top hits’ included some other false positives. Table S1 is included as a further recourse for the field, but it only served as a first step in functional protein discovery in our study.

      Comment 2.4

      TgAAQa, TgAAQb and TgAAQc were recently also reported to localize to the annuli by Fu et al 2023 (PMID: 36972314; this report is even cited in this manuscript for Rab11a accumulation), who gave them different names: TgStx1, TgStx20, and TgStx21 (not in this order). I see no reason to adopt a new nomenclature here, which will be very confusing in the future literature. Please adopt the Stx names in this manuscript.

      Authors’ response: We agree that where there is precedent in naming it is better to use the earliest used names. Naming of proteins is also best done to reflect orthologues found between species so that consistent names indicate common functions. The naming system proposed by Fu et al for the Qa, Qb and Qc SNAREs unfortunately does not fulfil this second important criterion. They based their names on ‘Syntaxin’, a name first used for an animal SNARE of the nervous system and almost exclusively used for Qa paralogues. Furthermore, in animals Stx1-4 are all vertebrate-specific Qa paralogues that have arisen only in this group. So, to name the Qa SNARE of Toxoplasma according to one of these animal-specific nerve proteins (Stx1) implies an evolutionary inheritance that is very unlikely (i.e., lateral gene transfer from an animal) and is unsupported by published phylogenies. Furthermore, Fu et al also give the Qb and Qc SNAREs the animal Qa name ‘syntaxin’, and arbitrarily number them Stx21 and Stx20. So, while they have named these proteins first, we think that the names given provide confusing and misleading labels for these proteins.

      We initially proposed a simpler system according to the location of the SNARE in Toxoplasma (AA = Apical Annuli) and the Q domain type (Qa, Qb, Qc), e.g., AAQa. But on reflection we propose using precedent and orthology and adopting the existing orthologue names as the most useful solution. Klinger et al (2022) have resolved the phylogeny of the three Toxoplasma SNAREs, and they group with strong phylogenetic support with known eukaryote-wide orthogroups with previous names: Qa=StxPM (Syntaxin Plasma Membrane); Qb=NPSN (Novel Plant ‘Syntaxin’); and Qc=Syp7 (a Qc SNARE family originally thought to be specific to plants). These SNARE types are all known to operate at the plasma membrane, and accordingly the names TgStxPM, TgNPSN, and TgSyp7 would indicate their orthology and the similar functional location known in other eukaryotes. We have justified this preferred naming system in the text of our report (Discussion, third paragraph), while making it clear which Fu et al names correspond to these more universally consistent names so that they can be easily cross-referenced.

      Comment 2.5

      No knock-down of LMBD3 is pursued: how would this impact SNARE distribution and/or other annuli proteins? The fitness score is very severe, -4.07, so this is somewhat puzzling. The comment below is related. This could provide tantalizing insights into the architecture of the annuli and/or their function as a secretory conduit.

      LMBD3 relative to the SNAREs is not explored: co-IPs or detergent extraction could show whether they are all in a physically interacting complex. What keeps them together? Is LMBD3 interfacing with any annuli proteins? Cen2 is suggested by the image in Fig 2A, though there appears to be some separation in some images; AAP2, 3 and 5 were previously shown to have smaller diameters than Cen2 and therefore appear better positioned.

      Authors’ response: LMBD3 knockdowns were pursued insofar as identifying that they also have a phenotype of reduced dense granule secretion, as for the SNAREs, but it will indeed require further studies of this intriguing molecule to define its specific function. Our central questions in this study were what the association of the apical annuli is with respect to the IMC and plasma membrane, and what the overall significance and function of these structures are. These core questions have been answered in our study. The questions that this reviewer raises here are further, logical questions specifically related to LMBD3 that we are now pursuing as an independent follow-on study.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Comment 2.6

      The discussion on whether maintaining the IMC during cell division is an innovation or ancestral is an open debate where the authors seem to come down on the side of innovation, but the evidence could go either way, so I would caution a bit more.

      Authors’ response: This comment (2.2) is already made and addressed above.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Comment 2.7

      The heavy focus on LMBD3 in Fig 1 and the evolutionary discussion would warrant a more direct functional dissection, either through an LMBD3 knock-down or by examining its interface with the SNAREs or annuli more directly.

      Authors’ response: This reviewer has not made it clear that further work on LMBD3 is necessary to support the conclusions of the paper or address the questions that we have asked, only that they would like to see more insight into LMBD3. We would also! But we do present knock-down studies and show that there are functional consequences for dense granule secretion. The question of if LMBD3 is involved in the maintenance of apical annuli structure and/or integrity is an interesting one, but a further question to those that we have presented in this first study. LMBD proteins have poorly characterised molecular functions throughout eukaryotes, and while we are also motivated to understand their role more, this has not proven a straightforward task in other systems also.

      Comment 2.8

      The claim that the annuli are the conduits through which the dense granules travel to undergo exocytosis is not directly supported by any of the experiments, as it is based solely on co-localization studies, not even direct interactions.

      Authors’ response: We agree that we have not directly observed dense granules in the act of secretion at the apical annuli. Dense granules are known to be very mobile in the cell and traffic dynamically on actin networks. So, they do not accumulate at any one site, and their fusion and exocytosis is likely a rapid, transient event. Multiple lines of evidence for them pausing and fusing with the plasma membrane, while indirect, independently support this conclusion:

      • SNARE proteins restricted to the apical annuli in the plasma membrane are required for normal dense granule secretion
      • When these SNAREs are depleted dense granule proteins accumulate in the parasite
      • Rab11A is a further vesicle-tethering molecule that has been shown to be attached to dense granules, and its mutation also leads to inhibition of dense granule protein secretion (Venugopal et al, 2020)
      • When the apical annuli SNAREs are depleted Rab11A accumulates at the annuli (Fu et al, 2023)

      Collectively, we believe that the claim that the apical annuli are the sites of dense granule secretion is very strongly supported, particularly by the very molecules that would be required for vesicle docking and fusing at these sites, and is justified to be noted in the title. We have, however, made it clear in our report now that these data are indirect and that dense granules are yet to be captured in the act of secreting their contents at these sites (Discussion, paragraph five).

      **Referees cross-commenting**

      The consolidating themes I see (and value) in the reviews:

      Comment 2.9

      Functional follow-up of the role of LMBD3

      Authors’ response: This work is already part of a follow-up project.

      Comment 2.10

      adopt nomenclature of Fu et al, to avoid confusion in literature

      Authors’ response: Please see our response to Comment 2.4

      Comment 2.11

      better integrate the findings in light of the Fu et al publication throughout this manuscript

      Authors’ response: We have further acknowledged and compared our findings to those of the parallel study of Fu et al with additional text in the Discussion.

      Comment 2.12

      no direct evidence of dense granules at annuli; attenuate the claims (in title etc), or include supportive data

      Authors’ response: Please see our response to the equivalent Comment 2.8 above.

      Reviewer #2 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Comment 2.13

      The presented manuscript reports on a novel protein, LMBD3, embedded in the plasma membrane of Toxoplasma gondii at the site of the apical annuli, which are pores across the inner membrane complex (IMC) skeleton. This provides a novel, putative connection between the cytoplasm and plasma membrane, although this is not directly explored here. Through LMBD3 proximity biotinylation, three SNAREs are identified that were recently reported to be involved in dense granule exocytosis, which is confirmed here through robust proteomic experiments.

      Authors’ response: This reviewer has made an error here in stating that the parallel study of Fu et al implicated the apical annuli SNAREs with dense granule exocytosis. See our response to Comment 2.1 where we describe why the experimental design used for Fu et al was unlikely to test this question effectively.

      • Place the work in the context of the existing literature (provide references, where appropriate). The annuli were first reported in 2006, and understanding of their proteomic composition has expanded over the years; however, their function has long remained elusive. This report, together with a parallel study, now uses three SNAREs, named TgAAQa, TgAAQb and TgAAQc in this report but previously named TgStx1, TgStx20, and TgStx21 (not in this orthologous order), which localize to the annuli, as a tool to assign the function of the annuli to exocytosis of the dense granules during intracellular parasite multiplication. The evolutionary context and concepts of the new findings are very well embedded in the existing literature and insights.

      • State what audience might be interested in and influenced by the reported findings. The audience extends beyond those with a specific interest in apicomplexan biology to basically all Alveolates, as they all share a similar membrane skeleton. Assigning a putative function to the widely conserved LMBD3 will be of high interest to this completely different audience as well.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In the submitted work "Apical annuli are specialised sites of post-invasion secretion of dense granules in Toxoplasma", the authors explore the role of the apical annuli in T. gondii. They identify a number of proteins that localize to the membranes at the annuli, including SNARE proteins that are known players in vesicle fusion. They also show that knockdown of several annuli-localized proteins blocks replication and secretion of dense granule cargo into the parasitophorous vacuole. Overall, the work is well done and an important contribution to the field.

      Major comments

      Comment 3.1

      1. In the title and throughout the manuscript the authors claim that the apical annuli are sites of dense granule secretion (e.g. "firmly implicating the apical annuli as the site of dense granule docking and membrane fusion." or "that the apical annuli are sites of vesicle fusion and exocytosis"). However, there does not appear to be direct evidence of the dense granules docking and fusing at these sites.

      It would be ideal to see vesicles docked via EM at the annuli, either in wildtype or knockdown parasites. This may not be possible; if not, I recommend toning down the conclusions on docking (or "specialized sites of secretion", as this has not been shown) and instead stating that these structures play a critical role in dense granule secretion.

      Authors’ response: Please see our response to Comments 2.8 & 2.12. We have toned down this conclusion as requested to make it clear that direct observations of dense granule fusion are yet to be made. Capturing the transient event of dense granule docking by EM would indeed be a very challenging ambition.

      Comment 3.2

      The authors should discuss earlier (in the results) the findings of Fu et al. which:

      Authors’ response: The parallel study of Fu et al (2023) has indeed generated some similar data, but there are also multiple points of difference including their conclusions. We discuss all of these relevant points in the Discussion, and believe that it would make the Results narrative confusing to introduce this element of discussion there. Our study has not been performed in response to theirs, but rather was conducted in parallel.

      • show the localization of some of the same SNAREs at the apical annuli. Fu et al also see localization to the plasma membrane separate from the annuli for some of these proteins. Do you see plasma membrane spots as well upon longer exposures? Can differences be explained by the position or type of tag used?

      Authors’ response: Fu et al have indeed used different reporters and expressed the SNARE fusion proteins with different non-native promoters. They used a very bulky reporter which combined 12 HA tags as well as the large Auxin-Inducible Degron (AID), and together it is possible that they observe some mistargeting artefacts. For our location studies we used the small epitope 3xV5 only. We did not see the additional locations that they report, and this may be due to the larger modification that they made to these proteins.

      • Fu et al also shows similar plaque defects in the knockdowns and loss of trafficking of plasma membrane proteins to the periphery. In general, the studies from this group are very complementary - they should be better acknowledged.

      Authors’ response: We have included more frequent reference and comparison to the Fu et al study now in our Discussion.

      • Fu et al see an invasion defect but no defect in GRA secretion - Do you see an invasion defect? These differences should be discussed

      Authors’ response: See our response to Comments 2.1 & 2.13 regarding why Fu et al could not detect the GRA secretion defect. We discuss this in our Discussion now (Discussion paragraph four). We also consider the Fu et al test for an invasion defect to be flawed. Both our and their study show that depletion of apical annuli SNAREs causes a strong replication phenotype of parasites within the host vacuole. Given that induced SNARE depletion must occur during this growing stage of the parasites, asking whether the apical annuli are directly involved in invasion requires testing the invasion competence of already very sick cells. It is, therefore, not possible to control for secondary effects on invasion competence due to general cell malaise. Furthermore, Fu et al measure invasion efficiency using an assay that relies on SAG1 presentation on the cell surface. However, they conclude independently in their study that SAG1 delivery to the surface is inhibited in their SNARE knockdowns. This further confounds any attempt to reliably measure invasion and any role for these SNAREs in this process. Therefore, for biological as well as technical reasons, we have not tested for a possible role of the annuli in invasion.

      • It would be helpful for the field to use the same nomenclature whenever possible. Is it possible to use the naming described earlier?

      Authors’ response: Please see our response to Comment 2.4.

      Comment 3.3

      Fig 1C - The authors use trypsin shaving to demonstrate plasma membrane localization of LMBD3. They are probably correct - but it is important to definitively distinguish between plasma membrane and IMC membrane localization. a. The western blot bands for GAP40 should be quantified. It appears that GAP40 is also reduced and it could be reduced to a similar extent as SAG1 without quantification. In addition, this protection from digestion could be confirmed with a second marker in the space between the PM and IMC membranes like GAP45 (whereas cytoplasmic/mito markers like profilin and Tom40 are likely further protected by the IMC membranes and are thus less relevant here).

      Authors’ response: Quantitation of Western blots is notoriously inaccurate; rather, we use it here as a qualitative indication of the trypsin sensitivity of proteins in intact cells. The LMBD3 protein is completely transformed within the first time point (1 hour) into stable proteolysis products of this polytopic membrane protein, presumably the portions that are protected within the cell. The known GPI-anchored surface protein SAG1 shows similar immediate sensitivity, although internalised SAG1 pools are known to be constantly recycled to the surface, hence the gradual elimination of the residual SAG1 band over 4 hours. The internal protein markers (GAP40, PRF, TOM40) show no discernible change in the first hour and little if any beyond that (within the variation common to Western blotting). GAP40 shares an equivalent polytopic membrane topology to LMBD3 except that it occurs in the IMC membrane directly below the plasma membrane, so we think it is the more suitable control. Thus, this trypsin shaving experiment gives a binary output: sensitive or insensitive. This conclusion is further supported by the published spatial proteomics study (Barylyuk et al. 2020), which shows that LMBD3 segregates with other integral membrane proteins specific to the plasma membrane and not with the IMC proteins. Our super-resolution imaging of LMBD3 relative to inner membrane complex markers (Centrin2, GAP45, IMC1) also shows it as peripheral to them, further corroborating the plasma membrane location.

      b. Is it possible to N-terminally tag LMBD3 and then examine plasma membrane localization by detection of the tag without permeabilization? (This would also confirm the proposed topology.)

      Authors’ response: We have tried to N-terminally tag LMBD3 with an epitope reporter, but this integration was not tolerated by the cell, presumably because it interferes with membrane insertion of this protein, which is essential for cell viability. So, this experimental option is not available.

      Comment 3.4

      I think it is important to make clear for the reader what is happening here. The paper sounds as though the dense granules directly dock at the annuli for release. It also seems possible from this work and Fu et al that secretion at the annuli occurs via small vesicles that originate from the dense granules. Perhaps a diagram or model would help the reader here (and discuss why DGs or other vesicles are not routinely seen at the annuli if this is the critical portal - and perhaps why the organelles are not clustered in the apical end of the cell if this is where they are needed)

      Authors’ response: This comment is related to that of Reviewer 2 (Comments 2.8/12), although we note again that Fu et al did not conclude that dense granules are exocytosed at this site. It is also unclear why this reviewer envisages that small vesicles arise from the dense granules, rather than the dense granule itself fusing at the annuli to the plasma membrane. Indeed, the occurrence of Rab11A on the dense granules, and the accumulation of this protein at the annuli with SNARE knockdown, supports that it is the dense granules that dock at this site. Why dense granules do not otherwise cluster at their sites of secretion but are instead motile in the cell, their movement driven by Myosin F on actin filaments, is not known. Perhaps these otherwise bulky organelles would create too much cellular crowding that could interfere with other processes. We have addressed all of these points in additions to the Discussion so that these interesting unknowns are transparent to the reader (Discussion paragraph 5).

      Comment 3.5

      Figure 5. The authors state the knockdown results in "strong phenotypes of reduced plaque development" - The plaque assays should be quantified.

      • Are there no plaques or just very small ones here?

      Authors’ response: The reviewer provides no rationale for this request and does not state what questions could be addressed by doing so. Indeed, none of our conclusions would be affected. We use the plaque assays to test whether each of the proteins tested is independently necessary for some facet of normal parasite growth, where the result is binary: no difference in plaque size versus near or complete absence of plaque development. The interpretation of differing plaque sizes between different knockdown mutations is a very inexact science, with assumptions of equal rates of protein depletion, sensitivity to relative protein abundance, modes of action of mutation, and kinetics of plaque growth that are very difficult to validate for meaningful comparisons to be made. Therefore, we do not see any useful role for plaque quantification in the research questions that we have addressed or the conclusions that we present.

      Comment 3.6

      Figure 6 a. Fig 6A - The use of digitonin for semipermeabilization requires controls as there is typically a lot of variability across the monolayer. This is ideally done with something to show that the host plasma membrane has been permeabilized (e.g. host tubulin) and the PVM has not been permeabilized (e.g. SAG1). Otherwise, perhaps the authors could state what percent of cells showed the data like the representative images shown or describe further how selective permeabilization was assessed? (or wider fields with many cells and vacuoles?)

      Authors’ response: As requested, we have included a supplemental figure showing wider fields of view where multiple vacuoles are seen. These data show that the vacuoles are similarly stained, with no evidence of variability of digitonin permeabilization. The reduction in GRA5 secretion shown by microscopy is further supported by this protein being quantified using proteomics as enriched in the parasites when the apical annuli proteins are depleted (Fig 7).

      Comment 3.7

      1. Fig 6B - "the GRA signal seen within the parasite was increased compared to the control". This is not clear from the AAQb image shown, as it appears more is also present in the vacuole (or perhaps residual body?). Can this be clarified?

      Authors’ response: Yes, in this image it appears that the ‘residual body’, which is also an integral internal compartment of the growing parasite rosette, is a site of dense granule accumulation. We have modified the text to make it clear that the observations of IFA images showing an ‘apparent’ increase in dense granule staining were then directly tested by quantitative proteomics. These subsequent data (Fig 7) provided a clear measure of the increase in dense granule proteins in the parasites when apical annuli function was perturbed.

      Minor comments

      Comment 3.8

      1. Line 215-217: The authors state that "Collectively these data imply that the apical annuli provide coordinated gaps in the IMC barrier that forms at the earliest point of IMC development and that they maintain access of the cytosol to these specialised locations in the plasma membrane."
      However, their data show that LMBD3 is only recruited once daughters are emerging (not at the earliest point of IMC development). Please clarify: is this just referring to Centrin2, or to LMBD3 as well?

      Authors’ response: Yes, the other AAPs indicate that these structures form early, and they were mentioned as such in the sentences preceding this statement, hence ‘collectively’.

      Comment 3.9

      Fig 5. Regarding growth arrest. AAQa appears to show an arrest but is it possible the others just grow slower? Do they arrest later and hence fail to form a plaque? Is there incomplete knockdown which enables a few parasites to persist?

      Authors’ response: It is true that it is difficult to discern complete growth arrest from very retarded growth. However, neither alternative would affect our conclusions, where we use these phenotypes as an indication of the apical annuli participating in processes required for normal growth. All plaque assays show strong growth phenotypes. Nevertheless, we have removed the use of the term ‘growth arrest’ with respect to these phenotypes (including in the Abstract) and replaced it with ‘growth impairment’.

      Comment 3.10

      Line 132, Fig 1 A-C. For clarity it may be better for the reader if LMBD3 is named earlier, or if Fig 1 refers to the gene ID for panels A-C before it is named.

      Authors’ response: This is a good idea and we have made this change, making note of the rationale for this name when we present the phylogeny.

      Comment 3.11

      Line 30 - "represent a second structure in the IMC specialised for protein secretion" this is confusing - do the authors mean in addition to the micronemes/rhoptries at the apical complex? Maybe "a second structure in the parasite" would be clearer

      Authors’ response: To clarify, we have reworded as follows: ‘The apical annuli, therefore, represent a second type of IMC-embedded structure to the apical complex that is specialised for protein secretion’.

      Comment 3.12

      Line 440 - the authors state that "these pre- and post-invasion secretion processes are also biochemically separated because both microneme and rhoptry secretion are SNARE-independent". Is this from the Cova and Dubois papers cited a line later? I took a quick scan of these papers and neither appears to show this. Cova states that this is still unclear, and Dubois says SNAREs are likely involved?

      Authors’ response: While both microneme and rhoptry secretion use distinctive molecular machineries for controlling membrane fusion for exocytosis, it is true that it is not formally known that these processes completely lack SNARE involvement, and neither paper cited here can eliminate this possibility. We have, therefore, removed this short part of the Discussion where we consider that dense granules might be unique amongst these three compartments in relying on SNAREs.

      Text editing

      Comment 3.13

      1. Line 94 - plasma membrane or cell surface. Clarify here: do you mean plasma membrane or under the membrane at the periphery?

      Authors’ response: We have modified this as: ‘plasma membrane including the cell surface’.

      Comment 3.14

      Line 321 refers to Fig 6A but should say 7A. Panel 7B is never referenced in the text.

      Authors’ response: Thank you, we have corrected this and only cited Fig 7, because A and B are both relevant to the statement made in the text.

      Comment 3.15

      Line 347-242 and fig 4A - the discussion of Q-SNAREs and diagram could use some references for the reader

      Authors’ response: Thank you for this suggestion, we have acted on this request.

      Comment 3.16

      The Methods state that plaque assays were 7 days, whereas the Fig 5 legend says 8 days

      Authors’ response: Thank you, this is corrected as 8 days.

      **Referees cross-commenting**

      • I completely agree with Rev 2
      • I also think examining invasion given Rev1 comment on the micronemes and the data from Fu et al would be worthwhile and straightforward to do

      Authors’ response: Please see our response to Comment 3.2 where the validity of measuring invasion competence of poorly growing, and/or arrested, parasites is scientifically questionable. It would require controls of similarly unhealthy parasites where the apical annuli are unaffected, but it is difficult to imagine how one would deliver such a control.

      Reviewer #3 (Significance (Required)):

      This is an excellent study that assesses the role of apical annuli in parasite secretion. It is an important addition to the field (and outstanding imaging provides a high level of detail to the study). The study could be improved by better integrating a recent similar study noted by the authors and in the review.

      Authors’ response: We have provided more direct discussion of the Fu et al paper in our Discussion section.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their critiques of our manuscript and for recognizing the importance of the questions about 3D genome organisation that it addresses. We plan to address most of their comments in our revised manuscript.

      Reviewer #1

      1. The aneuploid karyotype of the MCF-7 cells used is a concern. GREB1 is present in four copies, with two on abnormal chromosomes which may not be regulated in the same way as primary cells. The authors should include caveats to this effect in the text to account for this.

      We indicated (pg 5) that there are 4 copies of GREB1, 2 of which are on re-arranged chromosomes. RNA FISH (Figure 1C) suggests all 4 of these alleles are induced by estrogen. On each allele, the GREB1 enhancer and promoter remain closely apposed by imaging (Figure 2, DNA FISH) indicating no gross chromosomal rearrangements around the GREB1 locus. This is confirmed by our Hi-C data (Figure 2A), where any genomic rearrangements at the GREB1 locus would be detectable when the sequencing data were aligned to the reference genome. In the revised manuscript we highlight these points in the respective results sections (pgs 5 and 6). Our data suggest that all 4 alleles of GREB1 in MCF7 cells are regulated in the same way.

      2. The authors should also include more information on the generation and verification of the enhancer deletion cell lines. An illustration of the PCR primers used for screening, as well as an illustration of the sequenced product traces aligned with the reference genome (as opposed to just showing the deleted regions) should be included in Fig. S1D. This would give the reader more confidence that the designed knockout has occurred in the same way on all alleles. Furthermore, long-range PCRs and sequencing should be considered to confirm that no larger deletions have occurred (e.g. Owens et al., 2019 PMID: 31127293).

      We have replaced FigS1D with a new Figure Supplement (Figure S1.2A) that incorporates a more comprehensive diagram of the strategy used for the generation and screening of the enhancer deletion cell lines. This also includes the sequencing traces aligned to the reference genome for each of the clones used in this work. Additionally, in the revised manuscript, we will check the deletions using the C-TALE sequencing data obtained from the enhancer-deleted clones.

      3. The changes in the measured E-P interaction frequency following gene activation are weak at best and make visual interpretation of the results difficult. Showing the reciprocal virtual 4C plots from the promoter would help to reassure the reader that the observed effect is real.

      We thank the reviewer for this suggestion, and we will now include virtual 4C plots from the GREB1 and NRIP1 promoters in our revised manuscript. These will be in figures 2B, 2E, 3C, 4C and in the supplementary figures 2B, 4B, 5C and 6C.

      4. Furthermore, the precise 3C method used is not clear. The authors repeatedly refer to "Capture-C" (a commonly used 3C-based approach using biotinylated oligos to pull down targets of interest) but the citation used (Golov et al. 2019) refers to a conceptually similar method called "C-TALE". This should be clarified in the text.

      We thank the reviewer for pointing out this potential confusion. We replace the term Capture-C with C-TALE throughout the revised manuscript.

      5. As for the changes in contact frequency, the observed changes in distance measurements between conditions are very small (although statistically significant). We acknowledge that this is likely due to the relatively small linear distances between enhancers and promoters in this study. However, it would be helpful to see the effects of the induction/treatments on one or more control loci that are not affected by oestrogen signalling, given that global changes in nuclear shape/volume and/or cell cycle effects could occur within this time (e.g. effects of tamoxifen treatment on MCF-7 cell cycle distribution; Osborne et al. 1983 PMID: 6861130), which could impact nuclear volume.

      Data from DNA FISH control probes are already included in Supplementary Figure S3, showing no change in intra-nuclear distances and thus no general effects on chromatin compaction due to nuclear volume or cell cycle. Virtual 4C data for the entire captured regions around GREB1 and NRIP1 are shown in Fig S2C, also showing no general effect on the wider capture windows. We will include similar data from the viewpoint of the gene promoters in the revised manuscript. Hi-C and imaging data from the enhancer deletion cell lines (Fig S4) also support that we are looking at an ER-specific effect, not a global one. With regard to the comment on the effects of tamoxifen treatment on MCF-7 cell cycle distribution, we see no effects of tamoxifen on 3D genome organisation at GREB1 and NRIP1 by Hi-C or by imaging.

      6. The authors discuss previous studies demonstrating that E2 and 4OH recruit different sets of proteins to their target genes. Given that this is central to the conclusion that the ER ligand (and its recruited co-factors) determines the E-P interaction frequency and 3D distances observed, it would be important to demonstrate this at the GREB1/NRIP1 loci specifically. ChIP data of the co-activators/repressors recruited by E2 and 4OH, respectively, would greatly strengthen this claim.

      We acknowledge that investigating co-activator and co-repressor recruitment to the studied loci will strengthen the interpretation of our conclusions. In the revision we will perform and include ChIP-qPCR at NRIP1, GREB1 and control loci, assaying for PolII, co-activators such as p300, mediator and SRC-3, and the co-repressor N-CoR in control, estradiol- and tamoxifen-treated cells. We will also perform ChIP-qPCR of PolII and co-activators in cells treated with flavopiridol and triptolide.

      7. The observed uncoupling of E-P contact frequency and 3D distance upon transcriptional inhibition is interesting and offers clues to the molecular details underlying E-P interactions. However, the use of flavopiridol and triptolide, while common in the field, should be carefully qualified given the potential for their indirect effects on transcription. This is particularly important for flavopiridol given its ability to target multiple cyclin-dependent kinases beyond CDK9 and its role in transcription initiation.

      In the revised manuscript we indicate that “Flavopiridol inhibits several CDKs, including CDK9/P-TEFb”.

      Minor comments:

      i. In the introduction and beginning of discussion, it would be helpful to detail previous studies where FISH-based analyses have shown more proximal E-P positioning upon activation, to make it clear that differences in E-P proximity appear to be gene-specific. Some examples include Williamson et al. (2016; PMID 27402708) and Chen et al. (2018; PMID 30038397). Speculation as to why some genes behave in this way while others do not, would also be worthwhile.

      We have followed the reviewer’s suggestion and noted these two studies in the Introduction of the revised manuscript. Given that the focus of this current manuscript is to explore discrepancies between Hi-C and DNA FISH, we do not think that this is the right forum for a wider discussion of why there might be differences in E-P proximity between different biological systems.

      ii. On page 6, the authors state that after deletion of the NRIP1 enhancer there is "almost total loss of NRIP1 induction in response to E2". This does not seem to match the data where in 3 out of 4 replicates (2 for each clone) there is a statistically significant increase in number of RNA FISH foci upon E2 stimulation in the NRIP1 enhancer KOs. This suggests that, as for GREB1, the regulation of these genes is not solely controlled by the deleted enhancers. This should be clarified in the text.

      The reviewer is referring to the data on NRIP1 expression in two NRIP1 enhancer deletion clones in Fig 1D and the replicate data in Supplementary Fig S1 (upper-right panel). These data show almost no induction of NRIP1 by E2 compared to wild-type cells. We stand by our statement.

      iii. The labelling of the FISH probes in Supp. Fig. S2 could be improved as it is currently very difficult to read these.

      We will try to improve this in a revised Figure S2.

      iv. Given that the authors have referenced a distance of 200 nm as potentially being an important threshold for gene activation, it would be useful to include the fraction of alleles which are below this distance alongside the cumulative frequency plots in Figure 2D and elsewhere in the paper, as the cumulative frequency plots can be hard to read in some cases (e.g. Supp. Fig. S3B e-p). This would also allow the authors to show consistency across replicates.

      We thank the reviewer for this suggestion to make the data easier to interpret. In a revised manuscript, we will incorporate the fraction of alleles below and above 200 nm for the DNA-FISH experiments in Figure 2D and Figure S4A-B.

      v. For clarity, it would be helpful to include the difference map between the vehicle-treated unstimulated/stimulated conditions for the 3C plots in Fig. 4. This would help contextualise the resulting differences observed with the drug treatments. Same for Supp. Fig. S6.

      We will include the difference heatmap between the vehicle- and estradiol-treated samples for vehicle, flavopiridol and triptolide treated samples.

      vi. Statistical comparisons are not shown for all 3D FISH-based distance measurements (e.g. Supp. Figs. S3A, S4C, D, S6E). If this is because the tests were done and the results were non-significant this should be indicated.

      We had omitted all non-significant p values (>0.05) from the graphs to stop them getting too cluttered. All p values are documented in the supplementary tables. However, following the reviewer’s comment, we will indicate all non-significant statistical comparisons on the graphs.

      vii. On page 13, the authors state that increased E-P separation occurs "before nascent transcription of the gene is detected by either TT-seq or RNA FISH". This does not appear to be correct given that baseline levels of transcription are observed in the absence of ER stimulation by both methods (Fig. 1). This should be clarified in the text.

      We have amended this statement to now indicate that “This is before an induction of nascent transcription of the gene….”

      Reviewer #2

      1. The authors make strong claims and although these are generally reasonably well supported by the data, it is important to acknowledge that they are based on two loci. This manuscript would be stronger if the authors could include additional loci in their study design. If this is not possible, it would be good to acknowledge that the conclusions are preliminary/speculative at this stage.

      The reviewer makes a fair point, and we emphasized throughout the text – including at the end of the Discussion - that we are examining just two gene loci. In a revised manuscript we will include DNA-FISH data for a third locus comprising the CCND1 gene, for which we have preliminary data.

      2. It would be helpful if the authors could clarify the strategy they used for their FISH probe design. The enhancer and promoter fosmid probes (which are used for the majority of the experiments) are not centered on the active elements and do not even seem to overlap in the case of the GREB1 enhancer fosmid probe. The 10 kb enhancer probe seems better placed for the GREB1 locus, but the 10 kb enhancer probe does not seem to overlap with the enhancer in the NRIP1 locus. It is conceivable that the exact location of the probes has a big impact on the measurements, and it would therefore be helpful if the authors could comment on the location of the probes and add additional probes if required to strengthen their conclusions. In addition, the fosmid probes are very large (40 kb). Although the authors acknowledge this, it would be helpful if they could comment on how overlap between 40 kb probes should be interpreted in relation to a potential rather focal contact between (proteins bound to) regions of

      In the case of GREB1, the fosmid probes were chosen to maximize the distance between them, as the promoter and the enhancer of the gene are genomically relatively close to each other. This was not an issue in the case of the NRIP1 locus, where fosmid probes could be placed centered on the TSS and the enhancer region. In the case of the 10 kb probes, these were designed to be centered on the regions where higher E2-induced C-TALE contact frequencies were detected. Virtual 4C plots using the TSS regions as viewpoints (incorporated into the revised manuscript) clearly show that, in the case of NRIP1, the contact frequency peak does not fall on the main ER peak.

      3. It is not clear to me why the authors would choose to work with a locus that is present in 4 copies in their cell line. Is the entire regulatory region (incl. enhancers) preserved for the two additional copies of the gene? Can the authors comment on how this may impact on their measurements?

      See response to Reviewer 1, point 1. Our Hi-C data would have revealed if there were genomic rearrangements in the 600 kb window surrounding GREB1.

      4. Figure 2D shows an increase in E-P separation for the NRIP1 locus across all timepoints, with cumulative frequency plots shown for the 10 min timepoint. However, the data for the second replicate shown in Figure S2D are a lot less robust and not significant for the 10 min timepoint. It is important that the authors either provide additional data to support the robustness of this experiment or acknowledge that the results are not fully reproducible.

      We acknowledge this, but we would like to note that the median distance increases at all timepoints, although the difference is not significant at some of them. Additionally, DNA-FISH data obtained using the 10 kb probes confirm these observations.

      5. The data presented in Figure 2F for clone 2 of the GREB1 enhancer deletion still show increased E-P distance upon activation. How do the authors explain this?

      This increase in distance is not statistically significant (p = 0.33; see Table S2) and is not seen in the replicate data in Fig. S4.

      Minor comments:

      i. Could the authors comment on the observation that the NRIP1 promoter is not bound by ERα or p300 upon estrogen activation? Are there ATAC-seq or H3K27ac ChIP-seq data available for these conditions?

      We have included ATAC-seq tracks in Figure 1A, in which a peak at the NRIP1 promoter is clearly visible.

      ii. It is not obvious which timepoint is shown in Figure 1D.

      Pre-mRNA FISH in enhancer-deleted clones was performed in cells treated with vehicle or E2 for 60 minutes. This will be made clearer in the figure legend.

      iii. Why did the authors choose e-i and p-i instead of e-c and p-c in Supplementary Figure 3B?

      We apologize; it was an oversight not to include the e-c data for this experiment. These data are now included in Supplementary Figure S4B.

      iv. "We treated hormone starved MCF-7 cells with flavopiridol or triptolide for 5 min before adding E2 for 30 min (Fig. 4A)." Does this mean that the FLV/TRP treatment lasted for 35 min or did the authors wash it out before adding E2? Please clarify.

      The reviewer's reading is correct: the inhibitors were not washed out, so the FLV/TRP treatment lasted 35 min in total. This has now been made clear in Figure 4A and in the figure legend.

      v. The authors refer to their Capture-C data as "high-resolution". However, the methods section mentions that the data for the GREB1 and NRIP1 locus are 5 kb and 10 kb resolution, respectively. This is not particularly high for a targeted approach, certainly not in light of the MNase-based approaches that have recently been developed. I therefore think that the "high-resolution" claims should be removed from the paper.

      In line with the reviewer’s suggestion, we have removed the term "high-resolution" when referring to our own data.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Gómez Acuña and colleagues have investigated changes in enhancer-promoter (E-P) interactions with both 3C and DNA FISH. As a model system, they have used the activation of estrogen receptor-dependent enhancers, which allows for examination of changes in E-P interactions at relatively high temporal resolution. Surprisingly, they find that gene activation is associated with increased E-P interactions as measured by 3C but reduced spatial proximity as measured by DNA FISH. The authors show that both these measurements are dependent on the presence of the enhancer. In contrast, blocking transcription with inhibitors does not have a strong effect on the 3C measurements, but abolishes the increased spatial E-P separation as measured by DNA FISH following estrogen induction.

      Overall, this is an interesting and thought-provoking study. However, the strong conclusions are not fully supported by the data, as explained in further detail below.

      Major comments:

      • The authors make strong claims, and although these are generally reasonably well supported by the data, it is important to acknowledge that they are based on only two loci. This manuscript would be stronger if the authors could include additional loci in their study design. If this is not possible, it would be good to acknowledge that the conclusions are preliminary/speculative at this stage.
      • It would be helpful if the authors could clarify the strategy they used for their FISH probe design. The enhancer and promoter fosmid probes (which are used for the majority of the experiments) are not centered on the active elements and do not even seem to overlap in the case of the GREB1 enhancer fosmid probe. The 10 kb enhancer probe seems better placed for the GREB1 locus, but the 10 kb enhancer probe does not seem to overlap with the enhancer in the NRIP1 locus. It is conceivable that the exact location of the probes has a big impact on the measurements and it would therefore be helpful if the authors could comment on the location of the probes and add additional probes if required to strengthen their conclusions. In addition, the fosmid probes are very large (40 kb). Although the authors acknowledge this, it would be helpful if they could comment on how overlap between 40 kb probes should be interpreted in relation to a potential rather focal contact between (proteins bound to) regions of <1 kb.
      • It is not clear to me why the authors would choose to work with a locus that is present in 4 copies in their cell line. Is the entire regulatory region (incl. enhancers) preserved for the two additional copies of the gene? Can the authors comment on how this may impact on their measurements?
      • Figure 2D shows an increase in E-P separation for the NRIP1 locus across all timepoints, with cumulative frequency plots shown for the 10 min timepoint. However, the data for the second replicate shown in Figure S2D are a lot less robust and not significant for the 10 min timepoint. It is important that the authors either provide additional data to support the robustness of this experiment or acknowledge that the results are not fully reproducible.
      • The data presented in Figure 2F for clone 2 of the GREB1 enhancer deletion still show increased E-P distance upon activation. How do the authors explain this?

      Minor comments:

      • Could the authors comment on the observation that the NRIP1 promoter is not bound by ERα or p300 upon estrogen activation? Are there ATAC-seq or H3K27ac ChIP-seq data available for these conditions?
      • It is not obvious which timepoint is shown in Figure 1D.
      • Why did the authors choose e-i and p-i instead of e-c and p-c in Supplementary Figure 3B?
      • "We treated hormone starved MCF-7 cells with flavopiridol or triptolide for 5 min before adding E2 for 30 min (Fig. 4A)." Does this mean that the FLV/TRP treatment lasted for 35 min or did the authors wash it out before adding E2? Please clarify.
      • The authors refer to their Capture-C data as "high-resolution". However, the methods section mentions that the data for the GREB1 and NRIP1 locus are 5 kb and 10 kb resolution, respectively. This is not particularly high for a targeted approach, certainly not in light of the MNase-based approaches that have recently been developed. I therefore think that the "high-resolution" claims should be removed from the paper.

      Referees cross-commenting

      I agree with the comments raised by Reviewer 1.

      Significance

      Since 3C and DNA FISH are widely used, the discrepancy between these measurements that is described here is of potential broad interest to the field. Since these claims are rather strong and have potential far-reaching implications, it would be helpful if the authors could strengthen their conclusions further, by improving the robustness of the data and including additional loci and additional probes to show that the measurements are not specific for these two loci or dependent on the location of the probes. I think that the paper is in principle also publishable without these additional experiments, but in that case, it would be very important to explicitly acknowledge the limitations of the data throughout the manuscript and clarify that the conclusions are preliminary/speculative at this stage.

      Expertise: 3D genome organization.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We would like to thank the 3 reviewers for their comments and suggestions for our manuscript. We believe that the revisions we plan to make, based on the comments by the reviewers, will greatly enhance the quality of our manuscript.

      We would like to respond to some reviewer comments here, since they do not fit into any of the subsequent sections.

      Reviewer #3

      In the Results section that describes the delay in gata2b expression (page 4 and Supp. Fig. 4), the authors show that the mutant embryos start expressing more gata2b at 30-36 hpf after the decreased expression at earlier timepoints, with no difference at 48 hpf. What could explain that recovery?

      We thank reviewer 3 for this question. The partial functionality of the Cx41.8 channel in cx41.8tq/tq mutants may explain why the HSPC program is eventually induced (leading to sufficient mitochondrial ROS production for Hif1/2α stabilisation). However, this could also result from functional redundancy between Cx41.8 and other connexins such as Cx43 or Cx45.6 in the mitochondria, since they are also expressed in zebrafish arterial ECs at 24 hpf (Gurung et al, Sci Rep, 2022), and cx43 knockdown has previously been shown to result in an HSPC specification defect in zebrafish (Jiang et al, Fish Physiol Biochem, 2010). Together, these aspects may explain the delayed recovery of gata2b expression in the cx41.8tq/tq mutant, as discussed in detail in our manuscript.

      The authors showed that gata2b expression can be rescued by ROS induction in a dose-dependent manner (page 6 and Fig. 3 and Supp. Fig. 6). Is this what rescues gata2b expression at 30 hpf in the cx41.8 mutants?

      This is exactly right: we hypothesize that in cx41.8tq/tq mutants, mitochondrial ROS production takes longer to rise above the threshold required to stabilise Hif1/2α and hence induce gata2b expression, which is supported by the data referred to by this reviewer.

      Are there any vascular defects in the mutant embryos?

      Our lab previously reported that cx41.8tq/tq embryos have faster ISV growth rate (Denis et al, Front Physiol, 2019). However, we found no evidence of a link between the ISV growth rate increase and the HSPC specification defect in these embryos. Importantly, we show that aorta specification is normal in cx41.8tq/tq mutants, as determined by dll4 expression at 24 (Supp. Fig. 1C) and 28 hpf (Supp. Fig. 1D).

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      The manuscript by Petzold et al. explores the functions of connexin 41.8 (cx41.8) (mammalian homologue Connexin 40) in hematopoietic stem cell (HSC) formation in the zebrafish dorsal aorta. The authors use a cx41.8 allele that appears to be hypomorphic, as the phenotype is milder than a previous cx41.8 allele that the same group published (Cacialli et al., 2021). cx41.8tq/tq mutants exhibit delayed onset of hemogenic endothelial specification, as marked by gata2b at 24 hpf, but HSPC development proceeds normally from 48 hpf onwards. A new reporter line for cx41.8, Tg(cx41.8:GFP), was generated and is expressed in the floor of the dorsal aorta, consistent with the location of hemogenic endothelial cells. Lower ROS production in the whole cell and in the mitochondria was reported in the cx41.8tq/tq mutants, and treatment with ROS enhancers, H2O2 and menadione, appeared to rescue the mutant phenotype of reduced HSPCs at 28 hpf. Finally, the authors tested a link between cx41.8 and Hif1α by pharmaceutically (DMOG/CoCl2) or genetically (vhl morpholino) inhibiting Hif inhibitors, and observed a rescue of HSPC formation in cx41.8 mutants.

      I think it would be important for the authors to address the mechanisms of why cx41.8tq/tq and the other cx41.8-/- (leot1/t1) mutant phenotypes are different, with the latter allele showing more severe phenotypes of increased HSPC apoptosis and reduced HSPCs during later development. The authors speculate the cx41.8tq/tq allele encodes a missense mutation in one of the channel domains, and as such, might be a hypomorph. The authors cited the original paper by Watanabe et al. (2006); however, this paper actually noted that the cx41.8tq/tq allele is likely to be a dominant negative - and as such, should have exhibited a stronger phenotype than the leot1/t1 mutant allele. From the paper: "leotw28 and leotq270 heterozygotes have phenotypes different from that of WT; thus, they represent dominant-negative alleles." Importantly, no data are shown to provide evidence that the allele is a hypomorph - at minimum, qPCR data should be provided to show whether there is NMD of the mRNA in cx41.8tq/tq mutants.

      We would like to thank the reviewer for this comment and suggestion. As the reviewer has rightly pointed out, the cx41.8tq/tq mutation is thought to result in a protein with dominant-negative function (Watanabe et al, EMBO Rep, 2006; Watanabe et al, J Biol Chem, 2016).

      In fact, we agree that the mutant cx41.8tq/tq protein acts as a dominant negative. Although the reviewer is right to point out that the cx41.8t1/t1 mutant may thus exhibit a stronger phenotype, which we found not to be the case (runx1 expression was found to be normal in the cx41.8t1/t1 mutant; Cacialli et al, Nature Commun, 2021), we have provided our explanation for this in the discussion of the manuscript:

      “The partial functionality of the Cx41.8 channel in cx41.8tq/tq mutants [14] may explain why the HSPC program is eventually induced. However, this could also result from functional redundancy between Cx41.8 and other connexins such as Cx43 or Cx45.6 in the mitochondria, since they are also expressed in zebrafish arterial ECs at 24hpf [18] and cx43 knockdown has previously been shown to result in an HSPC specification defect in zebrafish [36]. This potential functional redundancy may also provide an explanation as to why HSPCs are specified normally, without any delay, in cx41.8t1/t1 embryos [12]. In these null mutants, cx41.8 expression is completely absent but may be functionally compensated by other connexins, whereas in cx41.8tq/tq mutants, although cx41.8 is expressed, its channel function is reduced [14]. Moreover, as Cx41.8 may form heterotypic channels with Cx43 and/or Cx45.6 (and potentially also with others), the function of these chimeric channels would also be altered”

      We believe this addresses the reviewer's concern, especially given that Cx43 and Cx45.6 have been found to be expressed in arterial ECs at 24 hpf, as cited in the manuscript. Regarding the reviewer’s question about whether there is NMD of the cx41.8 transcript: given that the cx41.8tq/tq mutation is a missense mutation and does not generate a premature stop codon (usually required for NMD to be induced; Kurosaki et al, J Cell Sci, 2016), we do not believe that the cx41.8 transcript undergoes NMD in cx41.8tq/tq mutants. We will, however, verify this by carrying out the experiment suggested by this reviewer: qPCR analysis of cx41.8 expression in cx41.8tq/tq embryos and wild-type controls.

      The quantification data in this manuscript are not satisfactory. The authors only provide graphs that show embryos with "low", "medium" and "high" numbers of HSPCs, which is incredibly subjective. Considering that the authors already have the cx41.8tq/tq in the Tg(myb:GFP) background (Figure 1E), they could have quantified the precise numbers of Tg(myb:GFP)-positive cells at different timepoints and with the different pharmaceutical rescue experiments. Ideally, this should be combined with other HSPC markers such as Tg(cd41:GFP) or Tg(runx1:GFP) - although this could be limited by the authors' access to the lines or time it takes to cross the mutants to the transgenes.

      We thank reviewer 1 for raising this concern. The reviewer is correct that it would take us too long (at least 6 months) to generate the cx41.8tq/tq cd41:GFP or cx41.8tq/tq runx1:GFP lines; however, as stated, we already have the cx41.8tq/tq cmyb:GFP zebrafish line. That said, repeating the pharmacological experiments in the cx41.8tq/tq cmyb:GFP line would take months of work, and we do not currently have the personnel to do so. However, we will repeat the experiment used to generate Figure 1E, but also at earlier timepoints. The cmyb:EGFP transgene marks nascent HSPCs from 28 hpf, so we will image and quantify budding HSPCs in cx41.8tq/tq cmyb:EGFP embryos and cmyb:EGFP controls between 28 hpf and 36 hpf. We agree with the reviewer that this will add depth to our study and provide evidence to support our conclusions.

      The link between cx41.8 and Hif1α is tenuous. The authors should perform in situ hybridization for the hif1 genes and their downstream effector notch1 which is known to be important for the HSPC specification (Gerri et al., 2018).

      We thank the reviewer for this point. We do not expect hif1/2α expression to be affected in this mutant, because mitochondrial ROS has been shown to stabilise Hif1/2α at the protein level, not the mRNA level. Our data, and those of others (Harris et al, Blood, 2013), suggest that in the absence of mitochondrial ROS, prolyl hydroxylases are not inhibited and instead target Hif1/2α for ubiquitination and subsequent destruction in a Vhl-dependent manner (as shown in Fig. 4D). We have changed the text in the manuscript to clarify that Hif is stabilised at the protein level (please see the section below).

      However, since we do expect notch1a and notch1b expression to be altered in our mutant embryos, as these genes are transcriptionally regulated by Hif1/2α (Gerri et al, Blood, 2018), we will perform in situ hybridisation and qPCR analysis of both genes at 18-24 hpf in cx41.8tq/tq mutants and controls to clarify this point and solidify our model.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      Petzold et al here address the potential function of the connexin Cx41.8, a protein involved in the structure of gap junctions, in the specification of future hematopoietic stem and progenitor cells (HSPCs). This work complements their previous results showing the function of this connexin isoform in HSPC expansion in the transient hematopoietic niche in the caudal tissue of the zebrafish embryo. They explore phenotypes triggered by the expression of a mutant form bearing a single amino-acid substitution in the fourth transmembrane domain of the protein. Using whole mount in situ hybridization (WISH) for the two transcription factors Gata2b and Runx1, a novel transgenic fish line that expresses eGFP under the control of the Cx41.8 promoter region, and a series of drug treatments that interfere with, or promote, reactive oxygen species (ROS) production and oxidative stress, they propose that Cx41.8 also acts upstream of HSPC amplification, namely in HSPC specification at the level of the hemogenic endothelium that constitutes the ventral floor of the dorsal aorta. Mechanistically, they hypothesize that this function relies on mitochondria-derived ROS that would destabilize the VHL protein involved in mediating the degradation of Hif1/2α transcription factors, thereby stabilizing the Hif1/2α-Notch1a/b signaling axis involved in specification of the hemogenic endothelium.

        1- The WISH and quantitative analyses.

      Most of the quantitative analyses in the work are based on chromogenic WISH, which is not sufficiently accurate because it leads to highly variable results and lacks linearity. WISH is also subject to substantial variation, particularly for transcription factors that are expressed at low levels, such as Runx1 and, to some extent, Gata2b. One obvious example in the paper is the inconsistency between the signals observed in Fig1C (Gata2b, left, WT, 24hpf) and FigS3B (Gata2b, left, WT, 24hpf); in the latter, the signal is barely visible and is comparable to the signal for the cx41.8tq/tq mutant in Fig1C, right.

      In addition, at the timepoints analyzed in FigS3 (Gata2b, 26 and 28hpf) to argue for a temporal delay of expression in the cx41.8tq/tq mutant, the Gata2b signal is masked by a strong increase in signal in tissues other than the hemogenic endothelium of the dorsal aorta (including signal in the somites and, possibly, increased background). In this example, it is legitimate to question the accuracy of the quantification methodology when the signal in the tissue of interest is drowned in the overall signal from surrounding tissues; how can the authors explain that 100% of embryos have a 'Low' signal in the region of interest (FigS3C, cx41.8tq/tq mutant in comparison to WT)? This point also applies to the data quantified in FigS4, in which the fit between the WISH data and the quantifications appears questionable (for all timepoints: 30, 32, 36 and 48hpf, comparing mutant with WT).

      My suggestion would be to complement the WISH data and improve the quantitative analyses using another, more accurate approach such as qRT-PCR, performed on dissected trunk regions and, if necessary because of expression in surrounding tissues (as for Gata2b at later timepoints), after FACS sorting using a fish line expressing a fluorescent reporter driven by a vascular promoter (e.g. the kdrl:mCherry line used in this work). This is particularly important for the expression of the two transcription factors Runx1 and the more upstream Gata2b, the latter being involved in HSPC specification, which is taken as a reference. qRT-PCR experiments should be feasible relatively easily and in a reasonable time frame, as the technique is not very time-consuming and is easily accessible.

      We thank reviewer 2 for their concerns regarding the in situ quantifications used during this study. Although the approach we have used is widely used in the field to quantify gene expression differences, we appreciate that our data could be strengthened by complementing it with another approach. As such we will do the following:

      • We will complement our in situ hybridisation characterisation of delayed hemogenic endothelium formation and HSPC specification with qPCR experiments. For this, we will dissect the trunks of cx41.8tq/tq embryos and controls and perform qPCR analysis of gata2b expression at the timepoints analysed during development (Supp. Fig. 3 A-D and Supp. Fig. 4 A-D), and we will use the same approach to complement the data for gata2b and runx1 expression at 24 hpf (Figure 1C and D). We agree with the reviewer that this approach is feasible and would add robustness to the data we already show.

        2- Fluorescence imaging and associated interpretation/conclusions.

      The fluorescence images (Fig1E; Fig2B,D; Fig3A) are very difficult to analyze; they lack resolution because they appear to be epifluorescence rather than confocal images. When the signal is low, which is particularly the case for the novel Cx41.8:EGFP fish line in Fig2B (as confirmed by the FACS GFP signal in comparison to the mCherry signal of the kdrl:mCherry fish line), it is not possible to provide convincing images of vascular/aortic expression because of the high diffuse background (the authors state 'likely to be the aortic floor'; indeed, it is not possible to validate that the expression is truly in potential hemogenic cells). The double positive population in the FACS (Fig2C, right) does not resolve the issue, because if cx41.8 is indeed expressed in endothelial cells (as expected from previous studies), the double positive population could equally be endothelial cells from inter-somitic vessels, for example (not to mention the underlying vein, which is very close to the aorta in the trunk). In Fig2D, the images are too small and, again, the resolution is not good enough to say that double positive cells are on the aortic floor. To convince the reader, I recommend that the authors confirm their statements using confocal microscopy and increase the magnification of the relevant regions of interest.

      We thank this reviewer for this point. We will address this concern by using, as they suggest, confocal microscopy to try to get higher resolution images. In particular, we will do the following:

      • We will use confocal microscopy to image the cx41.8:EGFP line as was done previously (Fig 2B), in order to obtain higher resolution images of cx41.8 expression in the floor of the aorta.
      • We will also use confocal microscopy to image the cx41.8:EGFP;kdrl:mCherry line as was done in Fig 2D, in order to obtain higher resolution images.
      • We will also increase the magnification of the relevant regions of our confocal microscopy images as suggested by this reviewer.

        There is an inconsistency in the data between Fig1E (40hpf, in vivo imaging using the cmyb:GFP fish line) and FigS2 (48hpf, WISH cmyb); how can we observe 'HSPCs budding from the dorsal aorta' (see legend Fig1, arrowheads) which seems very much decreased in the imaging experiment for the cx41.8tq/tq mutant in comparison to WT, and have no effect on the cmyb signals FigS2B? What are the GFP+ cells that are aligned along the elongated yolk Fig1E and that appeared to be decreased in number in the mutant?

      We agree that this disparity is confusing for the reader. We believe the disparity between these results is due firstly to the fact that the experiment in Supp. Fig 2C was performed 8 hours after that in Fig 1E and secondly due to the time it takes for GFP to fold (in the case of Fig 1E). It is also important to keep in mind that the phenotype is not a complete absence of HSPC budding, but only a delay in the onset of EHT.

      • We will, however, address this concern with the experiment described above: we will repeat the experiment used to generate Figure 1E, but also at earlier timepoints. The cmyb:EGFP transgene marks nascent HSPCs from 28 hpf, so we will image and quantify budding HSPCs in cx41.8tq/tq cmyb:EGFP embryos and cmyb:EGFP controls at several timepoints from 28 hpf to 36 hpf. This will add depth to our study by providing evidence to support our conclusions.
      • We will remove the 40 hpf timepoint (Fig 1E) to avoid confusion regarding the disparity with cmyb expression by WISH in Supp. Fig 2C.
      • Regarding the GFP+ cells aligned along the yolk in Fig 1E, we thank the reviewer for pointing this out. These are multiciliated cells from the kidney tubules (Wang et al, Development 2013). We will determine whether their numbers differ between cx41.8tq/tq;cmyb:EGFP embryos and cmyb:EGFP controls in our new confocal experiments and will mention this in the manuscript if they do.

        It would be important to show, at least with qualitative WISH experiments throughout the time window of HSPC specification stated by the authors (26-54hpf, see main text third paragraph of Results), that Cx41.8 is detected in arterial endothelial cells (and perhaps enriched in the hemogenic endothelium?), to complement the transcriptomic data at 24hpf that they refer to (Ref18 Gurung et al Sci Rep 2022). Ideally, these WISH data should have sufficient resolution to clearly distinguish localization in aortic cells from localization in cells of the aortic floor, which would bring significant added value to a work that lacks spatial resolution (e.g. fluorescent WISH imaged by confocal microscopy, allowing signals to be superposed with cell types, either by double fluorescent WISH (vascular marker + Cx41.8) or by superposing fluorescence signals with transmitted light).

      We agree with this reviewer regarding this point. The way we will address this is to use confocal microscopy at different timepoints from 24-40 hpf using the cx41.8:EGFP; kdrl:mCherry line to show that expression of cx41.8 is indeed present and enriched in the floor of the dorsal aorta during the timeframe of HSPC specification. We believe that imaging this line using confocal microscopy will be sufficient to clearly show this.

      It would be more informative and secure, in Fig2D, to show images of the double transgenics (Cx41.8:eGFP;kdrl:mCherry) at 28-30 hpf (rather than 48 hpf), which is more narrowly focused on the specification of the hemogenic endothelium, thus avoiding the risk of visualizing fluorescence signals from recently born HSPCs rather than from cells embedded in the aortic floor.

      We thank the reviewer for this suggestion, which we believe would indeed improve the manuscript. As discussed above, we will indeed use confocal microscopy at different timepoints, including 28-30 hpf using the cx41.8:EGFP;kdrl:mCherry line to show that expression of cx41.8 is indeed present and enriched in the floor of the dorsal aorta during the timeframe of HSPC specification. We believe that imaging this line using confocal microscopy will be sufficient to clearly show this and so thank the reviewer for this excellent suggestion.

      To make the data on ROS production in the ventral side of the cord in wild type embryos more convincing (which suggests that future hemogenic cells are already ventralized at that stage), it would be important to obtain confocal images of the region of interest and reconstruct the Z-stacks in a sagittal view (rather than longitudinal). It would also be nice to obtain comparable images later on, after lumenization and before the initiation of HSPC emergence (before 28hpf).

      We thank the reviewer for this suggestion and agree that the suggested approach will solidify our data. As such, we will carry out the proposed experiment, using confocal imaging to gain longitudinal and sagittal images of mitoSOX staining in WT embryos and cx41.8tq/tq mutants at both 16 hpf and 26 hpf.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      Related to the above point, the authors should test whether the gap junction function of Cx41.8 is intact in the cx41.8tq/tq mutants by assessing calcium waves in the GCamp transgenic line.

      …we have also now found additional published in vivo evidence that Cx41.8 channel function is reduced in the cx41.8tq/tq mutant, which is now also cited in the new version of the manuscript (please see our full response to this point below).

      Please see the section “Description of analyses that authors prefer not to carry out” for additional information regarding the GCamp experiment suggestion.

      The link between cx41.8 and Hif1α is tenuous. The authors should perform in situ hybridization for the hif1 genes…

      We thank reviewer 1 for making this point. To clarify, we do not expect hif1/2α expression to be affected in this mutant. Mitochondrial ROS has been shown to stabilise Hif1/2α at the protein level, not the mRNA level. Our data, and those of others (Harris et al, Blood, 2013), suggest that in the absence of mitochondrial ROS, prolyl hydroxylases are not inhibited and instead target Hif1/2α for ubiquitination and subsequent destruction in a Vhl-dependent manner (as shown in Fig. 4D).

      To clarify this in the manuscript, we have adjusted the text in three places (including in the abstract) to clarify that Hif1/2α is stabilised at the protein level, as shown below. We believe these changes have made this important point more understandable for the reader:

      1. “… Mitochondrial-derived reactive oxygen species (ROS) have been shown to stabilise the hypoxia-inducible factor 1/2α (Hif1/2α) proteins, allowing them..”
      2. “Recent research has demonstrated that hypoxia and mitochondrial ROS are required for the stabilisation of the transcription factors Hif1/2α at the protein level”
      3. “… as mitochondrial ROS generation may eventually reach the threshold required to sufficiently stabilise the Hif1/2α proteins for downstream”

      Reviewer #2

      Importantly, it appears also that all over the WISH quantifications, the reader cannot appreciate the accuracy of the categories High/Medium/Low, which is not at all developed in the Methods section (paragraph Image processing and WISH phenotypic analyses).

      We have expanded the Methods section (paragraph "Image processing and WISH phenotypic analyses"), which this reviewer highlighted as a concern, to detail exactly how we performed our image analysis and the associated statistical analyses. We believe this will address reviewer 2's concerns, and we agree that this section was indeed underdeveloped in the original submission.

      Finally, there is a confusion in the quantification regarding the number of HSPCs (see the beginning of the second paragraph of Results 'The HSPC specification defect in cx41.8tq/tq mutants is due to a delay in Gata2b expression') and the % of embryos falling into the 3 categories High/Medium/Low FigS2, cmyb 48hpf. The authors use this argument (based on the WISH cmyb signals) to infer that the deficit in the cx41.8tq/tq mutant is not due to controlling HSPC number (no difference in cmyb between WT and mutant) but rather upstream, at the level of the hemogenic endothelium, which is not a thorough argument at that point.

      We thank reviewer 2 for pointing this out and agree that our wording was somewhat confusing. We have therefore revised the first sentence of the second paragraph of the Results section "The HSPC specification defect in cx41.8tq/tq mutants is due to a delay in gata2b expression", which now reads:

      “Hence, since HSPC specification is initially reduced, but then recovers in cx41.8tq/tq embryos, we suspected a delay in the formation of the haemogenic endothelium in these mutants. To test this hypothesis…”

      We believe this change will address the reviewer's concern by making the logic of this section clearer for the reader.

      The authors should take care of the fact that at 16hpf, it is an overstatement to speak of an aorta when the cord is starting to lumenize at around 18hpf, Jin et al Development 2005 (see Main text referring to Fig3).

      We thank the reviewer for this clarification. We have changed the relevant text to state “vascular cord” instead of “aorta” and have mentioned that it begins to lumenize around 18hpf for clarification. We have also added the suggested reference.

      Reviewer #3

      As Gata2 has been shown to be a positive autoregulator of itself in mice (Nozawa 2009, Katsumura 2016) and might do so in zebrafish (Dobrzycki 2020), so could gata2b recover itself, in a dose-dependent manner, without the Hif-Notch1 axis once enough of it is expressed?

      We thank reviewer 3 for this suggestion. We believe that our data show that Cx41.8 is required for mitochondrial ROS production, which stabilises Hif1/2α and switches on downstream gata2b via Notch1a/b (which will be added, see previous section). As such, we believe that the Hif1/2α/Notch1a/b axis is required, at least for the initial induction of gata2b expression. However, reviewer 3 makes a very interesting point regarding the potential for gata2b to positively autoregulate itself, which may of course occur once gata2b expression has been induced by the Cx41.8-mitoROS-Hif1/2α-Notch1a/b-gata2b pathway. We thank the reviewer again for this interesting proposition and have added this suggestion into our discussion in the following paragraph:

      “GATA2 has been shown to positively autoregulate its own expression in mice (Nozawa et al, Genes to Cells, 2009; Katsumura et al, Cell Reports 2016), and Gata2b may also act in this way in zebrafish (Dobrzycki et al, Commun Biol, 2020). Therefore, it is interesting to speculate that once gata2b expression has been induced by the Cx-mitoROS-Hif1/2α-Notch1a/b-gata2b pathway, it may also further induce its own expression, which would make the induction of the haematopoietic transcriptional program more robust”

      Is Hif1/2a expression affected in the mutant? Is it expressed normally but then degraded faster due to the absence of mitochondrial ROS or is it less Hif1/2a expressed overall?

      We thank reviewer 3 for this question, which is similar to a point made by reviewer 1. To clarify, we do not expect hif1/2α expression to be affected in this mutant. Mitochondrial ROS has been shown to stabilise Hif1/2α at the protein level, not at the mRNA level. Our data, and those of others (Harris et al, Blood, 2013), suggest that in the absence of mitochondrial ROS, prolyl hydroxylases are no longer inhibited and instead target Hif1/2α for ubiquitination and subsequent destruction in a Vhl-dependent manner (as shown in Fig. 4D).

      To clarify this in the manuscript, we have adjusted the text in three places (including in the abstract) to state explicitly that Hif1/2α is stabilised at the protein level, as shown below. We believe these changes make this important point easier for the reader to follow:

      1. “… Mitochondrial-derived reactive oxygen species (ROS) have been shown to stabilise the hypoxia-inducible factor 1/2α (Hif1/2α) proteins, allowing them..”
      2. “Recent research has demonstrated that hypoxia and mitochondrial ROS are required for the stabilisation of the transcription factors Hif1/2α at the protein level”
      3. “… as mitochondrial ROS generation may eventually reach the threshold required to sufficiently stabilise the Hif1/2α proteins for downstream”

      Does MO-mediated knockdown of vhl in the wildtype and mutant (page 7 and Fig. ) result in more HSPCs, following the increase in gata2b expression from WT baseline? Does that high expression persist, or does it drop?

      This is an important question. We had already addressed this for cx41.8tq/tq, since we showed that the vhl MO results in more HSPCs (as determined by runx1 expression) at 28 hpf (Supp. Fig. 8A), but we have now added data for the same marker at the same timepoint for WT embryos (Supp. Fig. 8B).

      Although the vhl MO increases runx1 signal in WT embryos, the majority of WT embryos injected with the control MO already have a “high” runx1 WISH signal at 28 hpf, so, as might be expected, the difference between vhl MO-injected and control MO-injected WT embryos is not significant (Supp. Fig. 8B). This is now explained in the manuscript alongside the added data.

      4. Description of analyses that authors prefer not to carry out

      Reviewer #1

      One major missing component is experimental data that distinguish the gap junction/plasma membrane- related and the mitochondrial membrane-related functions of Cx41.8. This is critical, as the role of Connexins in the mitochondria remains poorly understood (and Connexin 43 is the best understood one). Thus, it is a big claim by the authors that Cx41.8 primarily acts through the mitochondria and not the gap junctions. Suggested experiment: The authors should generate a fluorophore-tagged Cx41.8 - under a ubiquitous (ubb or actin) or HSPC-/hemogenic endothelium-specific (gata2b) promoter to monitor the protein localization of Cx41.8. Providing data on whether Cx41.8 protein indeed localizes to the mitochondria is important to support their claim.

      We thank the reviewer for this suggestion, which we agree would be a nice experimental approach to try to investigate whether Cx41.8 does indeed localise to the mitochondria in zebrafish endothelial cells.

      However, EGFP fused full-length cx41.8 has previously been generated and was reported to be nonfunctional, and it was suggested that the amount of localised Cx41.8 is also too small to detect using this approach (Watanabe et al, Pigment Cell Melanoma Res, 2012; Usui et al, BBA Advances, 2021). An EGFP tagged CT-truncated Cx41.8 construct has also been generated and shown to rescue the cx41.8t1/t1 mutant (Usui et al, BBA Advances, 2021), but EGFP expression again could not be detected using this construct in zebrafish.

      As such, since efforts to carry out such an approach have failed in previous attempts and since it has already been demonstrated that CX40 (orthologous to cx41.8) localises to the mitochondria of endothelial cells (Guo et al, Am J Physiol Cell Physiol, 2017), we believe that confirmation of Cx41.8 localisation to the mitochondria in vivo in zebrafish endothelial cells will be very difficult and too time-consuming in the context of this manuscript.

      Related to the above point, the authors should test whether the gap junction function of Cx41.8 is intact in the cx41.8tq/tq mutants by assessing calcium waves in the GCamp transgenic line.

      We agree with the reviewer that this would be a very elegant approach to analyse whether Cx41.8 channel function is affected in cx41.8tq/tq mutants. However, we feel that this experiment is definitely beyond the scope of this manuscript. Furthermore, it would require acquiring the GCamp line and performing multiple crosses with the cx41.8tq/tq line, which together we estimate would take at least 9 months before the experiments could be performed, and so this experiment would also be too time-consuming for this manuscript. Finally, we believe there is already strong published evidence that the cx41.8tq/tq mutation disrupts channel function (Watanabe et al, EMBO Rep, 2006), as already cited in our manuscript. Since then, we have also found additional published in vivo evidence that cx41.8tq/tq channel function is reduced, which is now cited in the new version of the manuscript.

      The authors might also want to consider performing transcriptomic analysis (bulk RNA sequencing) from purified HSCs in wild types and cx41.8 mutants and assess the downstream pathways affected by the loss of this gene.

      Although this is an interesting proposition, we consider it to be beyond the scope of this manuscript, especially since our model involves changes in gene expression upstream of HSPC induction, and the expression of the key genes thought to be affected (notch1a/b and gata2b) can be checked with a much more cost- and time-efficient approach, qPCR, which we will carry out as discussed above.

      Are the authors sure of their statement on budding HSPCs when the GFP signal pointed by arrows could in majority be hemogenic cells? (which would be in favor of their hypothesis on Cx41.8 being involved rather in hemogenic endothelium/HSPC specification).

      cmyb is a marker of HSPCs, not of the haemogenic endothelium, as demonstrated in numerous publications (North et al, Nature, 2007; Bertrand et al, Development, 2008; Bertrand et al, Nature, 2010 and others). Hence, we are confident that this transgene marks nascent HSPCs and not the haemogenic endothelium.

      As mentioned by the authors in the Discussion, the other connexin Cx43 (Ref 36, Jiang et al 2010) is playing a significant role in HSPC specification in the zebrafish and is expressed in zebrafish arterial cells at 24 hpf. Hence there may be some functional redundancy between Cx43 and Cx48.1, as supported by previous work from the authors showing that a null mutant of Cx48.1 does not exhibit any phenotype in HSPC specification (Ref12, Cacialli et al 2021). This may be problematic for the experiments using drug treatments in the present work, because they are not selective for the different connexins (ex: anti-oxydants (NAC), connexin blockers (heptanol, CBX)), thus blurring interpretations on the specific function of Cx48.1 versus the ones exerted by Cx43 (this should be also valid for the vhl MO treatments).

      This comment is strengthened by the fact that the authors do not systematically address, for both WT and mutant embryos (Fig3 E, F; FigS6; FigS8), if expression levels with drugs/H2O2/MO are different for the 2 conditions (if relatively equal, it would indeed indicate that these drugs/conditions possibly act on another connexin, which would help the authors in their analyses and interpretations).

      We thank the reviewer for these comments and we agree with their concerns regarding the possibility of other Connexins being affected by our experiments using drug treatments. However, we do not rule this out in our manuscript and actually discuss it as being a very realistic prospect, as written about in the discussion section.

      Sadly, to the best of our knowledge, no selective Cx41.8 inhibitors have been described for use in zebrafish; otherwise, we would of course have used one. This is why we chose these compounds, many of which we also used in our previous publication (Cacialli et al, Nature Commun, 2021).

      The haemogenic endothelium/HSPC phenotype in cx41.8tq/tq embryos confirms that this connexin plays a role in HSPC specification. Disentangling which other connexins are also involved in this process will be interesting to pursue in future studies but is beyond the scope of this one. We believe that, together, the data presented in our manuscript and the revisions we plan to carry out will convincingly demonstrate a role for Cx41.8 in the mechanism we describe.

      The authors may try to rescue the wt phenotype by expressing, in the Cx48.1tq/tq mutants, the mRNA encoding for the wt protein.

      Although we appreciate this suggestion, we do not believe this experiment will add much in terms of value to the conclusions of our manuscript and, as such, we believe this suggestion is surplus to requirements for this manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We have thoroughly revised the manuscript, taking into account all comments from all four reviewers. We have added new data (Supplemental Figure 2 and Supplemental Figure 4) in response to these comments.

      Reviewer 1

      The assessment of data reproducibility is currently uncertain due to the absence of replication and statistical analysis in the dataset. It is essential to provide explicit information regarding sample sizes or replicates for all data and figures, data should be presented as mean +/- SD/SEM, and the interpretation of results should be grounded in rigorous statistical analysis. The lack of experimental replicates and statistical analysis in most of the figures presented raises major concerns regarding the validity of the result.

      We have now added error bars to the graphs in Figure 3D, E, F, G, H; Figure 4D, F, G, H, I, J; Figure 5B, C, D, E, F, G; and Figure 6B, C, D. All GTPase assays were repeated three times, and the mean ± S.D. (n = 3) is plotted for each condition. All high-speed pelleting assays were also conducted three times, and a representative assay is shown.

      Why was only one of the MiD proteins, specifically MiD49, studied, while MiD51 was not included in the investigation?

      This is an excellent point. In our previous work (doi:10.1101/2023.07.31.551267), we found that MiD49 and MiD51 were strikingly similar in their abilities to activate Drp1 after their own activation with fatty acyl-CoA. We feel that the demonstration here with MiD49 suggests that a similar effect would occur with MiD51. Due to time constraints for the lead author, preparing more MiD51 protein was out of the scope of what could be done. We now add a line in the Discussion that results for MiD51 may be different.

      The author suggestion of Drp1 phosphorylation, based on the mobility of protein observed in SDS-PAGE gel (fig 4A, 5A, 6A), is not a sufficiently valid assessment. While western blot analysis is a valid method to assess Drp1 phosphorylation, it is essential to include replicates for semi-quantitation and demonstrate the reproducibility of the results. Moreover, it is recommended to incorporate Western blot analyses to provide additional support for the findings presented in Figures 5 and 6.

      • We agree with the reviewer that additional information on the phosphorylation state of these proteins should be provided. We now include phospho-proteomic analysis for Erk2 phosphorylation of WT Drp1 and Drp1-S600D (Supplemental Table 1), showing that S579 is by far the predominant phosphorylation site. For WT Drp1, three lines of evidence now suggest efficient Erk2 phosphorylation of S579:
      • Western blot using anti-phosphoS579
      • Phosphoproteomic analysis
      • Gel shift

      For Drp1-S600D, we have phosphoproteomic and gel-shift analyses. For isoform 6, we regrettably only have the gel shift. However, given that the effect of Erk2 treatment on actin-stimulated GTPase activity mimics what we found for WT Drp1 and for Drp1-phosphoS579/S600D, we think it is highly likely that the equivalent residue (S629 in this case) has been phosphorylated.

      Data on phosphorylated peptides with replicates experiments should be presented.

      We now present these data, which have been significantly expanded since the initial submission (new Supplemental Table 1). While non-phosphorylated S579 is still detected in both the WT and S600D phosphorylation reactions, the phosphorylated peptide is 2.2- and 2.3-fold more abundant, respectively. Our conclusion is that Erk2 efficiently phosphorylates S579, although stoichiometric phosphorylation was not achieved here. We have added statements to the relevant sections of the Results and the Methods. We have also added Supplemental Table 1 to show the spectral counts obtained from the phospho-proteomic analysis, and have deposited the raw data files with the PRIDE consortium (access information in the Methods).

      Please provide additional context or specific details about the GFP-tagged Drp1 protein, such as the protein site where GFP was attached, as well as whether this tag could potentially impact the Drp1 GTPase activity and oligomerization. Figure 7C and D appear to suggest an increase in the GTPase activity of the GFP-Drp1 protein.

      We have now added these details to the Methods section, and have also added the complete amino acid sequence of the final purified construct in Supplemental Figure 4. We also now note that a previous study (PMID: 32901052) found that inclusion of GFP strongly inhibited Drp1 GTPase activity. We do not observe this effect here or in a previous study (PMID: 27559132), and we provide possible reasons for this difference in the Methods. The reviewer points out that the activity of GFP-Drp1 appears higher than that of untagged Drp1 (comparing 7C with 7D). We find that the GTPase activity of Drp1 alone varies between 1 and 2 µM/min/µM protein depending on the preparation. This variation occurs for both untagged and GFP-tagged Drp1. This prep-to-prep difference in basal activity might relate to differences between protein preparations or to the exact amount of time required to freeze the aliquots of purified protein (we freeze small aliquots (

      An optional experiment that would significantly enhance the biological relevance of the findings presented in the current study is to assess the morphology of mitochondria in cells expressing the phospho-mimetic mutant Drp1 proteins. This experiment would provide valuable insights into the functional consequences of Drp1 S579 and S600 phosphorylation on mitochondrial structure and dynamics.

      We fully agree that these would be valuable experiments. The issue is that a large number of experiments using phospho-mimetic mutants in cells have already been conducted, with varying results (Taguchi et al., 2007; Qi et al., 2011; Yu et al., 2011; Strack et al., 2013; Kashatus et al., 2015; Serasinghe et al., 2015; Xu et al., 2016; Brand et al., 2018; Han et al., 2020; Chang and Blackstone, 2007; Cribbs and Strack, 2007; Cereghetti et al., 2008; Wikstrom et al., 2013; Han et al., 2008; Wang et al., 2012; Jhun BS, Sheu, 2018, J Physiol). Conducting more targeted tests of specific forms of Drp1 activation in cells (for example, through Mff, MiD proteins, actin, or cardiolipin) would require extensive work that is outside the scope here. Our feeling is that S579 phosphorylation is likely to recruit another molecule (probably a protein) that has an activating effect. We tried to test one possibility (NME3, mentioned in the Discussion) but failed to produce usable NME3 protein for these tests and, given time constraints for the lead author, could not address this further.

      Provide reference for method on actin polymerization.

      We have now added a reference in the ‘Actin preparation for biochemical assays’ section of the Methods (PMID 16472659).

      Rectify the error in referencing figure 3 panels within the figure legends of Supplemental Fig S1.

      Thank you, we have changed this.

      The inclusion of full length isoform 6 is commendable. However, there is no mention of isoform 6 in the method section.

      Thank you for pointing this out. We have added a description of the construct and referenced our previous paper that used it.

      Since papers deposited in bioRxiv have not undergone peer review, reference #7 should not be cited as references in scholarly work.

      Reference 7 has now been reviewed by a peer-reviewed journal. We are addressing the reviewers’ concerns and will resubmit soon. We do not know how to rectify the issue of referencing this work, because it describes an extensive amount of groundwork on the MiD proteins. Our hope is that this work will be in press by the time the work reviewed here is ready for publication.

      Please provide details about the calculation of GTPase activity and the distinctions between the specific GTPase activity and total GTPase activity shown in figure 8D-F.

      We now describe these calculations in the “GTPase assay” section of the Methods.

      Reviewer 2

      Overall, the experiments described here are carried out with rigor and the conclusions drawn are of significance to understanding how phosphorylation regulates Drp1 functions.

      Thank you for these kind comments!

      Phosphorylation of both the serine residues appears to elicit a common effect in that they inhibit Drp1's stimulated GTPase activity. This would suggest that phosphorylation affects Drp1's self-assembly as tightly packed helical scaffolds. Instead of sedimentation analysis, an EM analysis of helical scaffolds on cardiolipin-containing membrane nanotubes or in the presence of soluble adaptors causing Drp1 to form filaments would provide a direct readout for defects in self-assembly.

      This is an excellent point, and we would love to conduct this work. Given our current EM infrastructure and expertise, these experiments would take extensive time for us to do. We do have a collaborator who could carry these out, but feel that the time it would take even for them to do this correctly is beyond that which we have (the lead author is transitioning to their next career phase). We have added the point that further EM studies of this type are necessary to test the effect on Drp1 assembly more directly.

      I am not sure of the rationale for experiments reported in Fig. 7 and 8. If the idea was to test if hetero oligomerization with WT Drp1 rescues defects associated with phosphorylated Drp1 then this could be stated explicitly in the manuscript. GFP-Drp1 is used as a WT mimic but a previous report (PMID: 30531964) indicates that this construct is severely defective in stimulated GTPase assays, much like the K38A mutant. But the rationale of using these constructs is not quite apparent. Is the intention to test if defects seen in the phospho-mimetic mutants of Drp1 can be rescued by the presence of a 'seed' of WT Drp1. If so, then this could be stated explicitly in the manuscript. But regardless, I am not quite sure what this data set achieves in terms of addressing mechanism.

      We apologize for not being clearer in our explanation of these experiments. Our goal was to test the effects of partial Drp1 phosphorylation on overall Drp1 activity, which likely mimics more accurately the cellular situation (wherein only a portion of the Drp1 population is likely to be phosphorylated even upon kinase activation). We now discuss these experiments in a clearer manner. For the GFP-Drp1, we do not observe the effect on GTPase activity shown in that previous manuscript by another laboratory, either here or in previous studies (eg, PMID: 27559132). In the Methods, we now provide a discussion of these differences and possible reasons for them, as well as providing the complete amino acid sequence of our GFP-fusion construct in Supplemental Figure 4.

      Finally, it would have been nice to see if the phospho-mimetic mutants of Drp1 produce the same effects on mitochondrial structure as those reported earlier. Reanalyzing their effects in a cellular assay becomes important because it would consolidate this work for the readers to evaluate the 'true' effects of phosphorylation on Drp1 functions. If the phospho-mimetic mutants fare in a manner like those previously reported, then it signifies that stimulation in GTPase activity is not a readout that directly correlates with Drp1 functions. If not, then the results presented here would establish a comprehensive analysis of in vitro biochemical activities and in vivo functions of the phospho-mimetic mutants.

      We fully agree that these would be valuable experiments. The issue is that a large number of experiments using phospho-mimetic mutants in cells have already been conducted, with varying results (Taguchi et al., 2007; Qi et al., 2011; Yu et al., 2011; Strack et al., 2013; Kashatus et al., 2015; Serasinghe et al., 2015; Xu et al., 2016; Brand et al., 2018; Han et al., 2020; Chang and Blackstone, 2007; Cribbs and Strack, 2007; Cereghetti et al., 2008; Wikstrom et al., 2013; Han et al., 2008; Wang et al., 2012; Jhun BS, Sheu, 2018, J Physiol). Conducting more targeted tests of specific forms of Drp1 activation in cells (for example, through Mff, MiD proteins, actin, or cardiolipin) would require extensive work that is outside the scope here. Our feeling is that S579 phosphorylation is likely to recruit another molecule (probably a protein) that has an activating effect. We tried to test one possibility (NME3, mentioned in the Discussion) but failed to produce usable NME3 protein for these tests and, given time constraints for the lead author, could not address this further.

      Previous work reports that the effect of actin on the GTPase activity of Drp1 is biphasic but the binding to actin is not. This is quite confounding, and the authors could perhaps explain why this is the case.

      The reviewer makes an excellent point, which we now explain further in the manuscript. We have also discussed this in doi:10.1101/2023.07.31.551267 (see Figure 2D in that work). Our interpretation is that it is the density of Drp1 bound to the actin that provides the activation, by positioning the GTPase domains in close proximity. As the amount of actin increases, the Drp1 becomes more dispersed on the filaments, and activation decreases. We observe the same effect for MiD49 and MiD51 oligomers (see the above-mentioned reference).

      The manuscript cites PMID: 23798729 for expression analysis of splice variants but PMID: 29853636 provides a more comprehensive analysis. The authors could cite this work.

      Thank you for this reference. We were unaware of it, but are very glad to know of it now. We now include this reference. In particular, in the legend to Figure 1C (table of splice variants), we now state that this table is for human Drp1, and that additional splice variants have been identified for murine Drp1 (PMID 29853636).

      Reviewer 3

      The splendid results of the manuscript will be interesting to the researchers in the related fields.

      Thank you for this nice comment!

      The manuscript provided well-organized biochemistry results for comparisons between phosphorylation of Drp1 S579 and S600. It is the reviewer's comments that the authors may include experiments that manipulate Drp1 phosphorylation at different amino acids in cells. Such experiments will provide strong support for this manuscript.

      • We fully agree that these would be valuable experiments. The issue is that a large number of experiments using phospho-mimetic mutants in cells have already been conducted (Taguchi et al., 2007; Qi et al., 2011; Yu et al., 2011; Strack et al., 2013; Kashatus et al., 2015; Serasinghe et al., 2015; Xu et al., 2016; Brand et al., 2018; Han et al., 2020; Chang and Blackstone, 2007; Cribbs and Strack, 2007; Cereghetti et al., 2008; Wikstrom et al., 2013; Han et al., 2008; Wang et al., 2012; Jhun BS, Sheu, 2018, J Physiol). Conducting more targeted tests of specific forms of Drp1 activation in cells (for example, through Mff, MiD proteins, actin, or cardiolipin) would require extensive work that is outside the scope here. Our feeling is that S579 phosphorylation is likely to recruit another molecule (probably a protein) that has an activating effect. We tried to test one possibility (NME3, mentioned in the Discussion) but failed to produce usable NME3 protein for these tests and, given time constraints for the lead author, could not address this further.

      The authors discussed the known factors involved in Drp1 activation, such as its receptors, actin and cardiolipin. Recent JCB paper (J. Cell Biol. 2023 Vol. 222 No. 10 e202303147) indicates that intermembrane space protein Mdi1/Atg44 may play a role in coordinating mitochondria fission with Dnm1 (Drp1 in yeast cells). It will be valuable if the manuscript could also discuss the potential factor.

      • Thank you for this comment. We now include Mdi1/Atg44 as a possible factor that might be influenced by Drp1 phosphorylation. Two points we would like to make here are: there doesn’t seem to be an Mdi1 homologue in mammals, so the equivalent factor must be identified before testing; and Mdi1 is an inter-membrane space protein, so any effect of Drp1 phosphorylation on coordinated functioning with Mdi1 would either require an intermediary factor or exposure of the IMS in some way.

      Keywords cannot represent the manuscript. It is recommended that the authors use other words for the current manuscript.

      We have removed K38A from this list. The other key words are not mentioned in the Abstract.

      Reviewer 4

      The authors showed that the binding of Drp1 to actin depends on salt concentrations (Fig. 2B and C). In the presence of 65 mM NaCl, the phosphomimetic mutants showed decreased binding to actin. The GTPase assay is performed with 65 mM KCl, in which actin did not stimulate GTP hydrolysis of the phosphomimetic mutants. In contrast, with 140 mM NaCl, the S579D Drp1 exhibits slightly enhanced actin binding compared to WT Drp1. Could the authors assess the actin-activated GTPase activity in the 140 mM salt condition to test if actin activates GTP hydrolysis of S579D Drp1 more potently than WT?

      This is a good point by the reviewer. However, with limited time for the first author, we chose to focus on the reviewer’s other comments (see below).

      Both phosphomimetic mutants show reduced activation for GTP hydrolysis in the presence of cardiolipin, Mff, and MiD49. Is this because the mutants have a lower affinity for these interactors? Or do they bind with the same affinity but experience diminished activation? The data suggests the latter scenario, potentially resulting from decreased oligomerization properties. Can the authors provide more insights on this, for example, by measuring their interaction in the presence of GMP-PCP, which fully induces oligomerization in all three forms of Drp1?

      • These are interesting ideas, and we conducted experiments similar to those the reviewer described: co-sedimentation experiments with combinations of Drp1 and Mff under three nucleotide states (no nucleotide, GMP-PCP, and GTP). We used Mff for these experiments because we have this protein in abundance and have previously characterized this construct as a trimer in PMID 34347505. We use a high concentration of Mff (50 µM) relative to Drp1 (1.3 µM) because of the relatively low affinity between the two proteins (shown in PMID 34347505). We find the following:
      • In the absence of nucleotide, Mff does not cause an increase in pelletable Drp1 for any of the Drp1 constructs.
      • In the GTP state, the presence of Mff greatly increases the amount of Drp1 in the pellet, suggestive of increased Drp1 oligomerization. This effect occurs for all Drp1 constructs (WT, S579D and S600D mutants), but the amounts of both Drp1 and Mff in the pellets are about 50% less for both mutants than for the WT construct. This result suggests a decrease in oligomerization for the mutants, but not necessarily a decrease in Mff binding.

      I'm curious what happens to oligomerization if GTP is added instead of nonhydrolyzable GMP-PCP (Fig. 1D). Does this lead to higher oligomerization in the mutants compared to WT since the mutants seem to have lower GTPase activity? This might explain why phosphorylation increases mitochondrial localization of Drp1 in cells seen in some studies.

      This is another interesting thought; we describe the new experiments in our response to the previous comment. Essentially, while GTP does cause a slight increase in pelletable Drp1, the increase is broadly similar for all constructs. As described in that response, the addition of Mff causes a substantial increase in pelletable Drp1 for both WT and the mutants. This result suggests that, while the basal oligomeric state of Drp1 (in the absence of nucleotide) is reduced for the mutants (our original analytical ultracentrifugation data), the mutants appear capable of responding to GTP and Mff in a manner similar to WT. We acknowledge that the assay used here (pelleting) lacks the precision required to draw detailed conclusions on oligomerization or interaction with Mff, and we have tried to reflect this in our discussion of the data. We do feel, however, that these data are useful to report, to guide future studies.

      Please include the number of experimental repeats and error bars where applicable.

      We have now added the number of experimental repeats and error bars for the graphs in Figure 3D, E, F, G, H; Figure 4D, F, G, H, I, J; Figure 5B, C, D, E, F, G; and Figure 6B, C, D.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Replies to Reviewers

      Thank you for inviting us to submit our revised manuscript titled, “Diffusive mediator feedbacks control the health-to-disease transition of skin inflammation.” We appreciate the time and effort the editor and each of the reviewers have dedicated to providing insightful feedback on ways to strengthen our manuscript. The revisions in the main text made in response to the detailed comments are highlighted in red and were proofread by professional English editors. We hope that our revision and responses address all the concerns raised by the reviewers, and we look forward to hearing from you regarding this submission.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript provides a model of interacting populations of pro- and anti-inflammatory mediators to explain spatial patterns associated with various inflammatory conditions. The work is robust and articulated well, and is certainly scientifically relevant.

      Authors: Thank you for your positive evaluation and many insightful comments on our manuscript. We have incorporated your feedback, and hope that our revisions satisfy all the comments.

      Minor amendments:

      Personally, I feel that the model should be reported prior to the results, as the choice of model is likely to have great significance on the observations. It would be preferable for the reader to have a clear picture of the governing equations in their mind as they digest the results.

      Au: Following this reviewer's suggestion, we have moved the Methods section, including the model description, so that it precedes the Results section (p.9-14 lines 152-232; revised manuscript).

      The literature review is largely thorough; however, I think it is important that the previous works of Joanne Dunster (University of Reading) and collaborators are included, as these are very closely related to this work. In particular, the authors should note the following two papers, which take a spatial approach:

      • Bayani, A., Dunster, J.L., Crofts, J.J. et al. Mechanisms and Points of Control in the Spread of Inflammation: A Mathematical Investigation. Bull Math Biol 82, 45 (2020). https://doi.org/10.1007/s11538-020-00709-y

      • Bayani A, Dunster JL, Crofts JJ, Nelson MR (2020) Spatial considerations in the resolution of inflammation: Elucidating leukocyte interactions via an experimentally-calibrated agent-based model. PLoS Comput Biol 16(11): e1008413. https://doi.org/10.1371/journal.pcbi.1008413

      Au: We have incorporated this comment by adding the two suggested papers to the relevant sentences in the literature review (p.6 line 118-119; revised manuscript) as follows: “Previous reaction-diffusion models, including chemotactic cells, have reproduced the resolution of inflammation in the lung [Bayani et al. 2020a, Bayani et al. 2020b]”

      One key point that should be mentioned in the discussion is that the model neglects any immune cells (e.g. neutrophils, macrophages) which contribute greatly to the inflammatory condition. Since these cells are motile, and also can contribute both pro- and anti-inflammatory effects, they are likely to influence spatial patterns significantly. It is not necessarily a problem that these aren't included in the model, but I feel that it is important that their omission be discussed in the manuscript.

      Au: We have now discussed immune cells in the “Future implications” section, as the reviewer suggested (p.29 line 477-483; revised manuscript), as follows: “This is probably because the present model focuses on non-chemotactic cells (e.g., keratinocytes), whereas chemotactic cells (e.g., macrophages and neutrophils) also contribute to skin inflammation [Zhang and An 2007, Coondoo 2011]. Moreover, the present model focuses on the innate immune response, whereas the skin initiates an acquired immune response when the innate immune response persists. Therefore, incorporating chemotactic cells and the acquired immune response into the model may reproduce the end of the expansion.”

      Reviewer #1 (Significance (Required)):

      The manuscript advances our current understanding of spatially spreading inflammation and corresponding patterns, but needs to be contextualized against existing literature as described above.

      This manuscript will appeal to theoreticians (Mathematicians) and clinicians/experimentalists alike.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors propose a minimal mechanistic mathematical model able to reproduce qualitatively different spatial patterns observed in healthy and diseased epidermis. The starting point is a systematic review of medical images of different dermatological conditions, which they classify and successfully capture according to the spatial patterns. It is an interesting piece of work, but I consider that it will gain significance if the theoretical results are compared again with the clinical data. Specifically, the authors show a very interesting map between parameter regions and different spatial patterns; this result should be compared back to clinical data, to confirm that specific changes in spatial patterns indeed result from predicted changes in a specific parameter (e.g., due to a genetic condition that affects a feedback strength).

      Authors: We thank you for providing your valuable comments on our manuscript.

      Following your suggestion about comparing the theoretical results with the clinical data, we have predicted which specific parameters, including the feedback strength, cause specific transitions of spatial patterns in the respective diseases. The discussion was added on p.26 lines 415-438 in the revised manuscript as follows: “The parameter-to-patterning correspondence (Fig. 4A, B, S2 Fig., and S3 Fig.) allows us to infer the pathogenic mechanism in various diseases exhibiting each of the diverse expanding patterns (Table 2). For instance, psoriasis exhibits all five expanding patterns (Table 2) and increased levels of a pro-inflammatory mediator (TNF-α) [Ringham et al. 2019], which is consistent with our theoretical results. The elevated pro-inflammatory mediator in psoriatic skin has been suggested to be caused by genetic mutations affecting regulatory feedback [Valeyev et al. 2010]. Considering these previous studies, our model predicts a psoriasis progression in which the fading pattern transitions to arcuate, polycyclic, gyrate, annular, and circular patterns as the TNF-α level increases, possibly owing to mutation-induced alterations in the feedback parameters, e.g., an increase in the production parameter of the pro-inflammatory mediator, qa (Fig. 4A). As another example, Lyme disease exhibits circular, annular, and polycyclic patterns (Table 2). A clinical report showed that patients in Missouri predominantly exhibit an annular pattern without prognostic symptoms, while those in New York tend to exhibit a circular pattern with prognostic symptoms following the same treatment [Wormser et al. 2005].
Considering our theoretical result that the overproduction of pro-inflammatory mediators and the depletion of anti-inflammatory mediators lead to the annular and circular patterns, respectively (Fig 4, 5A, and B), altered levels of pro-inflammatory and anti-inflammatory mediators may significantly affect the development and prognosis of Lyme disease in Missouri and New York patients, respectively.

      These qualitative parameter estimations will be verified in the future through parameter quantification in diseased skin exhibiting each expanding pattern. By incorporating this quantitative correspondence between patterns and parameters measured in each disease into the present model, we could develop disease-specific models that quantitatively predict how much the skin parameters must change to drive the transition from a healthy to a diseased pattern, or vice versa. Therefore, this study provides a first step toward controlling the healthy-to-diseased transition of skin inflammation via diffusive mediator feedback.”

      Another shortcoming of this work is that some of the conclusions are rushed: the parameter-to-spatial-pattern analysis would strongly benefit from adding a quantitative description to the qualitative one, e.g., mapping how changes in a given parameter value result in gradual changes in fading speed. Along the same line, the stability analysis for the different fading patterns was performed only for selected parameter values; it is not clear how variations in parameter values affect the sizes of the basins of attraction of the different steady states, and we want to make sure that the parameter values were not cherry-picked. Further, given that the authors show bistability for some parameter values, the dependence of the final spatial pattern on initial conditions should be more extensively investigated.

      Au: We have incorporated these comments by adding a quantitative description including new results and future research strategies following each of the three constructive suggestions raised by the reviewer.

      First, regarding the fading speed, it is affected by changes in the parameters governing mediator production. In particular, the speed is reduced by an increase in the production parameters of pro-inflammatory mediators (pa, qa) and by a decrease in those of anti-inflammatory mediators (pi, qi) (Fig. 2C and D). Moreover, the “size of the basins” that the reviewer pointed out corresponds to the distance between ST (threshold) and SH (healthy state) in the cases with excitability. This distance decreases, indicating that the healthy state becomes less stable, when the production of pro-inflammatory mediators (pa, qa) increases or that of anti-inflammatory mediators (pi, qi) decreases from the values giving the healthy fading pattern. This imbalance in mediator production converts the fast fading pattern, which has a small trajectory, into a slow fading pattern with a larger trajectory. As the imbalance increases further, expanding patterns appear in the order arcuate, polycyclic, and gyrate (Fig. 5). In cases with bistability, the size of the basins corresponds to the relative distances from ST to SH and from ST to SI (inflamed state). The circular and annular patterns appear when ST is closer to SH. On the other hand, when ST was closer to SI, the inflamed area shrank rather than expanded. The shrinking pattern appeared when the production of pro-inflammatory mediators (pa, qa) was reduced or that of anti-inflammatory mediators (pi, qi) was increased under bistable conditions. We have added a new figure and described this finding in Results (p.24 lines 384-388; revised manuscript) as follows: “As a result, we found that the distance between the healthy state (SH) and the threshold state (ST, the unstable steady state closer to SH) was the smallest in the gyrate pattern and increased in the order of polycyclic, arcuate, slow fading, and fast fading patterns (Fig. 5C–F, S4 Fig. B and C).
The fast fading pattern showed a smaller trajectory (green curve in S4 Fig. B and C) of change in the mediator concentration than the slow fading pattern.”

      Second, regarding “the dependency on initial conditions”, we have added a new result (p.24 line 374-382; revised manuscript) as follows: “The number of stable states determines the pattern regardless of the initial spatial distribution of mediator concentration. Similar to the fading pattern (Fig. 2), the arcuate, polycyclic, and gyrate patterns with excitability appeared reproducibly, independently of the initial conditions, owing to a single stable state SH (Fig. 5C-F). Even in the circular and annular patterns with bistability, where the threshold ST was closer to the healthy state SH than to the inflamed state SI (Fig. 5A-B), the final spatial pattern was dominated by SI independently of the initial condition. In contrast, when ST was closer to SI than to SH, the inflamed area shrank rather than faded (S4 Fig. A). These results are general properties of traveling waves in bistable systems [Murray 2002], and are consistent with previous theoretical studies of inflammation [Sudo and Fujimoto 2022, Volpert 2009].”
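      The bistable traveling-wave behaviour cited here [Murray 2002] can be illustrated with a minimal sketch based on the scalar Nagumo equation, u_t = D u_xx + u(1 - u)(u - a). This is a hypothetical one-variable stand-in, not the manuscript's two-mediator model: the mapping u = 0 ≈ SH, u = a ≈ ST and u = 1 ≈ SI, the grid, and all parameter values are assumptions chosen for illustration. For this equation the front speed is known in closed form, c = sqrt(D/2)(1 - 2a), so an inflamed patch expands when the threshold lies closer to the healthy state (a < 1/2) and shrinks when it lies closer to the inflamed state (a > 1/2).

```python
import math

# Hypothetical scalar stand-in for a bistable inflammation model (illustration only):
# u = 0 plays the role of SH, u = a of ST, and u = 1 of SI.


def nagumo_front_speed(a: float, D: float = 1.0) -> float:
    """Closed-form front speed for u_t = D*u_xx + u(1-u)(u-a), with 0 < a < 1.

    Positive: the u=1 state (inflamed analogue) invades u=0 (healthy analogue).
    Negative: the u=1 region retreats.
    """
    return math.sqrt(D / 2.0) * (1.0 - 2.0 * a)


def simulate_patch(a: float, D: float = 1.0, L: float = 100.0, n: int = 201,
                   t_end: float = 20.0, r0: float = 10.0) -> float:
    """Explicit finite-difference run of an initially 'inflamed' patch |x| < r0.

    Uses no-flux (mirror) boundaries on [-L/2, L/2] and returns the half-width
    of the region where u > 0.5 at time t_end.
    """
    dx = L / (n - 1)
    dt = 0.2 * dx * dx / D  # below the explicit stability limit dx^2 / (2D)
    xs = [-L / 2 + i * dx for i in range(n)]
    u = [1.0 if abs(x) < r0 else 0.0 for x in xs]
    for _ in range(int(t_end / dt)):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else u[1]          # mirror ghost node
            right = u[i + 1] if i < n - 1 else u[n - 2]  # mirror ghost node
            lap = (left - 2.0 * u[i] + right) / (dx * dx)
            new[i] = u[i] + dt * (D * lap + u[i] * (1.0 - u[i]) * (u[i] - a))
        u = new
    return sum(dx for v in u if v > 0.5) / 2.0


if __name__ == "__main__":
    for a in (0.2, 0.8):
        print(f"a={a}: predicted speed {nagumo_front_speed(a):+.3f}, "
              f"half-width after t=20: {simulate_patch(a):.1f}")
```

      With these illustrative settings the patch, which starts with a half-width just under 10, widens for a = 0.2 (positive predicted speed) and narrows or disappears for a = 0.8 (negative predicted speed). A two-variable model adds a second direction in phase space, so "closer" there refers to distances in that plane rather than on a single axis.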

      Finally, we have added “a quantitative to the qualitative description” as a future research strategy (p.27 line 432-438; revised manuscript) as follows: “These qualitative parameter estimations will be verified in the future through parameter quantification in diseased skin exhibiting each expanding pattern. By incorporating this quantitative correspondence between patterns and parameters measured in each disease into the present model, we could develop disease-specific models that quantitatively predict how much the skin parameters must change to drive the transition from a healthy to a diseased pattern, or vice versa. Therefore, this study provides a first step toward controlling the healthy-to-diseased transition of skin inflammation via diffusive mediator feedback.”

      For reproducibility it is essential that the authors add a much more detailed description of the methods, including the software tools / numerical analysis tools used. Making the code publicly available would also be very beneficial to ensure the reproducibility of the results.

      Au: Following your suggestion, we have added a description of the methods, including the simulation code, to the “Methods” (p.13 lines 231-232; revised manuscript) as follows: “A simulation code written in C language is available from GitHub: https://github.com/MakiSudo/Erythema-Patterns/blob/main/AInondim.c.”

      In conclusion, the work is very interesting and worth publishing, but requires (a) returning to the clinical data for validation of model predictions, (b) a more thorough and quantitative investigation of the effects of parameter variations on model behaviors, (c) a more rigorous and systematic presentation of the methods, (d) a careful explanation of how the proposed model is similar to / differs from the classical activator-inhibitor model proposed by Turing, and (e) discussing / showing whether the fading patterns result from a Turing instability.

      Au: For (a) “validation of model predictions,” (b) “model behaviors,” and (c) “a more rigorous and systematic presentation of the methods,” we have reflected your suggestions in the revised manuscript as described above.

      Regarding (d) and (e), we have added an explanation of “how the proposed model is similar/differs to the classical activator–inhibitor model” and “if the fading patterns result from Turing instability” after the model construction in Methods (p.11-12 line 210-216; revised manuscript) as follows: “The reaction terms of this model are similar to those of the classical activator-inhibitor model proposed by Turing [Turing 1952], which includes negative feedback on the activator through the inhibitor and positive feedback of the activator on itself. These reaction terms can potentially give rise to a Turing instability. However, the present model setting does not show a Turing instability, because Turing instability requires a large difference between the diffusion coefficients of the activator and inhibitor [Murray 2002], whereas these coefficients were set to be equal in the present model, based on the finding that the molecular weights of the mediators are similar [Coondoo 2011].”
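      Why equal diffusion coefficients exclude a Turing instability can be sketched with the standard linear-stability argument for a generic two-species reaction-diffusion system [Murray 2002]. The symbols below (u, v, f, g, D_u, D_v, J, k) are generic placeholders chosen for illustration and are not the manuscript's notation.

```latex
% Generic activator (u) / inhibitor (v) reaction-diffusion system (placeholder notation)
\begin{aligned}
\partial_t u &= f(u,v) + D_u \nabla^2 u, \qquad
\partial_t v = g(u,v) + D_v \nabla^2 v, \\
% Jacobian of the reaction terms at the homogeneous steady state, assumed stable without diffusion
J &= \begin{pmatrix} f_u & f_v \\ g_u & g_v \end{pmatrix}, \qquad
\operatorname{tr} J = f_u + g_v < 0, \quad \det J > 0, \\
% Perturbations proportional to exp(lambda t + i k x) grow at rates given by the eigenvalues of J - k^2 D
\det\!\left(J - k^2 D\right) &= D_u D_v k^4 - \left(D_v f_u + D_u g_v\right) k^2 + \det J,
\qquad D = \operatorname{diag}(D_u, D_v).
\end{aligned}
```

      Because tr(J - k^2 D) = f_u + g_v - k^2 (D_u + D_v) stays negative for every k, an instability needs det(J - k^2 D) < 0 for some k^2, which in turn requires D_v f_u + D_u g_v > 0. With equal coefficients D_u = D_v = D this condition becomes D (f_u + g_v) > 0, contradicting tr J < 0, so no diffusion-driven pattern can arise.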

      **Referees cross-commenting**

      I agree with the comments from Reviewer #1.

      Reviewer #2 (Significance (Required)):

      The work aims to bridge mathematical modelling and dermatological practice, which is much needed to enable the use of theoretical and computational tools in clinical decision-making. While some mathematical models of skin inflammation have been proposed in the past (refer to papers from the RJ Tanaka group in systems dermatology), most of these do not consider explicitly the spatial component, which is crucial for modelling the clinically visible spatial patterns. Potentially interested audience includes biomathematicians, systems biologists, systems dermatologists, and, if the validation of the model predictions is achieved (as suggested above), also dermatologists.

      I am a systems biologist working on multi-scale mechanistic mathematical modelling of epithelial tissue diseases. The work I just reviewed falls exactly within my area of expertise.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their time and for their thoughtful and constructive comments. Their specific points are addressed below, but a general point that we would like to comment on is that in the original version we apparently did not make our model clear enough. The dogma in the field is that Rab7 is recruited to endosomes from a cytosolic pool via exchange with Rab5 (mediated by Mon1/Ccz1). Our work instead indicates that the majority of Rab7 is delivered to Dictyostelium phagosomes by fusion with other endocytic compartments. It was not our intention to imply that there was no canonical recruitment of Rab7 from a cytosolic pool, and indeed we provide data showing that this happens at a low level and discuss this in the manuscript. Nonetheless, we clearly overstated the exclusivity of Rab7 recruitment to phagosomes via fusion at several points in the text and in our original model cartoon, and in this revision we have tried to better explain our more nuanced model, with multiple routes for Rab7 acquisition, including a completely redrawn model figure (Fig. 7).

      2. Description of the planned revisions

      Reviewer 1:

      1. The observation that macropinosomes undergo retrograde fusion with newly formed phagosomes to facilitate phagosome maturation is an interesting notion that challenges the traditional model. However, not all phagocytes exhibit a high level of macropinocytosis, and axenic Dictyostelium cells used in the study may be an exception. Thus, it remains unclear whether fusion with macropinosomes is universally required for phagosome maturation. WT Dictyostelium cells or axenic cells cultured under SorMC/Ka condition (Paschke et al., PLoS One, 2018) exhibit significantly reduced macropinocytosis. The authors could examine whether the accumulation of Rab7 and V-ATPase on large-sized phagosomes is delayed in these cells. These experiments may help broaden the applicability of the authors’ finding.

      As our previous work (Buckley et al. PLoS Pathogens 2019) demonstrated that bacterially grown PIKfyve mutants are also defective in bacterial killing and growth, it is highly likely that these cells are also defective in V-ATPase and Rab7 acquisition. However, we agree that formally testing this will further support our conclusions and improve the paper, and it should be quite straightforward.

      We will therefore co-express GFP-V-ATPase and RFP-Rab7 in both Ax2 and non-axenic cells grown on bacteria and repeat our analysis of recruitment to phagosomes – with the caveat that non-axenic cells do not phagocytose large particles such as yeast (Bloomfield et al. eLife 2015), so the imaging and quantification will be more challenging in this case.

      PIKfyve seems to play a specific role in the maturation of phagosomes but not macropinosomes. The differences may be driven by signaling from phagocytic receptors, as the authors suggested. Alternatively, the large size of the yeast-containing phagosomes may require additional steps for efficient lysosomal delivery. The authors should consider examining whether PIKfyve is needed for the delivery of Rab7 and V-ATPase to phagosomes of comparable size to regular macropinosomes, such as those containing K. aerogenes or small beads. In addition, whether the process also involves fusion between phagosomes and macropinosomes should be verified.

      Whilst it is possible that the large size of yeast-containing phagosomes requires additional mechanisms to process them, our previous data demonstrate that PIKfyve is also required to kill much smaller bacteria such as Klebsiella and Legionella (Buckley et al. PLoS Pathogens 2019). Furthermore, in this paper we also showed that loss of PIKfyve disrupts phagosomal proteolysis using 3 µm beads, and that V-ATPase recruitment was reduced on purified phagosomes containing 1 µm beads. We therefore find consistent defects on phagosomes of different sizes, with different cargos. Nonetheless, the experiments above, observing V-ATPase and Rab7 in cells grown on bacteria, should directly address this point.

      As suggested, we will also perform a dextran pulse-chase prior to addition of bacteria to test if we can observe macropinocytic delivery to bacteria-containing phagosomes - perhaps using E. coli as their elongated shape may help phagosome visualisation.

      In the previous study from the authors' group (Buckley et al., PLoS Pathog, 2019), it was shown that the accumulation of V-ATPase on phagosomes begins immediately after internalization in both PIKfyve mutant and WT, although V-ATPase accumulation reaches only half of the levels seen in WT. This partial accumulation of V-ATPase differs from the almost complete absence of Rab7 recruitment found in this study, which raises the question of whether there exists yet another population of fusogenic vesicles that are positive for V-ATPase but negative for Rab7. This could be checked by simultaneously examining the dynamics of V-ATPase and Rab7 during yeast phagocytosis in the PIKfyve mutant.

      We agree with the referee that there are multiple pools of V-ATPase, and we show that there is both a very early PIKfyve-independent recruitment of both V-ATPase and Rab7 as well as a later and more substantial pool delivered in a PIKfyve-dependent manner. It is clear that V-ATPase and Rab7 do not always co-localise however - the clearest example being on the contractile vacuole, which has lots of V-ATPase but no Rab7 (the large bright magenta structure in Fig 2G.).

      We suspect that the dramatically reduced, but not completely absent, levels of V-ATPase and Rab7 recruitment in the absence of PIKfyve are similar, but the difficulty of imaging such low levels means that we cannot formally exclude a pool of V-ATPase vesicles lacking Rab7 that fuse with very early phagosomes. Nonetheless, as we will already be examining V-ATPase and Rab7 in PIKfyve KOs in the experiments above, we will also attempt to unequivocally identify any pool of V-ATPase-positive/Rab7-negative vesicles fusing with phagosomes.

      Reviewer 2:

      (1) The authors show that deletion of PIKfyve results "in an almost complete block in Rab7 delivery to phagosomes" (page 17) indicating that the delivery of Rab7 depends on fusion with Rab7-positive structures. This would suggest that the Rab7-GEF Mon1-Ccz1 is not localized to the membrane of the phagosomes. Could the authors test for the presence of Mon1-Ccz1 in either fluorescence microscopy experiments or on purified phagosomes to exclude the possibility of a "canonical" Rab7 recruitment by its GEF? If the GEF is found on phagosomal membranes it would indicate that a Rab-transition from Rab5 to Rab7 occurs on the phagosome during maturation, but on a low level. The later fusion event might be a homotypic fusion of two Rab7-positive compartments. The observed fusion events could still deliver the bulk of Rab7 and other endolysosomal proteins to the phagosome. If the Rab7-GEF is not found on phagosomes how do the authors envision that the organelle keeps its identity? Is it solely dependent on PI(3,5)P2? What is the fate of the Rab7-negative phagosome in ∆PIKfyve cells if Rab7 is not delivered to the membrane, is there degradation happening over longer periods of time?

      This is an excellent suggestion, for which we thank the reviewer. Mon1 and Ccz1 are highly conserved, with clear Dictyostelium orthologues that have never been studied. Our model is that a small proportion of Rab7 is recruited by this canonical pathway, so we would expect Ccz1/Mon1 recruitment to coincide with loss of Rab5 and to be unaffected by loss of PIKfyve, although subsequent Rab7 delivery would be lost. This is easy to test by cloning and expressing GFP fusions of both Ccz1 and Mon1, and would be highly informative. Note that we do not exclude canonical Rab7 recruitment in our model (see Discussion); our data simply indicate that it makes a minor contribution.

      Reviewer 3:

      The focus of their manuscript is the loading of Rab7 onto phagosomes, but there is no indication of Rab7 activation (GTP loading). Would the RILP-C33 probe work in Dictyostelium? If not, the activation state of Rab7 should still be discussed. For the Rab7 that remains on other organelles in PIKfyve-inhibited cells, is it active or not?

      The GTP-loading status of Rab7 is a good question, although the general dogma is that membrane-localised Rabs are active. We will try the RILP-C33 probe in Dictyostelium as suggested, but as these cells lack an endogenous RILP orthologue there is a high chance it will not work. Sadly, the lack of reliable tools to assess active Rab status is a general limitation for the field, so if the RILP-C33 probe does not work we will add this caveat to the discussion.

      The authors need to better address the confusing kinetics of early Rab7 recruitment, followed by SnxA (Fig. 4G, same for VatM - Fig. 4I ) - which is counterintuitive if PIKfyve activity is required to recruit Rab7. How do the authors explain this? Are phagosomes prevented from acquiring Rab7 in PIKfyve deficient cells because of a defect on phagosomes or the endo-lysosomes loaded with Rab7 (but not active).

      We believe this again relates to the over-simplification of our model. Our data indicate both PIKfyve dependent and independent Rab7 recruitment. In contrast to the abrupt recruitment of SnxA at ~120 seconds (Vines et al. JCB 2023), both Rab7 and VatM accumulate gradually over time starting from almost immediately following engulfment (Buckley et al. 2019, and Figure 2F). Our data indicate that the first stage of this is PIKfyve independent, and is responsible for ~10% of the total Rab7/V-ATPase accumulation by both the imaging in this paper, and Western blot for V-ATPase on purified phagosomes in Buckley et al. PLoS pathogens 2019. The arrival of some Rab7/V-ATPase prior to PI(3,5)P2 therefore supports our model where there are multiple sources of Rab7.

      As the reviewer quite rightly points out, interpretation of the defects observed in the absence of PIKfyve becomes complex, and we cannot completely differentiate between a defect on the phagosome or on the Rab7 compartments that fuse with it (or indeed both). In fact, we already note that the small Rab7 compartments that we observe in wild-type cells are much sparser in PIKfyve mutants. Therefore, whilst the requirement for PI(3,5)P2 in the clustering and fusion of macropinosomes with phagosomes is clear, additional effects on the PI(3,5)P2-independent Rab7 compartments cannot be excluded.

      The experiments above using the RILP-C33 active Rab7 biosensor, as well as observation of the Mon1/Ccz1 complex, should further clarify this, but we will also add further discussion of these points.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer 1:

      Minor comments.

      1. It is unclear how the experiment in Figure 3G was conducted. If microscopic analysis was involved, the corresponding images should be included.

      We apologise that we overlooked this and have now added a full description in the materials and methods (P8 L16-21). Fluorescence measurements were performed using a plate reader, so there are no images.

      Page 11-Line 2, the sentence "there was no obvious clustering around the nascent phagosome (Figure 2D)." It is Figure 2E, not Figure 2D.

      Corrected.

      There is an inconsistency regarding the description of fluorescent fusion proteins. For example, both GFP (RFP)-2xFyve and 2xFyve-GFP (RFP), as well as GFP-Rab5 and Rab5-GFP, were used. Typically, placing GFP (or RFP) before a gene suggests N-terminal tagging, while placing it after the gene implies C-terminal tagging. The authors should clarify the position of the fluorescent tag and ensure consistency in their descriptions.

      We apologise for this oversight, and have been through and corrected all fusion protein references accordingly.

      One of the videos was not referred to in the manuscript or described in the video legends. This video seems to correspond to Figure 5A, albeit with a different pseudo-color scheme.

      This has been corrected. Video 7 does correspond to Fig 5A, and we have corrected the colour scheme to match and added references to the video in the text and figure legend.

      Reviewer 2:

      (2) In their abstract, the authors state that they "...delineate multiple subpopulations of Rab7-positive endosomes that fuse sequentially with phagosomes" (page 2, line 14,15). However, the data provides only evidence for V-ATPase or PI(3,5)P2-containing structures and the authors conclude to my understanding that macropinosomes are the main source for vesicular structures fusing with phagosomes. I would ask the authors to please be clear on the identity of the "Rab7-donor"-structures throughout the manuscript. Saying that they delineate multiple subpopulations of endosomes seems to be overstated.

      We identify macropinosomes as one source (subpopulation) of Rab7/PI(3,5)P2 vesicles, but our data clearly show that they are not the only source of Rab7: there is clearly an additional early Rab7-positive/PI(3,5)P2-negative subpopulation of vesicles that also cluster and fuse at earlier stages. For example, in Figure 4F we co-express Rab7a/SnxA and show that, whilst all the SnxA vesicles also contain Rab7 (and dextran), there is a clearly separate population of small, early-fusing Rab7-containing vesicles that do not possess PI(3,5)P2. This is further validated in Figure 5B and C. To our mind, this clearly demonstrates and defines different Rab7 endosomal populations, although we do not yet know the origins of the initial Rab7-positive/PI(3,5)P2-negative population, as discussed in our response to their point (3) below.

      Minor points:

      (1) The sentence "...which both deactivates and dissociates Rab5, and recruits and activates Rab7 on endosomes" is at least problematic as it suggests that Mon1-Ccz1 directly drives GTP-hydrolysis of Rab5 and dissociates it from the membrane. Indeed, Mon1-Ccz1 is shown to interfere with the positive feedback loop of the Rab5-GEF by interacting with Rabex (Poteryaev et al., 2010), so a rather indirect effect of Mon1-Ccz1. A GAP and the GDI are needed for Rab5 deactivation and dissociation from the membrane. How both are involved in the endosomal Rab-conversion is not clarified.

      We have changed the text to better represent this complexity (P4 L4-6)

      (2) Signals of RFP-labeled proteins are difficult to interpret throughout the experiments. What are the structures that show a strong accumulation of red signal in Fig. 1A,B, Fig. 2G and Fig. 4A (20 sec.)? If these are fluorescently labeled proteins it would suggest that most of the proteins cluster/accumulate in the cell. Can the authors provide better images?

      We appreciate that some of these reporters with multiple localisations can be difficult to interpret. This is a major challenge for these sorts of studies and the main reason we use the large and easily identified yeast-containing phagosomes for quantification. In Fig. 1 the large structure is the large perinuclear cluster of Rab5 previously reported (Tu et al. JCB 2022). In Fig. 2G the bright structure is the recruitment of V-ATPase on the CV. Both of these large structures are easily distinguished from the phagosomal pool we are interested in. Whilst we would love to provide better images, this is simply not possible: both of these other structures are unavoidable and we are already using some of the best microscopy methods available. We have, however, clarified the additional localisations seen in these images in the revised figure legends.

      (3) On page 11 the authors state "...macropinosomes in ∆PIKfyve cells still appeared much larger. Quantification of their size and fluorescence intensity demonstrated that although macropinosomes started off the same size,...". This statement is not reflected in the data depicted in Fig. 3A,B. The size of the single labeled macropinosome appears to be larger in wildtype than in ∆PIKfyve cells from the beginning on. However, the quantification in Fig 3F is clear. So, are these bad examples in 3A,B, are they swapped or is this due to the additional expression of GFP-Rab7A? Could you please comment on the effect that the (over-)expression of GFP-tagged Rab-GTPases might have on the observations described in this paper in the discussion part?

      As can be seen from the error bars in Figure 3F, macropinosomes are extremely variable in size, ranging from ~0.2-5 microns in axenic Dictyostelium. The image in Figure 3B therefore reflects this heterogeneity rather than being a "bad example". This is why we designed the experiment to quantify several hundred vesicles before drawing any conclusions, and to do so in the absence of any GFP-fusion expression.

      Although we have not noticed any issues (enlarged vesicles are also clear in GFP-Rab7 expressing cells in Figure 1B), we do of course accept that GFP-Rab7 expression itself may have some detrimental effects on maturation and this is why we quantified macropinosome size in untransformed cells. We have clarified this in the results section (P12 L28).

      (4) In Fig. 6E it is hard to distinguish if the dextran is accumulating inside the phagosome. I would suggest conducting a 3D reconstruction of these images to allow judging if macropinosomes fused with the phagosomes or if they cluster around the neck of the phagosome.

      This would be nice, but not possible as these images are from single confocal sections, rather than a complete high-resolution Z-stack. We have however added an enlargement of both Figure 6D and E which we feel now more clearly shows the presence of dextran within the bounding PI(3)P membrane of the phagosome.

      (5) In the discussion, the authors state that the small pool of "PIKfyve-independent Rab7" is "insufficient to for subsequent fusion with other Rab7A-positive compartments, further Rab7 enrichment, and lysosomal fusion." What is the rationale for this conclusion? Is it shown how many Rabs are necessary to induce a tethering and fusion event? It would be good to revise this part of the discussion also in respect of the first major point of my comments above.

      Our data show that in the absence of PIKfyve, phagosomes still remove Rab5 and gain a small pool of Rab7 but progress no further. This is consistent with a block in the HOPS-mediated homotypic fusion of Rab7 compartments. However, we accept that this is not necessarily due simply to insufficient Rab7, and have rephrased the discussion accordingly.

      (6) The intention of the paragraph about phagosomal ion channels is for this reviewer somehow out of context. It is not clear to me how the authors relate this to their findings. It would be good to bring this into a broader context.

      We mention ion channels in the background as they represent the main class of PI(3,5)P2 effectors known so far. We feel this is important background context, even if our studies do not directly relate to it.

      Reviewer 3:

      Their disclosure and use of statistics is incomplete and/or inconsistent, and potentially wrong in some cases. For example, the authors disclose the number of biological repeats in a few experiments (Fig. 3C, F) but not in the majority. Instead, they state the number of phagosomes without indicating biological repeats (eg. Fig. 2 and others). So, it is not possible to know if their data are reproducible. Despite not indicating independent experiments in some cases, they speak of SEM, which applies to mean of means from biological repeats. In other cases, none of this is disclosed (eg Fig. 3G). Often there is no indication of what statistical test was done OR if a statistical test was done (eg. Fig. 3G, Fig. 4, etc). I would recommend the authors review the excellent resource paper published in JCB on SuperPlots to better follow statistical expectations. This is essential to improve reproducibility and confidence in their observations.

      We apologise if this was unclear for the referee, but we have tried to be clear in each case. The confusion likely lies in the definition of a biological repeat, which depends on the type of experiment. For quantification of phagocytic events over time, we feel it reasonable to take each individual event (each from an individual organism) as a biological repeat. This is because events are relatively rare and taken from multiple different movies, and it is not technically possible to film both mutants and controls simultaneously. In all experiments of this sort (e.g. Figure 2) we have shown the standard deviation, which indicates the reproducibility between phagocytic events. We have clarified in the methods that these events are from movies obtained on at least 3 independent days.

      In other cases, such as Figure 3C and F and Figures 5-6, we are able to take measurements across multiple cells simultaneously at each timepoint. It is therefore appropriate to average over multiple independent experimental repeats rather than individual cells. We have therefore used SEM in our analysis, and both the number of individual cells and independent repeats are stated on the graphs and legend. This was incomplete in a few cases but has now been clarified in all cases.
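      To make the distinction above concrete, the short Python sketch below computes an SEM over per-repeat means (the SuperPlots-style "mean of means"), so that n is the number of independent repeats rather than the number of cells. The function name and the values are hypothetical and purely illustrative; they are not code or data from the manuscript.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_sem_of_repeats(cells_by_repeat):
    """Average cells within each independent repeat, then return the mean
    and SEM across those per-repeat means (n = number of repeats)."""
    repeat_means = [mean(cells) for cells in cells_by_repeat]
    return mean(repeat_means), stdev(repeat_means) / sqrt(len(repeat_means))

# Hypothetical values: three independent repeats with unequal cell counts.
repeats = [[1.0, 1.2, 0.8], [1.5, 1.3], [0.9, 1.1, 1.0, 1.0]]
grand_mean, sem = mean_and_sem_of_repeats(repeats)
```

      Pooling all nine cells instead would treat each cell as independent and give a markedly smaller SEM for these toy values (about 0.07 rather than about 0.13), which is the inflation the mean-of-means approach avoids.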

      Regarding statistical tests, the tests used have now been clarified in each figure legend. Note that in Fig. 3G we do not apply any test, as both lines essentially overlap and it is clear there would not be any convincing differences. In Figure 4, the graphs all compare co-expression of different reporters rather than different mutants or conditions and are from single events. We therefore feel statistical tests are unnecessary and inappropriate. Comparison of the same reporters between strains, averaged across multiple events with statistical analysis, is shown in Fig. 2 instead. All these points have now been added to the statistics section of the methods (P9 L1-6).

      Minor Comments

      It is interesting that 2FYVE-GFP stays on phagosomes for 50 min or more - this is distinct from macrophages. Please comment. Have the authors tried other PI(3)P probes (e.g. PX-GFP) to see if the result is the same?

      We have not used other probes, but we have no reason to believe 2xFYVE does not behave as predicted: it is the same probe used for most macrophage studies (FYVE domain from human Hrs), and it is removed from macropinosomes exactly as expected. We did not originally comment on this in the manuscript, but PI(3)P dynamics are even more interesting, as our previous data indicate that latex-bead-containing phagosomes lose PI(3)P after 10 minutes (Buckley et al. 2019, Figure 4F-G). This indicates that phagosome maturation can be regulated by the cargo (under further investigation). Importantly, however, both bead- and yeast-containing phagosomes have comparable defects in the absence of PIKfyve. This is more fully discussed in our previous paper (Vines et al. JCB 2023), where we characterise PI(3)P and PI(3,5)P2 dynamics in more detail.

      Fig. 7 model: the macropinosome in the diagram seems like a dead end as depicted - is there any arrow or change that could be added to show that it doesn't just sit there in the middle? Also, the light green on yellow hurts the eyes!

      We apologise, there was actually supposed to be an arrow there but it was lost somewhere in the drafting process. The whole figure has now been updated to more clearly describe our full and more complex model.

      Fig. 3F, could be converted to volume assuming macropinosomes are spheres.

      This is true, however as these images are taken from single planes we cannot know where in the sphere the slices are and therefore what the maximum diameter would be. We therefore prefer to keep it as area so as not to confuse and over-interpret the data.
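      The geometric reason can be sketched as follows, assuming (as the referee's suggestion does) that a macropinosome is a perfect sphere of radius R and that the imaged plane lies at an unknown distance d from its centre. The helper names are hypothetical and this is an illustration of the argument, not an analysis from the manuscript.

```python
import math

def section_radius(R, d):
    """Radius of the circle cut by a plane at distance d (0 <= d < R)
    from the centre of a sphere of radius R."""
    return math.sqrt(R**2 - d**2)

def sphere_volume_from_area(area):
    """Volume of a sphere whose equatorial cross-section has this area."""
    r = math.sqrt(area / math.pi)
    return 4.0 / 3.0 * math.pi * r**3

R = 1.0                                   # true radius (arbitrary units)
true_volume = 4.0 / 3.0 * math.pi * R**3
for d in (0.0, 0.5, 0.8):                 # unknown offset of the imaged plane
    r = section_radius(R, d)
    area = math.pi * r**2                 # what a single plane measures
    est = sphere_volume_from_area(area)
    print(f"d={d}: area ratio {area / (math.pi * R**2):.2f}, "
          f"volume ratio {est / true_volume:.2f}")
```

      Because volume scales with the cube of the radius, a plane offset by d = 0.5R yields an area 75% of the true maximum but an inferred volume only about 65% of the true value, so converting single-plane areas to volumes amplifies rather than removes the sampling error.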

      Pg. 10, line 10 - Vps34 is Class III PI3K, not Class II.

      Corrected.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.


      Reviewer 2:

      (3) ("OPTIONAL") Optionally, the authors could also try to clarify these structures' identity by including further colocalization studies with additional early and late endosomal marker proteins. Are they for example positive for early or late endosomal markers like EEA1, ESCRT or Retromer? How about organelle-specific SNAREs? This would give further insights into the character of the "Rab7-donor" structures and would allow to clarify if multiple subpopulations are contributing to phagosome maturation in a sequential order as stated in the abstract. As I am not an expert on Dictyostelium I can't estimate the effort that would go into such an experimental setup. However, since the time scale of the events in the cell is nicely worked out in this study, these colocalization studies would not need to be conducted as live-cell microscopy experiments.

      This is a sensible suggestion that would in theory help define these populations. However, many of these markers are poorly defined with respect to phagosomes and/or Dictyostelium. Dictyostelium does not possess an EEA1 orthologue, and our data also indicate that these vesicles do not possess PI(3)P, so they cannot be canonical early endosomes. We have previously characterised WASH/retromer and, whilst it is recruited to phagosomes at around the time of the Rab5/7 transition, retromer appears to be recruited from the cytosol and to drive recycling rather than being delivered on endosomes that fuse (see King et al. PNAS 2016). We have also previously looked at ESCRT (Lopez-Jimenez et al. PLoS Pathogens 2018), which also does not appear to be recruited to early phagosomes in a manner consistent with a Rab7 subpopulation. The SNAREs are yet to be studied in any detail, as they are often too divergent to assign a direct mammalian orthologue.

      Therefore, whilst this is a sensible suggestion, and something we would like to follow up in the future, this is not straight-forward and we feel outside the scope of the current study. We have however included additional discussion of this in the revised manuscript (P20 L21-26).

      Reviewer 3:

      Major Comments:

      1. Based on the current data, I am not entirely convinced that Rab7 is delivered mostly by fusion with other compartments. At least the data as provided cannot exclude other models. For example, Rab7-containing organelles that cluster with phagosomes may form contact sites that provide a local environment to load cytosolic Rab7. There's also a possibility that some of their Rab7 clusters are membrane sub-domains and not vesicles. Or perhaps, there is a first wave of cytosolic Rab7 recruitment, which then initiates fusion with Rab7 compartments, i.e., there is a two-phase Rab7 recruitment. While this last possibility is consistent with recruitment of Rab7 by fusion (the second phase), the authors present a model that is too simplistic and conclusive based on the data. The authors may be right, but they need to strengthen their evidence towards their claim. Maybe EM could help determine some of these issues. Perhaps better would be the use of FRAP, photo-activation, or optogenetics of Rab7. For example, if Rab7 is acquired on phagosomes after photobleaching clusters of Rab7, this would suggest a cytosolic Rab7 contribution, and if not, this would support their model. I recognize that these experiments are not necessarily trivial, but either the authors augment their data (as suggested or with other approaches) or significantly pare down their conclusions.

      We agree with the referee that we cannot completely exclude other models and, as noted in the discussion, we do not wish to do so. We apologise if the role of fusion was overstated, but the model we propose is as the referee suggests: there is likely an early first wave of canonical Rab7 recruitment from the cytosol that is independent of PIKfyve, before the majority of Rab7 is subsequently delivered by fusion in a PIKfyve-dependent manner. Our data indicate that the second wave is both quantitatively and functionally more significant (see functional data in Buckley et al. 2019).

      We do agree with the referee that we cannot formally exclude alternatives such as contact-site-mediated recruitment from the cytosol or membrane sub-domains rather than fusion; however, there are no data to support these either. In contrast, the clustered Rab7 structures that could hypothetically represent contacts/subdomains often (but not always) contain the transmembrane V-ATPase complex (Figure 2G), which must be delivered by fusion.

      However we do not wish to over-simplify our conclusions and as we state in the discussion, we do think there is probably a small amount of Rab7 recruited from the cytosol by the canonical pathway. We accept that our cartoon in Figure 7 is over-focussed on fusion so we have substantially revised this, as well as the discussion to give a more balanced and complex view.

      Regarding the proposed experiments, unfortunately, the imaging required to acquire these movies is already at the very limit of what is possible so we do not believe it would be technically feasible to employ methods such as FRAP and optogenetics on these relatively fast-moving phagosomes with the temporal resolution required. Furthermore, to differentiate recruitment from a cytosolic pool, every GFP-Rab7 cluster would need to be photobleached, which could not be reliably achieved.

      However, this point will be largely addressed by the suggestion of Reviewer 2 to look at the Mon1/Ccz1 complex. The presence or absence of this complex will give strong evidence for or against a canonical Rab5/7 transition and Rab7 recruitment from the cytosol, which would significantly clarify our model and define the two different mechanisms of Rab7 recruitment to phagosomes.

      Early macropinosomes fuse with early phagosomes more readily than 10-min old macropinosomes. Do 10-min old macropinosomes not fuse with older phagosomes? Is this not an issue of mismatched age?

      This is an interesting point that we have clarified in the text. We agree with the reviewer that the ages of the macropinosomes and phagosomes appear to need to match, but our data indicate this only occurs when both parties possess PI(3,5)P2, as macropinosome fusion appears to happen in a single burst at about 240 seconds (Figure 6F) rather than as a continuous process. We also do not see any fusion of these older macropinosomes once the phagosomes get past the first 10 minutes of maturation (Figure 6G).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1) List of the detailed experiments we plan to perform (including aforementioned experiments):

      • Careful analysis of the daughter cell size by measuring the real volume.

      • Quantifications of PCM (pericentrin and γ-tubulin) proteins and Plk1 with respect to centrosome age in G2 and metaphase (for Plk1) cells.

      • Analysis of the amount of Plk1 of metaphase cells when cenexin protein is absent (siControl vs siCenexin), and measurements of Plk1 in WT-cenexin vs. cenexinS796A mutant to test if Cenexin controls a subpool of Plk1 at centrosomes.

      • Careful analysis of Ctrl and TPX2 depletion experiment data in 1:1 cells. We plan to repeat the experiment to confirm or refute the contribution of TPX2 to spindle asymmetry.

      • Measurement of the PCM volume/intensity in 2:2 and 1:1 metaphase cells, to determine the contribution of the daughter centrioles to recruiting PCM proteins.

      • Live cell imaging of 2:2 cells and measurements of different parameters; cortex-to-centrosome and spindle pole to metaphase plate (half-spindle (a)symmetry) distances.

      • Long-term live cell imaging of 2:2 cells to investigate whether the asymmetry in centrosome-age dependent daughter cell size also affects the duration of the ensuing cell cycle. While we have carried out such long-term movies in the past, we are aware that they can be challenging due to high cell mobility over longer time courses.

      • Investigation of the microtubule nucleation capacity under different conditions of PCM protein depletion (depletion of Cdk5rap2 and/or pericentrin).

      • Analysis of the effect of the over-expression of PCM protein (Cdk5Rap2) on the (a)symmetry of the mitotic spindle size


      2) detailed answers (in green) to the reviewers’ comments:


      Reviewer #1:

      (Major points)

      1. The discovery of differences in half-spindle size during symmetric division is intriguing. However, the methodology for quantification of the data remains unclear. Key questions, such as how the center of the metaphase plate is determined from the image data, the definition of exact pole position when centrioles are located at spindle poles, the objective determination of daughter cell diameter and width from the image data, and the referential position of the cortex, need more detailed explanation in the manuscript. Additionally, it's crucial to elucidate the specific index used to quantify differences from the image data, especially when dealing with data that only varies by a few percent. Providing clarity on these aspects and, in some cases, re-quantifying the data should be necessary.

      We have already included clearer explanations of our methodology in the Methods and Results sections and will include a supplementary figure on how precisely we defined and measured the half-spindle sizes, as well as the index used for the asymmetry (using a methodology that we previously used in Dudka et al., Nature Comm., 2018). In addition, we will use a second method to measure the real daughter cell volume.

      The mechanism behind the difference in half-spindle size, related to the subdistal appendage (SDA), raises questions, especially considering that SDA is believed to disassemble during mitosis. Exploring whether differences in the localization of PCM components and half-spindle size result from disparities in Plk1 and PCM loading during G2/early mitosis, prior to SDA disassembly, necessitates experimental verification.

      As suggested by the reviewer, we will quantify the amounts of PCM proteins on the old and young centrosomes in G2 cells (and therefore prior to SDA reorganization). This will also allow us to test whether the asymmetry depends on the SDA themselves or on the corresponding SDA proteins, which still accumulate specifically on the oldest centrosomes during mitosis.

      For investigating the mechanism of half-spindle size asymmetry, many perturbation experiments employ knock-down techniques. To directly address the cause of asymmetry, it might be valuable to artificially localize Plk1 and PCM factors to one spindle pole using optogenetic tools or similar approaches and then quantify half-spindle and daughter cell sizes.

      We thank the reviewer for this suggestion, as it could indeed be of great interest and provide a direct proof of principle. Unfortunately, based on our experience in establishing such a cell line, we know that just the generation of such a light-manipulated stable cell line that contains markers for centrosomes and chromosomes or kinetochores takes 6-9 months, in the best-case scenario. This experiment is therefore not possible within a normal revision round (even if extended to 6 months).

      The asymmetry in Plk1 sub-population recruitment by SDA triggers the observed effects, but the evidence for this is relatively weak, given the small difference in spindle asymmetry. Quantifying the amount of Plk1 in its activated form, particularly in the context of SDA dismantling during metaphase, could strengthen this aspect of the study.

      While the commercial antibodies against the activated form of Plk1 (phospho-T210) work very well by immunoblotting, we have not been able to get them to work by immunofluorescence. We will nevertheless test whether variation in the fixation methods can solve this issue. Alternatively, we will test to what extent depletion of Cenexin, or the presence of Cenexin WT vs the non-phosphorylatable Cenexin mutant, affects the overall population of Plk1 on both spindle poles.

      While the focus on half-spindle size asymmetry during symmetric division is intriguing, it's important to address the broader physiological significance. The primary outcome of this asymmetry is differences in daughter cell size, which limits the broader significance of the study. Furthermore, the quantification method for daughter cell size warrants scrutiny and clarification.

      As mentioned above, we will use a different method to measure and investigate daughter cell size (a)symmetry. Moreover, we will attempt long-term live-cell movies to test whether the variation in centrosome-age-dependent daughter cell size also affects the duration of the ensuing cell cycle.

      (Minor points)

      1. Table 1 lists factors with asymmetric localization not analyzed in detail in this paper. It would be beneficial to discuss whether these factors play a role in spindle asymmetry, and the authors should address the completeness of the data in Table 1 in terms of selecting factors for analysis.

      We agree with this comment that other factors may participate in the regulation of spindle asymmetry. However, we performed this screening to identify key drivers of spindle (a)symmetry based on an investigation of the Pearson’s correlation coefficient and the value of slope.

      In addition, some of these proteins are known to control spindle size by acting in shared pathways (TPX2/Kif2A/Katanin; Pericentrin/CDK5RAP2/γ-tubulin). We will incorporate these points and the reasons for our selection in the discussion.

      In Figure 1H, the impact of centriolin knock-out on the distribution of unaligned polar chromosomes is different from the effect of cenexin S796A in Figure 6H. This difference should be explained to provide clarity on the observed discrepancies.

      We will better explain this difference.

      In Figure 2A, there is no correlation data presented between daughter cell asymmetry and the presence or absence of cenexin signal. This relationship should be elucidated for a more comprehensive understanding.

      We will clarify this point. Specifically, we plotted the daughter cell symmetry index for 2:2 and 1:1 cells with respect to centrosome age. All the daughter cells display a cenexin signal at both grandmother and mother centrioles, with a difference in fluorescence intensity that enables us to assign them to “old” vs “young” centrosomes. We found a significant result indicating a relationship between centrosome age and the formation of daughter cells of different sizes.

      In Figure 4G and H, the mean value of spindle asymmetry increases with siRNA treatment of Cdk5Rap2 or PCNT compared to the control. The possible interpretation of this finding should be discussed.

      This is an interesting observation that needs to be discussed in our revision.

      Figure 4K shows that the asymmetry of PCNT distribution is not eliminated by centriolin knock-down. This observation requires clarification and discussion.

      It has been shown that pericentrin is directly recruited by Plk1 at the centriole (Soung et al., 2009). In addition, pericentrin has a PACT domain that directly targets pericentrin to the centriole (Gillingham and Munro, 2000). Moreover, it has been demonstrated that the grandmother centriole is slightly longer than the mother one (Kong et al., 2020). Altogether, this suggests that the old and young centrosomes, based on this intrinsic property, may recruit different amounts of pericentrin.

      We will add this explanation in the discussion.

      It appears that the difference in spindle asymmetry of the control group in Figure 5A is smaller than in other data. This discrepancy should be addressed. Additionally, the influence of TPX2 depletion on spindle formation, and any corresponding spindle staining data, should be included.

      This point will be discussed in the revised version of the manuscript.

      Claiming that the daughter centriole recruits PCM based on Figure 6A data alone may require additional supporting evidence. It is essential to investigate whether there is a clear PCM signal when the daughter centriole disengages in late mitosis and maintain consistency in the interpretation.

      As suggested by the reviewer 2, we will measure PCM volume/intensity in both 2:2 and 1:1 cells to demonstrate that daughter centrioles directly recruit PCM proteins.

      The lack of difference in TPX2 distribution in Figure 7E should be explained, along with a discussion of how this observation aligns with the spindle asymmetry data and any inconsistencies.

      We will discuss this point in the revised manuscript.

      The differing N numbers between samples in all the figures may affect the validity of comparisons. The authors should discuss whether it is necessary to have consistent N numbers in each experiment for more robust conclusions.

      Indeed, this is an important point that must be discussed.

      Reviewer #2:

      Major comments:

      1) It is not completely clear how the authors determined whether a spindle was asymmetric or not. In the methods, they say that statistical tests are described in the legends. In Figure 1 legend they say: "Each condition was compared to a theoretical distribution centered at 0 (dashed line)". How did they generate this theoretical distribution?

      As explained under point 1 of reviewer 1, we will provide a more thorough explanation of our methodology and how we decide whether a spindle is symmetric or not. In brief, a perfectly symmetric spindle would yield an asymmetry index of 0, as there is no difference between the two half-spindle sizes.

      2) The authors claim that TPX2 depletion results in loss of spindle asymmetry in 1:1 cells, but the difference is very small (1.7% in control vs 1.3% in TPX2 depletion, Fig 5B) and the data is more variable in TPX2 depletion, which makes it less likely that a statistically significant difference from 0 would be found. Firstly, perhaps the authors could check the standard error of the mean, which provides a measure of how accurate the mean is with regard to N and variation. If a dataset is more spread (such as in TPX2 depletion) a higher N is required to attain the same accuracy in the mean value. This is normally not so important when directly comparing two datasets, but in this case the authors are comparing each dataset to 0. So, are the authors measuring enough cells in the TPX2 depletion to be sure that a 1.3% value is not significantly different from 0? Secondly, I don't understand why the control cells have such a low asymmetry index (1.7%), when previous data in the paper shows an asymmetry index of 4.1% (Fig 1D) and 3.4% (Fig 4E) in control 1:1 cells. This suggests that something about the way this experiment was carried out dampens the asymmetry, which could therefore lead the authors to conclude that TPX2 is more important than it really is.

      We agree with this comment: the mean of the control condition is smaller than in the other controls. As mentioned above, we will carefully look at the data (SD vs SEM) and, if necessary, add a new replicate to confirm or refute the involvement of TPX2 in the formation of asymmetric spindles.
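      The reviewer's point about N can be made concrete with a minimal sketch (hypothetical helper names, Python standard library only): the SEM is SD/√n, a one-sample t statistic against 0 is mean/SEM, and the number of cells needed to reach a given SEM grows with the square of the SD.

```python
from math import ceil, sqrt
from statistics import mean, stdev

def one_sample_t(values, mu0=0.0):
    """Return (t, SEM) for H0: population mean == mu0, with SEM = SD / sqrt(n)."""
    sem = stdev(values) / sqrt(len(values))
    return (mean(values) - mu0) / sem, sem

def cells_for_target_sem(sd, target_sem):
    """Smallest n such that SD / sqrt(n) <= target_sem."""
    return ceil((sd / target_sem) ** 2)

# Illustrative asymmetry indices (%), not data from the manuscript.
t, sem = one_sample_t([1.0, 2.0, 3.0])
n_narrow = cells_for_target_sem(sd=4.0, target_sem=0.5)
n_wide = cells_for_target_sem(sd=8.0, target_sem=0.5)
```

      Doubling the SD (here from 4 to 8 percentage points, illustrative values only) quadruples the number of cells needed for the same SEM. For a p-value, a library routine such as SciPy's `scipy.stats.ttest_1samp` returns both the t statistic and p.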

      3) The authors claim that daughter centrioles are associated with some Pericentrin and suggest that this may be why 2:2 centrosomes have less of an asymmetry than 1:1 centrosomes (Fig 6A). It is unclear whether the authors consider these daughter centrioles as being prematurely disengaged (they make reference to the fact that they previously showed how disengaged daughters recruit γ-tubulin, but it's unclear if this is related to their current observations). In Figure 6A, the Centrin spots look too far apart for engaged centrioles (~750nm). I appreciate that this may be the only way to detect Pericentrin around the daughter at this resolution, but it may also force the authors to select cells where the centrioles have prematurely disengaged. For the asymmetry measurements, the authors presumably did not select cells where they could distinguish mother and daughter centrioles. One way to address this issue would be to compare PCM size at centrosomes in 2:2 cells with centrosomes in 1:1 cells. The expectation would be that centrosomes in 2:2 cells would have more PCM, due to the contribution of the daughter centrioles.

      We agree that in these high-resolution images the daughter centrioles appear to be far from the mother centrioles. The metaphase cells presented in this figure are wild-type, untreated cells in which the daughter centrioles are engaged. Indeed, our own investigation of centriole engagement status by expansion microscopy indicates that over 98% of centriole pairs in metaphase RPE1 cells are engaged.

      Nevertheless, as suggested by the reviewer and to validate that daughter centrioles participate in this process, we will compare PCM size in 2:2 and 1:1 metaphase cells.

      4) The authors show that Plk1 recruitment by Cenexin (via S796 phosphorylation), which happens only at mother centrosomes, is important for asymmetry. Nevertheless, they show that Plk1 is symmetrically distributed between mother and daughter centrosomes (Table 1). This does not really fit, unless daughter centrosomes recruit more cenexin-independent Plk1 than mother centrosomes or if the cenexin-bound pool of Plk1 is only a minor fraction of total Plk1. If so, do the authors think that the Cenexin-bound pool of Plk1 is more potent than the rest of centrosomal Plk1?

      As indicated in our response to point 4 of reviewer 1, we will test what proportion of the Plk1 pool at spindle poles depends on the presence of Cenexin, as we suspect that this Cenexin-dependent Plk1 population is only a subpopulation.

      5) The circles drawn to measure cell size in Figures 2A,E and 7C do not look like a good representation of cell area (as the cells are not perfectly round). The authors use a formula for circle area with an approximation of the radius (based on the mean length/width of an oval). It would be much better to use ImageJ to draw a freehand line around the perimeter of the cell and use the in-built tool to measure the area.

      As mentioned in our response to point 1 of reviewer 1, we will use another method to measure daughter cell size.
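      To illustrate the reviewer's concern about the circle approximation, the sketch below compares three area estimates for an elongated cell. It assumes that the circle approximation uses a radius of (length + width)/4, i.e. half the mean of the two measured diameters; this is our reading of the description, not the authors' code. For an ellipse, this circle always overestimates the area, since ((L + W)/2)² ≥ L·W. A freehand outline is represented here, as an assumption, by a polygon of ordered outline vertices whose area is given by the shoelace formula; this models a traced outline generically and does not describe ImageJ internals. All function names are hypothetical.

```python
import math


def circle_area_mean_diameter(length, width):
    """Circle whose diameter is the mean of the measured length and width."""
    r = (length + width) / 4.0
    return math.pi * r * r


def ellipse_area(length, width):
    """Ellipse with the measured length and width as its full axes."""
    return math.pi * (length / 2.0) * (width / 2.0)


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical 20 x 10 um cell: the circle estimate is 12.5% larger than
# the ellipse area, because (20 + 10)**2 / (4 * 20 * 10) = 1.125.
print(round(circle_area_mean_diameter(20, 10) / ellipse_area(20, 10), 3))  # 1.125
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))  # 12.0
```

      The bias of the circle estimate grows with elongation and vanishes only for round cells, which is why a traced outline is preferable when daughter-cell size differences are expected to be only a few percent.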

      Minor comments:

      1) Asymmetry in centrosome size that correlates with centrosome age in apparently symmetrically dividing "cells" has been observed previously in Drosophila syncytial embryos (Conduit et al., 2010a, Curr. Biol.). I think this should be mentioned somewhere given the topic of the study.

      We thank the reviewer for this information. This paper will be discussed in the revised version.

      2) A full description of statistical tests and n numbers for each experiment should be provided in the methods, even if this duplicates information in the Figure legends.

      We will add this information to the Methods section.

      OPTIONAL EXPERIMENTS:

      3) Given that chTOG is very important for microtubule nucleation, it seems strange that this protein was not analysed for a potential asymmetry.

      As suggested by the reviewers we will test for a potential chTOG asymmetry and its impact on spindle size asymmetry.

      4) Cooling-warming experiments could be done using higher concentration of formaldehyde, as it's likely that microtubule nucleation is not immediately halted when using 4% formaldehyde.

      The fixation solution was chilled at 4°C, which should halt any further depolymerization. We will specify this point in the Material and Methods section.

      Reviewer #3:

      Major points:

      1) The evaluation of spindle and cell size asymmetry related to centrosome age only relies on fixed sample preparation. Cells should be followed by time-lapse microscopy as the metaphase plate position relative to the spindle poles and/or the cell cortex may fluctuate over time and as the observed differences remain in a very subtle range. This is an important possibility to consider for 1:1, 1:0 or 0:0 spindle pole configurations where centrosome integrity is impaired.

      We agree with the reviewer that this is a drawback of our approach, but the experiments the reviewer suggests are not possible for 1:0 or 0:0 cells, or only in an approximate manner. Indeed, we do not have a centriole-independent spindle pole marker that would allow us to mark the position of the spindle pole precisely. In the past we used SiR-tubulin, which gave us an approximate position of the spindle poles and allowed us to monitor the spindle asymmetry of 1:0 cells over time (see Dudka et al., 2019), a point that we will discuss. Nevertheless, as suggested by the reviewer, we will attempt to monitor these asymmetries in 2:2 and/or 1:1 cells expressing GFP-Centrin1 and GFP-CENPA (kinetochore marker) in WT conditions. We cannot, however, extend this approach to all conditions, as the calculation of the spindle asymmetry index is based on a very high number of cells, and spindle asymmetry can only be monitored by selecting mitotic cells one by one and then imaging them over a short period of time (Tan et al., eLife, 2015), which makes such an approach extremely time-consuming.

      2) Cell size asymmetry was evaluated based on cell area at the equator. Volumes will be a better indicator as daughter cell shapes can be different in telophase if they do not re-adhere at the same speed. This evaluation should also be confirmed with another readout, like the position of the cleavage furrow relative to the spindle poles in late anaphase, as again the observed differences are in a very subtle range.

      As indicated in our responses to the similar points of reviewers 1 and 2, we will improve our methodology to take this comment into account.

      3) The authors propose that differential microtubule nucleation at the spindle poles underlies spindle size symmetry breaking without providing direct evidence. Although the observed spindle symmetry in the 1:1 configuration after pericentrin, CDK5RAP2 or γ-tubulin siRNA supports this interpretation (Fig4C), the differential microtubule nucleation capacity at the spindle poles in microtubule depolymerisation-repolymerisation assays was not evaluated in these conditions, as compared to the control situation.

      As suggested by the reviewer we will analyze the microtubule nucleation capacity after the downregulation of PCM proteins.

      4) If differential microtubule nucleation at the spindle poles is responsible for spindle asymmetry, overexpression of PCM proteins or γ-tubulin should be sufficient for re-establishment of symmetric protein distribution, spindle and cell size symmetry in 2:2 or 1:1 configuration. The authors should evaluate whether this is the case or not.

      This is an interesting suggestion, which we will test, although overexpression of these proteins might also lead to other defects in the spindle, such as multipolar spindles.

      5) The authors describe that the cortex-centrosome distance is not changed according to centrosome age (Fig2C), but centrosome-metaphase plate distance is (Fig1D). These observations are difficult to reconcile if differential microtubule-nucleation capacity is at play. Again, time-lapse microscopy would enable to detect over time whether only metaphase plate position relative to spindle poles is changing or if spindle pole position relative to the cell cortex is also fluctuating.

      We plan to attempt time-lapse imaging of WT 2:2 cells and to measure several parameters over time, such as half-spindle size, spindle (a)symmetry and the cortex-to-centrosome distance.

      Minor points:

      6) Main PCM and MT nucleation protein "depletion" do not appear to impact spindle assembly, but only spindle symmetry in 1:1 and 1:0 configurations (Fig4A and 4F-H). Can it be explained by the fact that their depletion is not always total (for pericentrin, Fig5F versus FigS2A or Fig7G)? Can they comment on this point?

      Cells with abnormal centriole numbers at the spindle poles (1:1 and 1:0) can still assemble bipolar spindles in the absence of the main PCM proteins (Chinen et al., JCB, 2021, and Watanabe et al., JCB, 2020).

      In our study, the depletion of PCM proteins is almost total (97% for pericentrin, 98% for Cdk5Rap2).

      7) If centrosome age dictates spindle and cell size asymmetry through differential MT-nucleation capacity at the spindle poles, how can this process be modulated? Indeed, centrosome age is common to all cell types, but cell size asymmetry is more or less pronounced. The authors should further discuss this point based on the literature.

      We will address this point in the Discussion.

      Description of the revisions that we have already carried out in the revised manuscript


      1. The discovery of differences in half-spindle size during symmetric division is intriguing. However, the methodology for quantification of the data remains unclear. Key questions, such as how the center of the metaphase plate is determined from the image data, the definition of exact pole position when centrioles are located at spindle poles, the objective determination of daughter cell diameter and width from the image data, and the referential position of the cortex, need more detailed explanation in the manuscript. Additionally, it's crucial to elucidate the specific index used to quantify differences from the image data, especially when dealing with data that only varies by a few percent. Providing clarity on these aspects and, in some cases, re-quantifying the data should be necessary.

      We have already included clearer explanations in the method parts and results part about our methodology and will include a supplementary figure on how precisely we defined and measured the half-spindle sizes, as well as the index used for the asymmetry (using a methodology that we previously used in Dudka et al., Nature Comm., 2018). In addition, we will use a second method to measure the real daughter cell volume.


      Description of the experiments that we prefer not to carry out:


      Point 3 of reviewer 1: For investigating the mechanism of half-spindle size asymmetry, many perturbation experiments employ knock-down techniques. To directly address the cause of asymmetry, it might be valuable to artificially localize Plk1 and PCM factors to one spindle pole using optogenetic tools or similar approaches and then quantify half-spindle and daughter cell sizes.

      We thank the reviewers for this suggestion, as it could indeed be of great interest and provide a direct proof of principle. Unfortunately, based on our experience in establishing such cell lines, we know that just generating a stable optogenetic cell line that also contains markers for centrosomes and chromosomes or kinetochores takes 6-9 months in the best-case scenario. This experiment is therefore not possible within a normal revision round (even if extended to 6 months).


    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2023-02172

      Corresponding author(s): Philip Elks

      1. General Statements [optional]

      In this paper we report the discovery that a member of the tribbles pseudokinase family, TRIB1, is expressed in human monocytes and is upregulated after stimulation with mycobacterial antigen in a human patient challenge model, the first direct link between immune cell Tribbles expression and the innate immune response to infection. We then interrogated the mechanisms underlying the roles of Tribbles in TB using a human disease-relevant, whole-organism in vivo zebrafish model of TB. We show that TRIB1 modulation specifically can tip the battle between host and pathogen, enhancing the innate immune response and reducing bacterial burden. We then uncover the molecular mechanisms responsible for the host-protective effect of TRIB1, namely enhanced antimicrobial reactive nitrogen species and il-1beta, via cooperation with the Cop1 E3 ubiquitin ligase. Our findings demonstrate, for the first time, TRIB1 as a host moderator of antimicrobial mechanisms whose manipulation benefits the host during mycobacterial infection and, as such, a potential novel therapeutic target against TB infection.

      We thank the reviewers for their positive appraisal of our work and for their helpful suggestions that will improve our manuscript. In particular, we would like to highlight the reviewers’ comments on the gap/need for a new zebrafish in vivo model to understand the roles of tribbles in infection that can “be extrapolated into the human system”, and how they feel these findings will be of broad interest and “significance to cross section of the research community” attracting “interest from readers in the fields of infection, immunity, hematology and animal models” alongside “researchers studying all aspects of Tribbles pseudokinase function, especially researchers seeking models to test small molecule agonists and antagonists.”

      2. Description of the planned revisions

      Reviewer 1

      The major weakness of the manuscript is that the authors do not evaluate C/EBP transcription factors at all. It is rather surprising as they emphasize cooperation between Trib1 and Cop1 in the main title. C/EBP family proteins are key factors of Trib1-mediated modulation of granulocytes and monocytes. Also, slbo, a drosophila homolog of C/EBP, is a target of tribbles, indicating that the pathway is evolutionarily conserved. I would request the following experiments and discussions.

      RESPONSE: We agree that possible C/EBP roles should be discussed in detail, and we will add a new discussion section on this.

      We stand by our data showing that the host-protective mechanism of Trib1 requires Cop1, but we are not able to directly show a C/EBP mechanism within the scope of the current project due to a lack of tools/knowledge in the zebrafish (see further points/comments below). It is important to note that we have not claimed a C/EBP mechanism in our manuscript, and we think such a mechanism may be unlikely given that monocyte and granulocyte numbers are not altered after TRIB1 manipulation. Indeed, there are many candidates other than C/EBP that COP1 could be acting through. Some examples include MAPK (Niespolo et al., Front Immunol, 2020), serine threonine kinases (Durzynska et al., Structure, 2017) and beta-Catenin (Zahid et al., Proteins, 2022).

      In response to this comment, we have modified the title from “Tribbles1 and Cop1 cooperate to protect the host during in vivo mycobacterial infection” to “Tribbles1 is host protective during in vivo mycobacterial infection”. We believe our data do show that the protective effect of Tribbles requires Cop1, but changing the title in this way removes any suggestion that they directly cooperate in the potentially C/EBP-dependent manner suggested by the reviewer.

      Although the authors found the number of neutrophils and monocytes unchanged by either Trib1 overexpression or knockdown, they did not demonstrate the differentiation status of both cell types. This is quite an important issue, given that Trib1 knockout promotes granulocytic differentiation via C/EBPa accumulation in mice. Also, the analysis of granulocytic/monocytic differentiation will provide crucial information on how Trib1 protects the host from mycobacterial infection by regulating hematopoietic cell functions. The authors should perform morphological analysis and examine cell surface marker expression to determine whether Trib1 and Cop1 modulate granulocytic and monocytic differentiation with and without Mm infection.

      RESPONSE: Unfortunately, we do not have the same level of immunology knowledge nor the antibodies to look at cell surface markers in zebrafish larvae (we note that the reviewer states that they do “not have sufficient expertise in zebrafish models.” We agree with the reviewer that this would be an obvious and informative experiment to do in mouse models, but it is not currently possible in zebrafish larval models). The transgenic promoters used (mpx for neutrophils and mpeg1 for macrophages) are robust and widely used to quantify total neutrophil and macrophage numbers (Renshaw et al., Blood 2006; Ellett et al., Blood 2011). Mpx, encoding myeloperoxidase, is expressed late in neutrophil differentiation. It is also worth noting that the zebrafish larva is still a developing organism and neutrophil/macrophage numbers rise every day between 1 and 5 days post fertilisation; therefore, any effect on or delay in leukocyte differentiation would likely be captured at the 2dpf timepoint we have already quantified. We cannot reliably perform leukocyte counts during Mm infection, as neutrophils/macrophages cluster around infected areas, making counting challenging.

      However, in response to this comment we will:

      1. Use a new stable CRISPR-Cas9 Tribbles 1 knockout mutant that we have generated and assess neutrophil differentiation using Sudan Black (SB) staining. SB stains neutrophil granules, which develop during a late phase of neutrophil differentiation.
      2. Interestingly, it has been shown that a zebrafish myeloid specific C/EBP (c/ebp1) is not required for initial macrophage or granulocyte development, but knockdown does result in a loss of the secondary granule gene LysC (Su et al., Zebrafish, 2007). Therefore, our findings are not inconsistent with existing literature, even if C/EBPs are regulated by Tribbles. However, to test this further we will use an LysC:mCherry transgenic line (Buchan et al., PLoS One 2019) to assess expression in developing neutrophils after trib1 manipulation.

      It is interesting that Cop1 knockdown zebrafish are viable, given its ubiquitous expression and multiple important targets of protein degradation. The authors should provide details of the phenotype of Cop1 KO larvae and discuss this issue.

      RESPONSE: Zebrafish mutants are much less often embryonic lethal than mouse mutants, as maternally contributed protein stores allow basic metabolic functions to occur throughout the short period of embryonic development (Rossant and Hopkins, Genes and Development, 1992). In addition, the Cop1 crispant is a knockdown rather than a knockout, so there may be sufficient remaining Cop1 for development if it is indeed required for larval viability. Although Cop1 knockout mice are non-viable, hypomorphs are viable and develop relatively normally (similar to our knockdown zebrafish) but are tumour prone, as Cop1 is required for effective tumour suppression (Migliorini et al., JCI, 2011).

      We had not commented on the phenotype of Cop1 crispant larvae, as they appear to develop normally (e.g. normal body axis and development). However, we agree that this is a relevant point and will add a comment on it in the Results section. Furthermore, we will add to the supplementary information the whole-body neutrophil counts that we have performed; these show no change with cop1 knockdown, suggesting no difference in granulopoiesis.

      [Optional] To obtain more solid evidence for the Cop1-dependent function of Trib1 in mycobacterial infection, it would be better to use a Trib1 mutant that loses Cop1-binding activity. This experiment will strengthen the authors' conclusion of Trib1 and Cop1 cooperation.

      RESPONSE: We will address this comment by using a newly generated stable zebrafish CRISPR-Cas9 Tribbles 1 knockout line with a 14 base pair deletion that is predicted to lead to a premature stop at amino acid 94, in the middle of the pseudokinase domain, so that the truncated protein lacks the catalytic loop. It also lacks the predicted COP1-binding region at the C terminus of the protein. We will assess bacterial burden in this model.

      1. Previous studies have shown multiple defects in hematopoietic lineages such as M2-like macrophages and eosinophils in Trib1 KO mice, suggesting that Trib1 affects cellular functions of macrophages upon mycobacteria infection. I would request the authors to mention some ideas on this point in discussion.

      RESPONSE: We will add a section in the discussion to address this.

      Reviewer 2

      Structural comparisons are relatively descriptive of identity etc. Nowadays it should be relatively straightforward to comment on structural conservation based on AlphaFold models. Specific details may not be accurate but gross folds will be, and comparing those may be more informative.

      RESPONSE: We have taken an initial look at AlphaFold models and there are indeed structural similarities between zebrafish and human Tribbles. We will incorporate AlphaFold structural models and comment on similarities/differences.

      Some discussion of the mechanisms regulating TRIB1/2/3 transcriptionally is probably relevant given the differential upregulation observed during infection. There is quite a bit of characterisation of different Tribbles promoter regions in humans; how does this translate to zebrafish?

      RESPONSE: We will add a discussion point on what is known about Tribbles promoter regions in humans. We will assess whether anything is known about the promoter regions in zebrafish Tribbles (we have not identified literature on this currently). If nothing is known on this in zebrafish we will attempt to search for regulatory regions found in humans in the zebrafish promoters.

      In terms of CRISPR use, can it be confirmed that CRISPR-modified cell lines have effects at the protein level? This is not my specific expertise, but the supplementary evidence shown seems to show that some genomic editing is occurring, but not necessarily how it affects protein levels.

      RESPONSE: We do not have antibodies that work on zebrafish Tribbles proteins to assess this directly. However, we will address this comment by using a newly generated stable zebrafish CRISPR Tribbles 1 knockout line with a 14 base pair deletion that is predicted to lead to a premature stop at amino acid 94, in the middle of the pseudokinase domain, lacking the catalytic loop. Unlike the “CRISPant” knockdown work in the peer-reviewed version, this represents a full knockout of Tribbles 1. We will analyse trib1 cDNA from the full knockout line to confirm the knockout at the transcript level.

      A major conclusion of the paper seems to be that TRIB1 works with COP1 in zebrafish to mediate the response to infection. However, the discussion does not particularly tie this to the other discussed mechanisms. For example, JAK/STAT and C/EBP-linked responses are discussed separately from COP1, when they could well be linked?

      RESPONSE: We agree, and this comment fits with some comments from reviewer 1. We will rework areas of the discussion to address this and bring possible mechanisms together into a new discussion section.

      Reviewer 3

      All comments addressed in new revision (see below).

      It is noted that this reviewer has “expertise from genetic studies of model organisms to assess all aspects of the tools and approaches used in the paper.”

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer 1

      Figure 1D, E, F is mislabeled in lines 268-271.

      RESPONSE: Apologies for this typo, this has now been changed.

      Typo in line 399.

      RESPONSE: We have changed “suggesting” to “suggest”.

      Figure 6A-B is mislabeled in line 415

      RESPONSE: Apologies for this typo. We have changed this from “Figure 5A-B” to “Figure 6A-B”.

      Reviewer 2

      While the protective effect is stated as an effect size 'close to that of HIF-1a', is there additional rationale suggesting that the two may be linked?

      RESPONSE: Yes, a number of studies link Tribbles and Hif-1alpha. The best-characterised link is in cancer cells, where Tribbles 3 has been linked to HIF-1alpha or hypoxia in breast cancer (Wennemers, Breast Cancer Research, 2011), renal cell carcinoma cells (Hong et al., Int J Biol Sci, 2019) and adenocarcinoma (Xing et al., Cancer Management Research, 2020). In Drosophila, Hif-1alpha induces TRIB in fat body tissue (Noguchi et al., Genes Cells, 2022). We have now added references to these studies to the relevant section in the Results.

      Reviewer 3

      Minor issues: small problems with clarity and figure panel correlation as detailed below:

      Mycobacterium marinum: lines 363-365 refer to Fig 2C-D; this should be 3C-D.

      RESPONSE: Apologies for this typo. This has now been changed to 3C-D.

      negative controls DN Hif-1alpha and PR (Figure 4A-B). Similarly, trib1 overexpression increased the levels of anti-nitrotyrosine staining, a proxy for immune cell antimicrobial nitric oxide production (Forlenza et al. 2008), to similar levels of DA Hif-1alpha (Elks et al. 2014; Elks et al) Not seeing this for Trib1

      RESPONSE: We are not completely sure what the reviewer is referring to here. We think the confusion may stem from the fact that the increase in nitric oxide after trib1 overexpression is compared to the phenol red control, so we have now clarified this in the text.

      As previously observed, overexpression of trib1 significantly reduced bacterial burden compared to phenol red controls when co-injected with tyrosinase guide (Figure 5A-B).

      The Fig 3 A-B is correct, although 6A-B appear to be novel panels showing this result

      RESPONSE: Yes, we agree: Figure 6A-B contains new data showing results similar to those in Figure 3A-B, as it is necessary to include siblings from the same clutch in each graph to make direct comparisons. To avoid unnecessary confusion, we have removed “as previously observed” for Figure 6, as we had not previously performed the tyrosinase co-injection, so these are indeed new data.

      444 no comma 446 no comma 457 no comma after "activation"

      RESPONSE: We have removed these commas.

      472-475 confusing - better structure in particular in 474 what does "this" refer to?

      RESPONSE: We agree and have rewritten this passage as follows:

      “Lipid droplets form in macrophages during Mtb infection and are potentially used as a source of lipids by Mtb to allow for intracellular growth (Daniel et al. 2011). However, more recent findings suggest that lipid droplets are formed during the immune activation process after macrophage Mtb infection (Knight et al. 2018) and can subsequently influence the dynamics of the macrophage host-defence response (Menon et al. 2019). This macrophage lipid metabolism and handling could potentially be influenced by Tribbles.”

      525-526 confusing - better structure perhaps begin with 'Because...'

      RESPONSE: We have changed this confusing sentence to:

      “Here, we demonstrate il-1b and NO control by Trib1, suggesting that Trib1 controls multiple immune pathways and that therapeutic Trib1 manipulation may be more effective than targeting individual immune pathways alone.”

      confusing 538 "this and 539 pave the way for further research into TRIB1 as a target for host-derived therapies" Perhaps "further research into TRIB1 as a target for host-derived therapies could potentially improve infection outcome of mycobacterial infection via pharmacological targeted delivery methods and transient manipulation through genetic approaches"

      RESPONSE: We have changed this sentence as suggested.

      4. Description of analyses that authors prefer not to carry out

      Reviewer 1

      1. The authors should investigate the expression of the C/EBPa protein p42 isoform and/or other C/EBP family proteins such as C/EBPb, and confirm that p42 is degraded by Trib1 overexpression and recovered by Trib1 and Cop1 knockout. It is also important to determine whether both the p42 and p30 isoforms are preserved in zebrafish.

      RESPONSE: This is a complex point to unpick in zebrafish and we believe this to be out of the scope of the current project. We do not claim a link to C/EBP. As mentioned in above comments we think that a link to C/EBP may be unlikely given that monocyte and granulocyte numbers are not altered after TRIB1 manipulation. We will add more data to look at different markers of neutrophils (see above comments). There are many other candidates other than C/EBP that COP1 could be acting through. Some examples include MAPK (Niespolo et al., Front Immunol, 2020), serine threonine kinases (Durzynska et al., Structure, 2017) and beta-Catenin (Zahid et al., Proteins, 2022). There is also evidence suggesting that COP1 and C/EBP have distinct binding sites on TRIB1, potentially unlinking their activity in some biological situations (Murphy et al., Structure, 2015).

      C/EBPa is found in zebrafish and is involved in myeloid differentiation and haematopoiesis (Yuan et al., Blood 2011). There is not a large body of literature on this, but it has been shown in zebrafish models that the drug Tanshinone IIA reduces C/EBPa (Park et al., Int J Mol Sci, 2017), and we know from previous work in our department that Tanshinone IIA does not affect total neutrophil numbers in zebrafish larvae (Robertson et al., Sci Trans Medicine, 2014). The C/EBP most involved in zebrafish myelopoiesis appears to be a zebrafish-specific, myeloid-expressed isoform called c/ebp1 (Lyons et al., Blood 2001). This has a highly conserved carboxy-terminal bZIP domain, but its amino-terminal domains are unique. Interestingly, reduction of c/ebp1 does not ablate initial macrophage or granulocyte development, but does result in loss of expression of LysC, a secondary granule marker (we are checking expression of this gene after Trib1 modulation using a LysC:mCherry transgenic zebrafish line).

      We do not have antibodies or tools to detect p42 and p30 in zebrafish. As Tribbles1 regulation of C/EBPa appears to be post-translational (Bauer et al., J Clin Invest, 2015), this would be incredibly challenging to unpick in the zebrafish model due to lack of tools to do this. Due to this and the reasons above we believe this to be out of the scope of the current project.

      [Optional] The effect of enhanced ERK phosphorylation by Trib1 for the protective effect against mycobacterial infection is another interesting point. It would be better if the authors could provide the ERK phosphorylation status upon Trib1 overexpression.

      RESPONSE: Unfortunately, we have no method to answer this question to a conclusive level within the scope of this project. There are limited reports of phosphorylated ERK antibodies that work in wholemount zebrafish (eg, Maurer and Sagerström, BMC Developmental Biology, 2018, that use a rabbit antibody), but this is widely expressed in many tissues of the zebrafish and immune cells would be challenging to resolve.

      Reviewer 2

      We have addressed or propose to address all of reviewer 2’s comments.

      Reviewer 3

      We have addressed or propose to address all of reviewer 3’s comments.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The mechanisms that differentiate the ER from the nuclear envelope (NE) remain to be fully elucidated but likely depend at least in part on junctions between the ER and NE. How such junctions are formed and maintained is the subject of this manuscript, in which extensive correlative light and electron microscopy is used to observe and characterize ER-nuclear envelope (ER-NE) junctions at distinct phases of the cell cycle. The authors make use of their own electron tomography data as well as publicly available focused-ion beam scanning electron microscopy (FIB-SEM) datasets to compare the morphology of these junctions in different human cell types as well as in budding yeast. The major finding is that ER-NE junctions in human cell lines are more constricted than ER-ER junctions, often to the point of excluding lumen. The examination of mitotic cells suggests that this constriction likely occurs at the end of mitosis as the NE is completing its maturation from ER to NE. The implications of these morphological changes are discussed, but there are no mechanistic or functional studies. Overall, the data are well presented, are of high quality and are rigorously evaluated. The manuscript is well written and scholarly, and the speculations as to the function of the constrictions are reasonable. I only have minor comments.

      We thank the reviewer for the positive evaluation of our work and for the useful suggestions on how to further improve the manuscript.

      1. In Figure 2D, the authors present evidence to demonstrate that an hourglass-like constriction occurs at ER-NE junctions. From the side view, this is difficult to interpret on the plot, particularly for the ER-NE junctions with a lumen. Perhaps, in the supplemental data, the authors could plot the data with and without lumen separately, and color-code individual traces? I believe this would convey the hourglass nature of these constrictions more clearly.

      To make individual membrane profiles easier to see, we will plot the profiles with and without lumen separately and label each profile with a distinct colour, as the reviewer suggested.

      In the Methods section, the authors should describe how carbon-coating of sapphire discs was achieved. If the discs were provided precoated by the manufacturer, this should be specified.

      We carbon-coated the sapphire discs ourselves and will specify how the coating was done in the revised manuscript.

      On page 10, the Figure 5F callout 9 lines from the bottom should likely be 5E.

      We will correct this error.

      Reviewer #1 (Significance (Required)):

      Overall, this work provides an important new morphological perspective on the nature of ER-NE junctions in human cells. As the authors describe in their introduction, such junctions have been noted previously in the literature but not in a dedicated study using modern imaging techniques in human cell lines. In describing the morphology of these junctions, the authors lay the groundwork for future mechanistic, functional, and structural studies.

      We thank the reviewer for appreciating the significance and impact of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, Bragulat-Teixidor et al., use correlative live-cell imaging and electron tomography to study the structure of the endoplasmic reticulum-nuclear envelope (ER-NE) junction in HeLa cells (and also in S. cerevisiae). The authors also make use of publicly available whole-cell FIB-SEM datasets to study ER-NE junctions in mouse pancreatic islet, HeLa, and human macrophage cells to corroborate their findings in other cell types.

      The authors show that the ER-NE junction in interphase cells adopts an hourglass shape with a constricted neck. Comparing the ER-NE junction to the ER tubule-sheet junction, the authors show that these structures are different: the ER tubule-sheet junction is not constricted. Because the NE forms from the ER during postmitotic NE assembly, the authors compare the structure of the ER-NE junctions in anaphase, telophase, and interphase cells, and find that the junction becomes constricted in telophase. The number of ER-NE junctions increases from telophase to interphase.

      While the authors do not provide any direct evidence for this, they propose a functional model in which the ER-NE junction is constricted because it regulates the supply of certain lipids and proteins from the ER to the NE. One proposed example is that the constriction of the ER-NE junction might prevent large protein aggregates from entering the NE.

      The general question of how the structure of the ER-NE junction might regulate the passage of lipids and proteins from the ER to the NE is interesting and potentially important. However, the authors should address the following issues to improve the accuracy and completeness of this manuscript for it to be considered for publication.

      We thank the reviewer for the appreciation of our work and the thoughtful suggestions for further improvements.

      Major comments:

      1. The authors compare the structure of the ER-NE junction to the structure of the ER tubule-sheet junction in interphase cells. They should instead, or in addition, compare the ER-NE junction to ER sheet-sheet junctions. This is likely a better comparison for two reasons:

      i) The NE is similar to an ER sheet due to its flat and extended structure. The ER membranes surrounding the NE consist mostly of a dense network of sheet-like ER (Zheng et al., 2022, PMID: 34912111). Therefore, the ER-NE junction should be compared to these NE-adjacent ER sheet-sheet junctions and not to ER tubule-sheet junctions, which are likely to be found in the cell periphery.

      ii) In HeLa cells, the NE assembles from large ER sheets and not ER tubules (Zhao et al., 2023, PMID: 37098350; Otsuka et al., 2018, PMID: 29323269; Lu et al., 2011, PMID: 21825076). Therefore, the ER-ER junctions the authors are already studying in anaphase cells are likely to be ER sheet-sheet junctions, which should be kept the same in their analysis of the ER-ER junctions in interphase cells.

      Related to this point, comparing the side-view panels in Figure 2D with 2H, it seems that the ER membranes on either side of the neck region of the ER-NE junction in fact become wider (more sheet-like). This is in contrast to the ER-ER junction, where the width stays constant for the ER tubule that is fusing onto the ER sheet. This suggests that the ER-NE junction is indeed more similar to an ER sheet-sheet junction.

      It is an interesting possibility that the ER-NE junction might resemble the ER sheet-sheet junction. We will inspect whether the ER forming ER-NE junctions in our EM tomograms is sheet-like or tubular, and describe the outcome in the revised manuscript.

      2. The authors claim that in late anaphase cells, the ER-ER/NE junctions (written like this because the ER and NE cannot be distinguished, as the authors also point out) are not constricted and have a similar morphology to ER-ER junctions in interphase. However, this claim is only qualitative at the moment, as the authors do not provide any quantification of the width of the ER-ER/NE junctions in late anaphase cells. To support the current claim that the ER-NE junction only becomes constricted in telophase, the authors should report the width of the ER-ER/NE junctions in late anaphase cells.

      In late anaphase cells, large ER sheets initially wrap around chromatin at the periphery of the chromosome mass (Zhao et al., 2023, PMID: 37098350; Otsuka et al., 2018, PMID: 29323269; Lu et al., 2011, PMID: 21825076). Therefore, the authors might find it easier to identify ER-ER/NE junctions in the so-called "non-core" regions, instead of in the regions currently shown in Figure 3A.

      As the reviewer pointed out, we did not quantify the width of ER-ER/NE junctions in late anaphase cells. We will measure them and show the quantification in the revised manuscript.

      Minor comments:

      1. In Supplementary Figures 1A-D, make the scale bars white. Currently, the black scale bars are especially difficult to see in the top panels of Supplementary Figure 1C.

      We will change the colour of some scale bars in Supplementary Figure 1 to make them more visible.

      2. In the Results section entitled "The number of ER-NE junctions per cell increases from telophase to interphase", the authors should tone down this claim because the number of telophase cells examined is low (only 2 telophase versus 9 interphase cells). It would be better to include the word "slightly" in the title, changing it to "slightly increases".

      We will modify the text accordingly.

      3. In the Results section entitled "The number of ER-NE junctions per cell increases from telophase to interphase", the authors state "These densities were much lower than those of ER-ER junctions...". This is certainly true for ER tubule-tubule junctions in the periphery of the cell, as ER tubules form an intricate network by constantly fusing with each other, but it is not clear whether this is also the case for ER tubule-sheet or ER sheet-sheet junctions. For clarity, the authors should state that they mean ER tubule-tubule junctions.

      The same comment applies to the statement "...although their abundance remains considerably lower than that of ER-ER junctions or nuclear pores at both cell cycle stages". The authors should state that they mean ER tubule-tubule junctions.

      We will clarify what we mean by ER-ER junctions in the revised manuscript.

      4. In the Results section entitled "The constricted morphology of ER-NE junctions is observed in different mammalian cells, but not in budding yeast", the authors state "...pancreatic islet cells (Figure 5A), HeLa (Figure 5B), and macrophage (Figure 5C) were significantly smaller than most ER-ER junctions (Figure 5F)". The last figure reference here is wrong and should be changed to Figures 5D-E.

      We will correct this error.

      5. In the Discussion, the authors state "Proteins known to form and stabilize junctions in the ER, including Atlastins and Lunapark...". The authors should specify that they mean ER tubule-tubule three-way junctions. More generally, throughout the manuscript the authors should be more careful in specifying which ER-ER junctions they mean in each case.

      As in our response to Minor comment 3 above, we will clarify this point in the revised manuscript.

      6. In the Discussion, the authors state "Thus, we favour a second scenario in which ER-NE junctions are generated from ER tubules that contact and eventually fuse with the ONM". Given that the ER membranes adjacent to the NE are mostly sheet-like (as pointed out in Major comment 1 above), the authors need to explain how they think an ER tubule (mostly found in the cell periphery) could access and fuse with the NE.

      As mentioned in our response to Major comment 1 above, we will examine whether the ER forming ER-NE junctions in our EM tomograms is tubular or sheet-like, and will rephrase the text depending on the outcome.

      Reviewer #2 (Significance (Required)):

      Although the ER-NE junction has been studied in other organisms before, this study represents the first structural characterisation of the ER-NE junction in mammalian cells. Therefore, this study represents an advance for the field in gaining a better understanding of different ER structures and morphologies. How the ER is remodelled during the cell cycle is also an interesting question and an active field of research (Merta et al., 2021 PMID: 34853314; Zhao et al., 2023, PMID: 37098350) which this study further contributes to. This study would therefore be interesting for anyone interested in ER structure/morphology, ER-NE connections, and cell cycle regulation of such ER-NE connections.

      My field expertise is in the ER and NE. I do not have sufficient expertise to evaluate the methodology for the EM tomography part of this paper.

      We thank the reviewer for appreciating the novelty and impact of our work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Bragulat-Teixidor et al. is a study of the connection of the ER with the nuclear envelope. It uses advanced ultrastructural techniques: high pressure freezing instead of chemical fixation and EM tomography instead of serial sectioning. Synchronized HeLa cell cultures were examined during interphase, late anaphase (4-6 min after anaphase onset) and early telophase (8-10 minutes after anaphase onset).

      The investigators find an unexpected, unusual structure: a constricted neck 7-20 nm wide and about 10 nm long where the ER connects to the nuclear envelope. The 7 nm connections had no apparent lumen. These are not seen in late anaphase, when the NE has not yet formed, but they are seen a few minutes later during early telophase, when there is a newly formed NE surrounding the chromosomes. Their abundance was quantified; more were found later, during interphase, and with wider lumens.

      It is very nice to show the EM images both uncolored and segmented (colored). The images shown in the figures are presumably the best obtained during the study. Heavy metals do not stain membranes uniformly or exclusively, and identification of structures does not always seem unambiguous. The three-dimensional information can certainly make this easier, though this information is difficult or impossible to show in journal format. In the end, the reader must depend on the judgment of the person who did the analysis. Overall, the analysis seems trustworthy.

      We thank the reviewer for the comment. To better present the three-dimensional structure of ER-NE junctions, we will provide movies of the EM sub-tomograms containing six ER-NE junctions, so that readers can inspect their three-dimensional structure.

      HeLa cells are very convenient for getting information on cell cycle dependence. However, they are cancer cells in culture, so it is important to look at other cell types as well. The same methodology was used on budding yeast, and they saw a wide tent-like connection, which reproduces an earlier study. This seems more consistent with what is known or expected of ER membranes. It is not less interesting but perhaps less puzzling. To get evidence on other mammalian cells, the authors analyzed data from OpenOrganelle. These are high-pressure frozen cells/tissues imaged by FIB-SEM. The voxels are 4 nm, which is significantly larger than those in EM tomography. Unfortunately, the difficulty of identifying structures is correspondingly greater. The images shown do not contradict the HeLa results, but by themselves (without the HeLa cell data), a convincing case for narrow connections probably could not be made.

      The reviewer raises a very good point about a limitation of the FIB-SEM datasets in OpenOrganelle. As we mentioned in the manuscript (lines 6–11, page 10), we agree with the reviewer that the spatial resolution of the FIB-SEM datasets is not sufficient to resolve the exact morphology of the 7–20 nm wide ER-NE junctions, because the voxel size is 4 nm. However, the resolution is good enough to examine whether ER-NE junctions are narrower than ER-ER junctions, as shown in Figure 5A–E. The fact that we rarely found non-constricted ER-NE junctions in the FIB-SEM datasets supports the conclusion that ER-NE junctions are very narrow. To clarify this point, we will modify the text (lines 24–25 on page 10) as follows:

      Previous: This analysis of FIB-SEM images confirms the hourglass morphology that distinguishes ER–NE from ER–ER junctions as seen in our EM tomograms…

      Revised: This analysis of FIB-SEM images confirms that ER-NE junctions are narrower than ER-ER junctions as seen in our EM tomograms…

      The work in this manuscript seems to have been done well. Assuming that this structure is confirmed in other mammalian cells, another kind of question comes to mind: is this the final word on ER to NE connections? The lumenless neck does not seem like a stable structure; somehow it seems like a transient one. In the future, it would help if a new structural protein were identified, or if some theoretical analysis helped explain the shape.

      Certainly, this will not be the final word on ER-NE junctions, which are crucial for the ER-to-NE transport of lipids and transmembrane proteins. In the future, it will be important to identify structural proteins that regulate the junctions and to reveal how their constricted morphology affects ER-to-NE transport. We believe that, as the reviewer kindly noted in the last paragraph of their comments, our observations “serve as a starting point for further structural or functional work” on these unique yet fundamental junctions that connect the ER to the nucleus.

      It is now generally assumed that high-pressure freezing preserves structure perfectly. However, in this reviewer's mind, there is a possibility that some structures are not preserved. The sample is brought to 2000 atmospheres within a few milliseconds and frozen, and the high pressure is released after a second. Although many intracellular structures do seem well preserved, could the junction be susceptible to high pressure? A second source of uncertainty is that, in order to embed the samples in resin, the water was removed by freeze substitution. This is known to cause a small amount of tissue shrinkage and could possibly alter a delicate structure. Another way to look at this kind of structure is cryo-EM tomography on hydrated lamellae from plunge-frozen cells. I do not recommend that the authors do another arduous, possibly too arduous, set of experiments with a completely different technique, but perhaps another group has data that could support their findings.

      We think it is very unlikely that ER-NE junctions were deformed by the high-pressure freezing. In general, high-pressure freezing allows vitrification of specimens up to 0.5 mm thick, and vitrification works better for thinner specimens. Our specimens are cell monolayers only 0.02 mm thick, frozen in a chamber 0.03 mm deep. Vitrification is therefore expected to be fast, and the ER-NE junctions must have been frozen in the same way as other regions of the cell.

      However, as the reviewer pointed out, dehydration of the samples during freeze substitution might deform ER-NE junctions. To verify the structural preservation of ER-NE junctions in our protocol, we will compare the morphology of the ER and NE in our data with that in cryo-EM datasets available in public databases, and describe the outcome in the revised manuscript.

      We think that our conclusion from the EM analysis is solid, because we observed a significant structural difference between ER-NE junctions and ER-ER junctions in the same cells (Figure 2). In addition, we compared ER-NE junction morphology in late-anaphase, early-telophase, and interphase cells that were high-pressure frozen and freeze-substituted on the same sapphire disc, and found that the ER-NE junctions became progressively constricted from telophase to interphase (Figure 3).

      The following are suggestions for the Discussion:

      Yeast have many of the same biochemical processes as mammalian cells. Perhaps their lack of narrow connections can be used as a clue to the function of the narrow necks seen in HeLa cells. For instance, the authors speculate that the narrow connection serves to keep phosphatidylserine in the nuclear envelope low. If the yeast nucleus has the same concentration of phosphatidylserine as the ER, it would provide good evidence for this idea.

      Yes, this is indeed the case. It was shown that the yeast outer nuclear membrane has the same phosphatidylserine concentration as the ER (Tsuji et al., Proc. Natl. Acad. Sci. U. S. A., 2019). We described this in the Discussion on page 14 (“this phosphatidylserine enrichment occurs in mammalian cells and not in budding yeast (Tsuji et al., 2019)”), which the reviewer may have overlooked. In the revised manuscript, we will rephrase the text to make this point clearer.

      There might be other instances of lumenless neck structures. Dynamin mutants can cause a stable constricted tubule; are the dimensions of this tubule similar to those of the ER/NE connections? Or possibly some ESCRT-related structure?

      These are very interesting questions. As shown in Figure 2A-D and Supplementary Figure 1B, the inner diameter (the distance between inner leaflets) of the lumenless ER-NE junctions is below 1 nm. In contrast, the inner diameter of the most constricted membrane tubules generated by the dynamin mutant K44A Dynamin 1 is 3.7 nm (Antonny et al., EMBO J., 2016, doi: 10.15252/embj.201694613), and that of membrane tubules formed by the ESCRT-III subunits CHMP1B and IST1 is 4.4 nm (Nguyen et al., Nat. Struct. Mol. Biol., 2020, doi: 10.1038/s41594-020-0404-x). Thus, lumenless ER-NE junctions are unique in their highly constricted nature and might be regulated by proteins other than dynamin or ESCRT proteins. We will discuss this point in the revised manuscript.

      There do not seem to be any recent studies of the ER/nuclear membrane connection in fixed cells. However, there are serial-section data online that can be inspected. There are connections in mouse brain cortex in the data of Kasthuri et al., 2015 (https://neurodata.io/project/ocp/). Instead of a tubule connection, there seems to be a narrow sheet of ER that connects to the nuclear envelope. But there is something odd about these too. The authors may wish to mention this or similar work in their manuscript. This reviewer has looked at chemically fixed data from several cell types in his own unpublished data, and connections are surprisingly hard to find. Possibly, the connection is particularly sensitive to chemical fixation.

      We inspected the serial-section data of chemically fixed mouse brain cortex. The nuclear envelope in this dataset is deformed and does not appear well preserved. We do not think that useful information on the ultrastructure of ER-NE junctions can be extracted from this dataset, and thus will not mention this work in our manuscript.

      It is great to hear that the reviewer looked for ER-NE junctions in their own EM data. ER-NE junctions are rare (only 0.1 junctions per square micrometer; Figure 4). We therefore think that the junctions were hard to find in the reviewer’s data because of their low frequency rather than because of chemical fixation.

      Reviewer #3 (Significance (Required)):

      This is a careful and thorough study of the connection between the ER and the nuclear envelope. The discovery of reticulons and similar proteins, along with biophysical modeling, made the form of the ER accessible to analysis. The factors that govern ER structure are now much better understood. This is particularly true of sheets versus tubules, the three-way tubule junctions and, to some extent, the junction of ER tubules coming out of the edge of a sheet. However, with all this activity, the connection of the ER to the nucleus has not been examined in detail. What makes it different is that the tubule is connected perpendicular to the plane of a sheet.

      We thank the reviewer for appreciating the quality and novelty of our work.

      The manuscript uses the best ultrastructural techniques and provides strong evidence for a narrow neck at this connection in HeLa cells. With the same methodology, yeast cells (S. cerevisiae) have a wider connection. OpenOrganelle data from other mammalian cell types were examined. These data have lower resolution and, although they do not contradict the HeLa cell data, they do not support them strongly.

      As mentioned in our response to one of this reviewer’s comments above, the spatial resolution of the FIB-SEM datasets is good enough to examine whether ER-NE junctions are narrower than ER-ER junctions. We think that our observation of several mammalian cell types in FIB-SEM datasets strongly supports the conclusion that ER-NE junctions are narrower than ER-ER junctions, and extends our findings in HeLa cells to two other mammalian cell types.

      This work is of interest to cell biologists specializing in membranous organelles or those interested in nuclear physiology. The connection of ER to nuclear envelope is an interesting problem that has not been studied recently. This manuscript could very well serve as a starting point for further structural or functional work by the authors or other groups.

      We thank the reviewer for appreciating the significance and impact of our work.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary: Membrane-bound ribosomes and ER exit sites are present on the cytosolic side of the nuclear envelope (NE), suggesting that the NE shares protein translocation, folding and quality control functions with the endoplasmic reticulum (ER). Moreover, membrane continuity between the ER and the outer NE membrane is evident, and thus the NE is considered a subdomain of the ER. Consistent with this, during cell division the NE loses its identity and is partitioned to daughter cells as part of the ER. However, the NE also has membrane and luminal proteins that are enriched in the NE and absent from the ER during interphase, and the segregation of NE-specific proteins/lipids occurs concomitantly with NE formation during late anaphase/telophase. In this study, the ultrastructure of ER-NE junctions is described using high-resolution electron tomography. The results convincingly show a specific constriction at the ER-NE neck during interphase in several mammalian cell types. This structure is absent during metaphase, and also from budding yeast. The authors present a model for the formation of ER-NE junctions in higher eukaryotes and speculate about their functional role.

      We thank the reviewer for the appreciation of our work and the valuable suggestions for further improvements.

      Major comments: The main conclusion of the paper is that, although the ER and outer NE membranes are continuous, a specific hourglass-shaped constriction at the neck is found in higher mammalian cells during interphase. The structure is specific to ER-NE necks, as it is absent during metaphase and from ER-ER junctions. For the analysis, the authors used high-pressure freezing to ensure the best structural preservation. Unfortunately, fixation is not the only potential source of artifacts; during tomography at ambient temperature, plastic sections can thin by up to 30% under the beam. In evaluating the results, the authors should consider how this thinning could affect the measurements of membrane distances and luminal width, and what type of distortions may arise as a consequence of asymmetric shrinkage.

      In addition to analyzing their own samples, the authors took advantage of the publicly available whole-cell datasets in OpenOrganelle and used them to expand the number of cell types analyzed. Moreover, these 3D datasets were generated with a different imaging technique, FIB-SEM. Although this technique generally provides lower resolution, its resolution is isotropic, and the data could be used to eliminate the shortcomings of tomography: the thinning of the sections and the missing wedge. The authors could expand the comparison of data from these different sources from this perspective, especially since HeLa cells were used both in their own tomography studies and in the FIB-SEM datasets in OpenOrganelle. Similarly, it would be interesting to see whether a similar approach could be used to compare their results to those obtained by cryo-EM, using cryo-EM databases. Have the authors checked whether any datasets suitable for the analysis of ER-NE junctions can be found in public archives? For the analysis of mitotic cells, a double thymidine block was used to synchronize the cell culture. It is not clear why synchronization was necessary, as CLEM was used to select the cells, and their number was rather low. Do cells continue growing and synthesizing new proteins during thymidine blocks? As one way to control for potential artifacts due to the synchronization treatment, the authors could compare the average thickness of the ER and NE in naturally occurring interphase and mitotic cells versus synchronized cells.

      We agree with the reviewer that it is important to clarify the degree of shrinkage and deformation that our EM protocol might introduce. To assess the degree of sample shrinkage and deformation in the plastic sections, we will compare the ONM-INM distance measured in our plastic sections with that in publicly available cryo-EM tomograms of rapidly frozen, FIB-milled mammalian cells (EMPIAR, the Electron Microscopy Public Image Archive, https://www.ebi.ac.uk/pdbe/emdb/empiar/), and describe the outcome in the revised manuscript.

      We synchronized the cell cycle to enrich cells in late anaphase and early telophase in the same plastic sections, so that we could compare their ultrastructure side by side. In the revised manuscript, we will examine whether the double thymidine block affects ER-NE junction morphology by comparing the ER and NE between synchronised and non-synchronised cells.

      As we described in our response to Reviewer 3, we think that our conclusion from the EM analysis is solid for the following reasons. (i) We observed a significant structural difference between ER-NE junctions and ER-ER junctions in the same cells (Figure 2). (ii) We observed a change in ER-NE junction morphology across late-anaphase, early-telophase, and interphase cells that were freeze-substituted on the same sapphire disc: the ER-NE junctions became progressively constricted from telophase to interphase (Figure 3).

      Minor comments: On page 5, last paragraph (and the Fig. 1 legend and Materials and Methods): "the quick tomograms covered the entire NE" is misleading, as the imaging covered only a thin layer of the entire NE. The authors could have analyzed the entire NE from the FIB-SEM datasets but chose a stereological approach to minimize their work.

      We will modify the text to make it clear that the quick tomograms covered the NE in a section and not the entire NE of the cell in the revised manuscript.

      To save readers from having to follow the reference, the authors could describe how the specimens used in the OpenOrganelle datasets were fixed and processed, especially as they emphasize the importance of high-pressure freezing in their own sample preparation. Similarly, in the Fig. 4 legend, the authors refer to measurements done in a previous study without explaining how they were made and from what type of data.

      We thank the reviewer for pointing these out. We will describe how the OpenOrganelle datasets were generated and how the nuclear surface area measurement was done.

      Is there a difference between mesh generation and segmentation, or are these two terms used for the same thing by different programs?

      We apologize for our brief description of these terms. We will clarify them in the revised manuscript.

      Reviewer #4 (Significance (Required)):

      General assessment: ER-NE gates were described earlier in the literature for specific cell types using standard thin-section TEM imaging, and in this study, the analysis was done in 3D with modern technology. The text is fluent and clear, and the quality of the images is excellent. The analysis of the data was thorough, and the materials and methods, including the image analysis part, were presented accurately and clearly. Ultrastructural analysis was done systematically, and the generated models are beautiful and informative. Much thought has been put into the planning of the experiments and the experimental approach. The shortcoming of the study is its limitation to ultrastructural analysis only, without attempts to connect it to any mechanism. The discussion contains a lot of speculation about the factors that might be needed for the formation and maintenance of the constriction and presents several hypotheses for the function of the constriction. The paper would be much stronger if one or a few of these leads were followed, and if there were any explanation for the role of these structures, or of the factors affecting them.

      We thank the reviewer for the appreciation of the clarity and quality of our work. The molecular mechanism that regulates the function, shape and biogenesis of ER-NE junctions will be the subject of future studies, for which our discovery of a highly constricted morphology of the ER-NE junctions lays the groundwork.

      Advance: The paper provides a very nice example of the reuse of publicly archived imaging datasets to complement one's own experimental work. Hopefully this paper encourages others to follow the same path, as large volume EM datasets require significant investment and contain a wealth of potential for reuse.

      We strongly agree with the reviewer. The volume EM datasets that are publicly available contain a wealth of potential for new discoveries. We also hope that our paper encourages other scientists to make good use of those datasets and to deposit their own data in public databases. We will deposit our EM tomograms in EMPIAR, the Electron Microscopy Public Image Archive.

      The paper strengthens the description of the ER-NE junction structure significantly and convincingly, but it does not further our understanding of the mechanisms behind the structure or its function, and it raises more questions than it answers. For structural analysis of this kind, the state-of-the-art technology is cryo-EM (e.g., preparation of lamellae with cryo-FIB-SEM followed by cryo-tomography), and in this study, the technical limitations come from plastic embedding and ambient-temperature imaging. The techniques used would be more adequate for a cell biological study, in which the described structure is connected to a function in the cell, or the factor(s) needed for its formation or maintenance are identified.

      Indeed, a limitation of our current study is that we did not reveal the underlying molecular mechanism or the functions of the constricted morphology of ER-NE junctions. We do not think that cryo-EM is necessarily required, because we have collected evidence that the ER-NE connections are distinct from the ER-ER junctions not only in our EM tomography data (Fig. 2) but also in the EM datasets deposited in public databases (Fig. 5).

      Audience: This study will be of special interest to the cell biology community. The study could open several lines of research, e.g., identification of the factors forming or maintaining the structure, the potential function of the structure, and how the structure affects the dynamics of NE/ER membrane and luminal proteins.

      We thank the reviewer for appreciating the impact of our work.

      Reviewer's expertise: The reviewer has long experience in electron microscopy, volume EM techniques and image analysis, and operates mainly in the field of cell biology.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript entitled "The Drosophila Tumour Suppressor Lgl and Vap33 activate the Hippo pathway by a dual mechanism, involving RtGEF/Git/Arf79F and inhibition of the V-ATPase." by Portela et al. presents an interesting perspective on the molecular mechanism regulating the Hippo pathway, revealing new proteins involved in this process. In this study, the authors try to show that Lgl activates the Hippo pathway via Vap33, either by interacting with RtGEF/Git/Arf79F or by inhibiting the V-ATPase, thus controlling epithelial tissue growth. The methodology used by the authors is adequate but could benefit from further experiments that would allow them to reach the conclusions stated in their research. Thus, based on the interpretation of the results presented by the authors, some concerns were raised that should be addressed during the review process; these are explained in the major comments.

      Major comments:

      • It is not clear why, in the "The Hippo signaling pathway is negatively regulated by V-ATPase activity in Drosophila" section, the authors use Vha68-2 RNAi to reduce the activity of the V-ATPase and later use the overexpression of Vha44 to activate the V-ATPase. The authors should explain why they used different proteins to regulate the V-ATPase. The way the results are written suggests that different Vha proteins regulate the V-ATPase, which would mean that cells may have different ways to activate V-ATPases, and it is not clear whether, regardless of the subunit manipulated, the downstream effect of V-ATPase activation is always reflected in the Hippo pathway. Thus, the authors should state which other Vha proteins may have a similar effect. I would like to see evidence that Vha44 and Vha68 knockdown and overexpression lead to similar results.

      Response: Vha68-2 and Vha44 are both components of the V-ATPase. We have added further details to the results to make this clearer. We have previously shown that knocking down several components of the V-ATPase, which disrupts V-ATPase function, has a similar effect on the Notch pathway (Portela et al., 2018 Sci. Signal., PMID: 29871910). Vha44 overexpression has been documented to result in V-ATPase activation (Petzoldt et al., 2013, Dis Model Mech., PMID: 23335205), and no other Drosophila V-ATPase transgenes were available to conduct experiments with other lines.

      • In the "Vap33 activates the Hippo pathway" section, the authors' conclusions represent a big statement considering the results obtained. Although Diap1 is a Hippo pathway target, this does not mean that it is solely regulated by this pathway. For example, there are studies showing that this gene can also be transcribed in response to STAT activity. Although in the following section the authors show how Vap33 activates this pathway, the results obtained in the section "Vap33 activates the Hippo pathway" are not enough to support this assumption. We suggest that the authors rephrase this section. (Optional: To maintain this statement, the authors should have performed, for example, a luciferase assay containing specifically the Hippo pathway binding sites in the Diap1 gene, showing that the transcription factor of the Hippo pathway is somehow regulated by Vap33.)

      Response: Although Jak-STAT signalling has been shown to induce Diap1 expression in the wing disc during development (PMID: 28045022), expression profiling after activation of Jak-STAT signalling in the eye epithelium did not identify Diap1 as a target (PMID: 19504457). Additionally, there are no reports that Lgl depletion in eye disc clones elevates Jak-STAT signalling (Stephens et al., J. Mol. Biol. 2018, PMID: 29409995); instead, loss of cell polarity in scrib mutant cells in the eye disc results in expression of the Jak-STAT pathway ligand, Upd, and non-cell-autonomous induction of Jak-STAT signalling in the surrounding wild-type cells (PMID: 25719210, PMID: 23108407). We have previously shown that Lgl depletion leads to inactivation of the Hippo pathway and elevates expression of the canonical Yki targets, Ex and Diap1 (Grzeschik et al., 2010, Curr Biol., PMID: 20362447). In this current study we show that Vap33 overexpression leads to the downregulation of Diap1 and, in lgl mutant tissue, reduces the elevated Diap1 expression. Since there is no evidence that either Lgl or Vap33 (VAPB) perturbations affect the Jak-STAT signalling pathway, we conclude from our results that Vap33 acts by reducing Yki activity and thus activating the Hippo pathway. We have added additional explanation to this section of our manuscript.

      • The authors present a highly speculative discussion, raising different hypotheses. Though such hypotheses are well supported by the literature, the authors would enrich the quality of their research if they could indeed prove them. In particular: testing for vesicle acidification; testing whether the V-ATPase indeed blocks the interaction of Lgl/Vap33/RtGEF/Git/Arf79F and alters Hpo localization; and testing whether Git/RtGEF inhibits Arf79F and, consequently, Hpo localization.

      Response: Although further experiments would extend the paper, this is not possible because my lab is now closed. We have already published that vesicle acidification is increased in lgl mutant tissue (Portela et al., 2018, Sci. Signal., PMID: 29871910) and that Hpo localization is altered in lgl mutant tissue (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447).

      • The authors should also apply more specific techniques to infer how the Hippo pathway is affected by such genetic manipulation since diap1 can be a target gene of different pathways. Response: We have shown that lgl mutant tissue also shows upregulation of the Hippo pathway target, Ex-LacZ, and affects the phosphorylation of Yki (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447), and RtGEF/Git mutant tissue shows upregulation of the Yki target, Ex-LacZ (Dent et al., 2015, Curr. Biol., PMID: 25484297). Since RtGEF/Git are positive regulators of Hippo, but there is no evidence that they are involved in the regulation of the Jak-STAT pathway, the effect of Vap33 overexpression on Diap1 levels in the context of a RtGEF knockdown (Fig 5) is most likely to be due to effects on the Hippo pathway. Similarly, since Lgl deficiency upregulates Yki targets, Ex-LacZ and Diap1 (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447), the reduction of the elevated Diap1 levels in lgl mutant clones by knocking down or reducing Arf79F activity (Fig 7), is most likely due to inhibition of Yki activity and therefore elevated Hippo pathway signalling.

      Minor comments:

      • The authors present a well-structured manuscript that is generally easy to understand. However, at some points, the statements given by the authors seem highly speculative.

      • The figures presented in this manuscript and the statistical analysis seem adequate and are clearly described.

      Response: We thank the reviewer for their support of our study. We have added more explanation to support our conclusions.

      Reviewer #1 (Significance (Required)):

      The study presented by Portela et al. gives new insights into the regulation of the Hippo pathway with the discovery of new proteins involved in this mechanism, which can be interesting to those working on basic research and focused on studying signal transduction. However, this study lacks some novelty. Throughout the manuscript, the authors only observed the physiological consequences of manipulating this pathway based on the eye phenotypes, and in the discussion, many hypotheses were raised based on the already available literature, which shows that much is already known about the Hippo pathway. The advances shown in this study are limited to the description of the signaling pathway itself and to the eye morphology. As a suggestion, the authors should explore the knowledge of their findings in order to understand how we can use them to achieve advances in other fields and physiological conditions. For example, only at the end of the discussion, did the authors raise the questions that would really push their discoveries a step forward, namely how this mechanism acts during the response to tissue wounding and whether the mammalian orthologs of Lgl and Vap33 also act via these mechanisms to control tissue growth in mammals. It would be interesting if the authors could direct their research efforts to understand if the proteins identified can be targeted to improve wound healing or to delay aging for example. Altogether, the authors present an interesting study but, at this moment, it still lacks the significance and novelty needed for publication. We encourage the authors to keep up their good work to address these suggestions, which will definitely improve the quality of their study.

      Response: We respectfully disagree with the reviewer’s comments regarding the significance of our study. On the contrary, our study is significant since it has discovered a mechanism linking Lgl and Vap33-RtGEF/Git/Arf79F and the V-ATPase to the regulation of the Hippo pathway, an important tissue growth regulatory and tumour suppressor pathway. The Drosophila eye epithelium is a highly validated model for exploring mechanisms that are relevant to human epithelial biology and cancer. Whilst extending our studies of the mechanism by which Lgl controls the Hippo pathway to wound healing and mammalian systems would be the next step, this is beyond the scope of this discovery paper.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary This manuscript investigates potential mechanisms through which the lgl gene might affect the Hippo signaling pathway. The authors employ a combination of physical interaction studies and clonal analysis in Drosophila eye discs to investigate potential links between lgl and other genes. Some of the results are intriguing, but the analysis is rather preliminary, and there are technical concerns with some of the results presented.

      Main issues:

      • The authors propose effects of genes involved in vesicle trafficking and acidification on Hippo signaling, but there is no clear cellular mechanism described by which these effects could be mediated. This deserves further consideration; e.g., if they think there are effects on the localization of Hippo, this could be examined directly. In the Discussion, the authors suggest that "The V-ATPase might therefore act to inhibit Hippo pathway signalling by blocking the interaction of Lgl/Vap33/RtGEF/Git/Arf79F with Hpo in vesicles, thereby altering Hpo localization and inhibiting its activity.", but Hippo is a cytoplasmic protein and has never been reported to be within vesicles.

      Response: Whilst Hpo is a cytoplasmic protein there is evidence that it could also be associated with vesicles, since Hpo pathway components bind to several endocytic proteins by mass spectrometry analysis (Kwon et al., 2013, Science, PMID: 24114784; Verghese and Moberg, 2020, Front. Cell Dev. Biol., PMID: 32010696). We have previously published that Hippo localization is altered in lgl mutant tissue (Grzeschik et al., 2010, Curr. Biol., PMID: 20362447). For a better precision, we have updated the wording to state that the proteins described in our manuscript may alter Hippo localization “on endosomes” as opposed to the previous “in vesicles”.

      • The Yki stains in Fig. 1 are confusing. The nature of the signal throughout the wing disc looks very different in 1A vs 1B vs 1C; this needs to be explained or re-examined. Fig. 1C (wts RNAi) seems to show an elevated Yki signal in some cells and a lower signal in others. Prior studies have reported that wts affects the nuclear vs cytoplasmic localization of Yki, but not its levels, so this needs to be clarified.

      Response: There are some tissue folds in the eye disc tissues that might be confusing the reviewer, but Yki nuclear staining is lower in Vha68-2 mutant clones, and higher in wts knockdown and Vha44 over-expressing clones (arrowheads). When Yki is concentrated in the nucleus the staining appears more intense, as it does in the wts knockdown clones. Similar results on Yki staining upon Hippo pathway impairment in epithelial tissues have been obtained by other Hippo pathway researchers (e.g., PMID: 20362445, PMID: 19900439, PMID: 19913529, PMID: 26364751).

      • In Fig 1D the clones appear to have different effects in different regions of the eye disc; the authors should clarify. Also, the disc in 1D appears much younger than the discs in 1A-C, but similar age discs should be used for all comparisons.

      Response: All eye discs are from wandering 3rd instar larvae, but the mounting of the samples on the slide and the confocal Z-section could account for apparently different regions of the eye disc showing stronger upregulation of Ex-LacZ and Yki staining. The data have been statistically analysed from multiple eye discs, and the effects observed are significantly different from the control (as plotted in Fig 1E).

      • The authors should clarify whether any of the manipulations they perform are associated with Jnk activation, as this could potentially provide an alternative explanation for downregulation of Hippo signaling.

      Response: Lgl mutant clones only upregulate the JNK target MMP1 in some cells at the border of the clones but show elevated Yki activity within the clones. Vha44 overexpressing clones do show upregulation of JNK signalling (Petzoldt et al., 2013, Dis Model Mech., PMID: 23335205), but since JNK signalling is known to inhibit Yki activity in the eye epithelium (PMID: 22190496), it is unlikely that the upregulation of Yki activity (downregulation of Hippo signalling) in Vha44 overexpressing clones is due to JNK activation.

      • The authors report in Fig 2C,E that over-expression of Vap33 reduced expression of Diap1, which they interpret as evidence of increased Hippo pathway activity, but this experiment lacks essential controls, as the apparent reduction of Diap1 could simply reflect increased cell death or a change in focal plane; indeed, the difference in the label stain makes it look like these cells are undergoing apoptosis. Thus it is important to also have a stain for a neutral protein, or at least a DNA stain. Additionally, it is important to stain for at least one additional marker of Hippo pathway activity (e.g., ex-lacZ or Yki localization), as there are other pathways that regulate Diap1.

      Response: We have previously examined the effect of Vap33 overexpressing clones on the Notch signalling pathway and do not see a reduction in Notch target gene expression relative to the control (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig 3). Thus, although there might be some cell death in Vap33 overexpressing clones (possibly due to lower Diap1 levels), it is unlikely that cell death per se results in lower Diap1 levels. We are unable to conduct further experiments to examine other Hippo pathway activity markers since my lab is now closed.

      • In Fig. 4 the authors perform PLA experiments to examine potential association between various pairs of proteins, but they don't show us key controls. They report in the text using single antibodies as negative controls, but this doesn't control for non-specific localization of antibodies. The better negative control is to do the PLA experiments in parallel on tissues lacking the protein being detected (e.g., from animals not expressing the GFP- or RFP-tagged proteins they are examining). Also, there is a lot of variation in the apparent signals shown in different PLA experiments in Fig. 4; the authors should comment on this.

      Response: We have previously used the PLA assay to examine Lgl and Vap33 interactions (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig 2) and have conducted an experiment expressing Vap33 tagged with HA via the GMR driver in the posterior region of the eye disc and then detected Lgl-HA protein interactions, which only showed PLA foci in the posterior region where Vap33-HA is expressed but not in the anterior region where Vap33-HA is not expressed. This may be thought of as the best possible control since these differentially expressing regions were part of the same tissue sample. Furthermore, in our previous study (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig S2), we conducted a negative control PLA using the GFP and Vap33 antibodies in eye tissue not expressing GFP-Lgl and observed no PLA foci. We have edited the text to refer to these controls.

      The variation in PLA signal may be due to low levels of expression of certain proteins or lower levels of protein-protein interactions. We have edited the text to add this explanation.

      • The authors claim that RtGEF mutant cells increase Diap1 expression, and that Vap33 over-expression reverses this effect (Fig. 5). The effect of RtGEF looks very subtle and variable, it should be confirmed by examining additional reporters of Hippo pathway activity. It also seems like the disc in 5A is at a different stage &/or the quantitation is done from a different region as compared to the disc in 5C.

      Response: RtGEF mutant cells have also been shown to upregulate the Yki target, Ex-LacZ (Dent et al., 2015). Unfortunately, we were unable to construct an Ex-LacZ RtGEF mutant stock and there was no available Ex antibody.

      For Diap1 quantification, clones were chosen just posterior to the morphogenetic furrow of each eye disc and multiple clones were analysed relative to the adjacent wild-type clones in many samples and quantified and plotted in Fig 5E.

      • The analysis of the influence of Vha68-2 mutant clones, and their genetic interaction with Git, similarly suffers from missing controls and incomplete analysis. Additional Hippo reporters besides just Diap1 should be examined. The Diap1 analysis which shows reduced expression needs examination of neutral controls or nuclear markers to assess potential apoptosis within clones, or changes in focal plane.

      Response: We have also examined the effect of Vha68-2 clones on Ex-LacZ expression (Figure 1) and show that it is also reduced relative to the surrounding wild-type clones.

      We have previously examined Vha68-2 mutant clones for the expression of a Notch pathway target (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig S1) and showed with DAPI staining that cells are in the same plane and are retained in the pupal retina, so they are not dying. We now refer to our previous study in the text.

      Similarly, the analysis of Arf79F mutant clones in Fig 7E,G is compromised by lack of controls for viability and tissue layer, and analysis of an additional Hippo reporter is once again essential.

      Response: We don’t believe DAPI stains are necessary as the GFP membrane/cytoplasmic staining clearly shows the outline of the cells and where the nucleus is in the mutant clones and shows that the cells are intact and not dying.

      Reviewer #2 (Significance (Required)):

      The strength of the study is the potential dissection of novel connections between the lgl tumor suppressor and the Hippo pathway. However, there are significant limitations due to the preliminary nature of the study, which is incomplete and missing essential controls. If these limitations are overcome, the work will be of interest to specialists in the field.

      Response: We hope that our explanations and responses to the main issues above alleviate concerns regarding controls.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, Portela and colleagues identified new regulators of the Hippo pathway downstream of the core apico-basal polarity protein Lgl. While the impact of Lgl depletion on Yki activation was already characterised both in Drosophila and vertebrates, the mechanism connecting these two pathways was still unclear. Using the Drosophila eye, mosaic analysis, epistatic analysis and mass spectrometry, they identified two routes through which Lgl depletion can lead to Hippo pathway downregulation and eye overgrowth. This regulation required the previously characterised Lgl interactor Vap33, which on the one hand activates Hippo by inhibiting the V-ATPase, and on the other hand activates Hippo through its interactions with the actin regulators Git and RtGEF (two previously characterised regulators of Hippo, https://pubmed.ncbi.nlm.nih.gov/25484297/). They also identified another regulator of Hippo downstream of Lgl, Arf79F, whose ortholog interacts with Git in mammals, which is in close proximity to Hippo, Git and RtGEF in Drosophila, and whose depletion abolishes Hippo downregulation and eye overgrowth in Lgl mutants. This is a well-performed study which identified new links between Lgl and regulation of the Hippo pathway. Many of them are conserved in mammals and may be relevant in pathological conditions associated with Lgl loss of function and Yap misregulation. The experiments are well conducted, with a quite thorough epistatic analysis combined with many assays to characterize protein interactions. Admittedly, the molecular mechanism remains uncharacterised and some experiments may help to indicate putative mechanisms, but the characterisation of these new regulators and the clear genetic interaction results already constitute solid and interesting data. I have some suggestions, though, that could help to reinforce the conclusions.

      Main suggestions :

      1. While the precise molecular mechanism is not absolutely necessary, it would be interesting to document the subcellular localisation of these new Hippo regulators (Git, RtGEF, Vap33 and Arf79F) in WT and Lgl mutant contexts, either with antibodies, if they exist, or with fusion proteins (a good part of which were already generated for the PLA results). This may reveal obvious mislocalisation, which would support the role of Lgl as a scaffolding protein that maintains proper subcellular localisation of these factors.

      Response: Whilst this experiment would extend the study, we are unable to do this since my lab has now closed.

      Most of the epistatic experiments focus on factors that rescue the overgrowth and increase of diap1 expression in Lgl mutants. Did the authors test whether any of these core regulators are sufficient to recapitulate the Lgl mutant eye phenotype, for instance Vap33 KD in the eye, or Arf79F overexpression? Negative results would still be informative, as they would point to the existence of other downstream regulators of the eye phenotype.

      Response: Vap33 knockdown by RNAi in clones does not phenocopy the lgl mutant mosaic adult eye phenotype (Portela et al., 2018, Sci. Signal., PMID: 29871910, Fig 2), presumably due to other functions of Vap33. We have added further details regarding this point in the Discussion.

      We have not examined Arf79F overexpressing clones.

      It is at the moment hard to interpret the relevance of the results obtained by PLA. While there are some negative controls based on the absence of secondary antibody, what is the number of particles obtained for two non-interacting cortical proteins? Since this is based on proximity, I would expect that some positive particles would still appear by chance, but many fewer than for two physically interacting proteins or subunits of a complex. Could the authors provide such a negative control by testing, for instance, Git/RtGEF with another non-interacting cortical protein? That would help to assess the relevance of the conclusions based on PLA.

      Response: The PLA is a robust assay to assess protein-protein interactions of proteins that are

      Some of the epistatic links are a bit hard to interpret at the moment, and additional epistatic tests may be relevant. For instance, the increase of diap1 upon Git depletion in the Vha68 mutant (Figure 6) is used to conclude that Git is required for the Hippo upregulation upon reduced V-ATPase activity. However, this could be compatible with two independent branches regulating Hippo (in opposite manners), which is more or less what the authors suggest in their conclusion and the model of Figure 8. I would suggest reformulating this conclusion in the results part. Similarly, there is currently no experimental exploration of the epistatic link between Arf79F, Git and RtGEF (which is based on results in mammals). It would be interesting to check whether the Git and RtGEF mutant phenotype (Hippo downregulation) can also be suppressed by downregulation of Arf79F.

      Response: We have now added further explanation to the result section regarding Fig 6.

      Unfortunately, we are unable to do further experiments since my lab is now closed.

      Apart from very obvious phenotypes (eye in Lgl mutant mosaic), it is a bit hard to interpret the pictures of adult eyes provided in this study (especially for mild phenotypes). Could the authors provide more explanation in the legends and, if possible, some quantitative evaluation of the phenotype when relevant? Otherwise, apart from the obvious rescue of the Lgl mutant, it is a bit hard to interpret the other genotypes (e.g., Vap33OE, RtGEF mutant, Vha68 mutant).

      Response: We have added more explanation of the adult eye phenotypes in the text/fig legends.

      Other minor points :

      1. I would recommend when possible to clearly indicate in Figure 8 which part of the pathway are clearly documented in this study, and which part are still hypothetical (eg: link with PAK).

      Response: We have re-drawn the model figure to highlight what we have found in this study by adding orange arrows between Lgl-Vap33-RtGEF/Git-Arf79F-Hpo and Lgl-Vap33-V-ATPase and V-ATPase-Hpo.

      1. Page 4, the sentence "as aPKC's association with the Hpo orthologs, MST1/2, and uncoupling MST from the downstream kinase, LATS (Wts), thereby leading to increased nuclear YAP (Yki) activity [17], consistent with what we observe in Drosophila [5]." may need to be reformulated (at least I had trouble understanding it).

      Response: We have edited the sentence to "In mammalian systems, deregulation of Lgl/aPKC impairs Hippo signalling and induces cell transformation, which mechanistically involves the association of aPKC with the Hpo orthologs, MST1/2, thereby uncoupling MST from the downstream kinase, LATS (Wts) and leading to increased nuclear YAP (Yki) activity [17], consistent with what we observe in Drosophila [5]."

      1. Page 11: "a decrease in Diap1 expression was observed and clones were smaller than wild-type clones (Fig 7E), suggesting that the Arf79F knockdown clones were being out-competed". I am not sure one can conclude from this that the clones are "outcompeted" (which would suggest a context-dependent disappearance of clones, whereas here the data could be fully compatible with a cell-autonomous decrease of growth and survival). This statement would only make sense if global depletion of Arf79F in the eye had no adult eye phenotype.

      Response: We have edited the sentence to "a decrease in Diap1 expression was observed and clones were smaller than wild-type clones (Fig 7E), suggesting that the Arf79F knockdown clones have reduced tissue growth ----."

      Reviewer #3 (Significance (Required)):

      This study identifies regulators of Hippo which, through their interactions with Vap33, explain for the first time how Lgl depletion leads to Hippo misregulation (without impairing apico-basal polarity). This is an interesting study based on epistatic analysis and mass spectrometry that identifies several regulators conserved in mammals. While the molecular mechanism remains to be explored, it clarifies for the first time how Lgl depletion (a core regulator of apico-basal polarity) leads to Hippo downregulation and tissue overgrowth, a phenotype also observed in mammals and characterised several years ago in Drosophila. The authors previously characterised the interaction between Vap33 and Lgl and its role in the regulation of Notch signaling through the V-ATPase. This study nicely complements these previous results and now connects Vap33 with Hippo and Lgl, while answering a long-unresolved question (how Lgl depletion affects the Hippo pathway).

      These results will be interesting for the large community studying the Hippo pathway, apico-basal polarity and tissue growth. The study also outlines interesting factors that could be relevant for tumour neoplasia and hyperplasia.

      I have expertise in epithelial biology, apoptosis, cell competition, Drosophila, cell extrusion, mechanobiology, morphogenesis and growth regulation.

      Response: We thank the reviewer for recognizing the significance of our study.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In this article, the authors delve into an intriguing topic, aiming to enhance our understanding of the organization of the mitochondrial genome of T. gondii, a parasite of significant importance in both human and animal health contexts.

      In essence, their approach involves enriching mitochondrial material, followed by genome sequencing and the analysis of mitochondrial short RNAs. They achieve a remarkable depth of mitochondrial sequencing and generate valuable RNA data. Furthermore, their efforts lead to the discovery and annotation of new short RNAs.

      Overall, the article is well-crafted and presents compelling results. However, it's worth noting that, at times, the authors appear somewhat self-congratulatory, and certain results might be perceived as overly ambitious. Nevertheless, the discussion is aptly constructed.

      Major comment:

      They assert certain discoveries that had already been reported. Notably, they adapt an existing protocol for mitochondrial enrichment and describe it as 'We developed a protocol to enrich T. gondii mitochondria.' It's worth noting that they neither reference a more recently described protocol (PMC6851545) nor compare the performance of their modified protocol with the original.

      The protocol they employ does not seem to yield exceptionally high success rates, as mitochondrial DNA constitutes less than 10% of the total sequenced DNA.

      Additionally, they frequently mention the identification of specific combinations of sequence blocks previously identified by Namasivayam et al. (PMC8092004), which was also discussed in Namasivayam et al. 2021.

      The supplementary material lacks basic details on the sequencing performed, such as the distribution of mitochondrial read lengths, depth, etc.

      Further clarification is needed for Figure 2. Specifically, the frequency units or combinations of frequency (A, B, and C) are not clearly explained. While the matrix's asymmetry suggests a 5'-3' orientation difference, this orientation difference is not explicitly specified (B). Additionally, the fragment Mp does not appear in the block combination figure (C).

      Some points to improve the introduction:

      Provide an evolutionary context for the following phrase: 'An idiosyncratic feature of Apicomplexa is a highly derived mitochondrial genome.' Specify what you intend to emphasize.

      Line 56: The sentence must begin with a capital letter.

      Line 58: "Nuclear genes encoding proteins with functions in mitochondria contribute strongly to P. falciparum and T. gondii cell fitness." Although it is mentioned later, it would be more effective to introduce here the fact that all but three genes are encoded in the nucleus.

      Line 68: "Apicomplexan mitogenomes usually code only for three proteins." It seems to me that 'usually' should not be included.

      Lines 65-67: The sentence should state that the mitochondrial genome is composed of a total of 20 blocks of repeated sequences, organized in multiple DNA molecules of varying length and in non-random combinations.

      At the end of the introduction, the authors state that they have developed a protocol for mitochondrial enrichment. The text should be modified taking into account that: 1- The new protocol is an adaptation of another existing protocol. In fact, in the Methods the authors say the protocol was "slightly" modified. 2- There is already an existing mitochondrial enrichment protocol available [Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6851545/#mmi14357-bib-0074]. In any case, they should consider performing a comparative analysis between the proposed protocol and existing ones to determine its relative effectiveness. It should be noted that the proposed protocol enriches for organelles (including the nucleus and apicoplast), but when sequencing DNA, mitochondrial DNA accounts for only 5% of the total reads, which may raise doubts about its overall efficacy.

      Some points related to Results section:

      Lines 113-115: 'To distinguish between NUMTs (nuclear DNA sequences that originated from mitochondria) and true mitochondrial sequences, it is necessary to enrich mitochondrial DNA.' I disagree with this sentence. NUMTs, in general, consist of very short sequences. With long reads, it is relatively straightforward to differentiate mitochondrial sequences from those nuclear sequences that have small mitochondrial fractions. In my opinion, even many Illumina reads can be confidently identified as belonging solely to the mitochondria. I found this article that supports this argument, indicating that the majority of NUMTs are less than 100 nucleotides in length [Reference: https://pubmed.ncbi.nlm.nih.gov/37293002/].

      Lines 166-168: 'A previous sequencing study used Oxford Nanopore sequencing technology (ONT) to identify combinations of sequence blocks in T. gondii mitochondria (Namasivayam et al. 2021).' However, it's important to note that Namasivayam's group did not merely use ONT to identify combinations of blocks; rather, they discovered, identified, and defined these combinations based on sequencing with long reads.

      Line 177: "The length of mitochondrial reads ranged from 87 nt to 17,424 nt" It would be beneficial to include a histogram depicting the length distribution of the obtained reads. It's worth noting that nanopore reads tend to be shorter than Illumina reads

      Lines 194-195: "we found that only a small fraction of all possible block combinations are prevalent within the genome". This has been previously described (PMC8092004).

      Line 201: "This indicates that the genome's flexibility is limited and that not all block combinations are realized." This is consistent with the findings published by Namasivayam et al. in 2021, which already established that the combination of the 21 blocks is non-random.

      Line 205: "All combinations are well covered in our ONT results and helped to refine block borders relative to previous annotations (Fig. S2)". In the supplementary materials the authors say: "However, the blocks Fp, Kp, and Mp frequently occur separately in the mitochondrial genome. We therefore treated Fp, Kp and Mp as separate blocks and have shortened the blocks F, K and M accordingly". As far as I understand, it is for this very reason that Namasivayam and collaborators annotate them as partial fragments, which may appear in other regions but are, in turn, parts of the larger F, K, and M fragments. To redefine the segments F, K, and M without the sequences corresponding to Fp, Kp, and Mp, as shown in Figure S2, these fragments should be distinct from the 'partials'. In other words, segments of the type (F minus Fp), (K minus Kp), and (M minus Mp) should appear in the reads and should be distinguishable from Fp, Kp, and Mp. If this distinction is made, I am satisfied with the new definition. However, if such a separation is not evident, it seems important to clarify it in the text or to reconsider this new definition.

      Lines 221-223: "This suggests that there is no need to postulate mechanisms of genomic or posttranscriptional block shuffling to arrive at full-length open reading frames." The authors argue that invoking mechanisms of genomic or post-transcriptional block shuffling is unnecessary to explain the presence of full-length open reading frames, given that genes represent 2-3% of mitochondrial sequences. However, there is a missing estimate regarding the probability of encountering all three genes within a single molecule or mitochondrial genome, as well as the total number of sequenced mitochondria. Consequently, the statement appears overly assertive. In the absence of alternative mechanisms for generating complete genes, this would mean that at most only 1646 mitochondrial genomes would have been sequenced. To comprehensively address this issue, the authors should consider discussing this scenario further. They should also provide information about how many reads they found containing all three genes and how many contained two of the genes.

      Lines 249-250: "using the block combinations identified here by ONT sequencing". What is the difference between the blocks identified here and those in Namasivayam et al.? Is it the division of the M, K and F fragments?

      Line 287: "The six remaining small RNA fragments are specific to T. gondii." I would suggest being more cautious in this sentence by stating that they were not found in other organisms. Given the similarity of the mitochondrial genome between T. gondii, N. caninum, and other coccidians, one would expect to find them in these organisms as well.

      Line 300: "Among the novel small RNAs identified, there is also a class that was only detectable due to our insights into genome block combinations." A valid alternative strategy is to map the small RNAs to the generated nanopore reads, or to an assembly made with these reads, rather than relying solely on single blocks or block combinations; this approach would yield the same result.

      Line 444: "Upon closer scrutiny, however, the reshuffling appears limited to specific block borders and is not random." This was already established by Namasivayam et al. 2021.

      I would like to highlight the potential for a more comprehensive examination of the mitochondrial genome in the discussion. While the proposed explanations for the presence of sRNAs at the 'block borders' appear plausible, it's worth noting that the definition of these blocks is artificial rather than biological. I think it is interesting to discuss without the concept of block sequences, but of sequences existing in the mitochondrial genome. Therefore, it's important to discuss whether these sequences (the block borders) are consistently present in all mitochondrial genomes. The total cumulative length of the blocks is 5.9 Kb, which is relatively small and comparable to one of the smallest mitochondrial genomes on record. It is conceivable that recombination and the generation of new sequences play a role in expanding genomic space for encoding, such as RNAs.

      Lines 535-536: "We developed a protocol to enrich T. gondii mitochondria and used Nanopore sequencing to comprehensively map the genome with its repeated sequence blocks." I find this sentence somewhat assertive, especially considering that they modified an existing protocol and obtained results that may not be optimal. Additionally, they have not compared their protocol with other available methods for mitochondrial enrichment.

      Some points related to the Methods section:

      None of the experiments specifies how many parasites were used as starting material.

      "Masking NUMTs in the T. gondii nuclear genome" it's unclear whether the authors utilize all hits or filter the results of BLASTN. It would be helpful if they specify the criteria for filtering, such as identity percentage or query coverage. Additionally, it's not clear how they generate the GFF3 file from the BLAST results, or whether they instead create a BED file. Providing clarification on this process would enhance the reproducibility of their methods. Moreover, it would be beneficial if the authors include information regarding the number of sequences they intend to mask, the average length of the NUMTs, and the total percentage of the genome these masked sequences represent.

      Line 657: "Mapping results were filtered using SAMtools". The text does not specify the filtering criteria or the parameters used for this process.

      Line 673 states "No matching reads were found" in the section "Sequence comparisons of ONT reads found here with published ONT reads for the T. gondii mitochondrial genome", but in the Results the authors say: "While smaller reads of our dataset are found in full within longer reads in the published datasets, we do not find any examples for reads that would be full matches between the dataset." Could you provide a more detailed explanation? Specifically, I would like to know how many reads from the dataset (including their length) are also present in other datasets, and at what minimum length they cease to coincide.

      Line 689: The text does not specify the filtering criteria or the parameters used for the SAMtools filtering process.

      Lines 689-693: Please describe the methodology used in more detail.

      Line 696: The program is fastp, not fastq (Chen et al. 2018).

      Line 697: What do you mean by "only the ends of the reads were mapped"? How many bases? Or do you mean that both forward and reverse reads were mapped?

      Significance

      In this article, the authors delve into an intriguing topic, aiming to enhance our understanding of the organization of the mitochondrial genome of T. gondii, a parasite of significant importance in both human and animal health contexts.

      In essence, their approach involves enriching mitochondrial material, followed by genome sequencing and the analysis of mitochondrial short RNAs. They achieve a remarkable depth of mitochondrial sequencing and generate valuable RNA data. Furthermore, their efforts lead to the discovery and annotation of new short RNAs.

      Overall, the article is well-crafted and presents compelling results. However, it's worth noting that, at times, the authors appear somewhat self-congratulatory, and certain results might be perceived as overly ambitious. Nevertheless, the discussion is aptly constructed.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2023-02012

      Corresponding author(s): Frederic, Berger


      1. General Statements [optional]


      We thank the reviewers for useful suggestions and comments on our manuscript, which helped to improve and strengthen our conclusions. Our point-by-point answers are below. We have answered most of the points raised by the reviewers and added substantial new experimental data, including detailed structural and biochemical analyses, which further support that BCP4 (and not BCP3) is the plant functional counterpart of MDC1, because in response to DNA damage it binds phosphorylated H2A.X and recruits the MRN complex. In addition, we provide further support for the phylogenetic analysis and evidence for the plant counterpart of PAXIP1.

      We believe that our revised manuscript, which includes a set of new experimental data, strongly supports our main conclusion that BCP4 is a functional counterpart of metazoan MDC1.

      2. Point-by-point description of the revisions



      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      MDC1 is a key regulator of DNA damage responses (DDR) in animals. MDC1 has multiple protein domains, in which the BRCT domain binds γH2A.X. However, plants lack the homolog of MDC1. In this study, the authors found that BCP4 binds γH2A.X and proposed that BCP4 is a functional counterpart of MDC1, which will greatly enhance our understanding of plant DDR pathway. I have the following concerns.

      1. The relationship between BCP3 and BCP4 needs to be clarified. In line 255, the authors state that "we conclude that BCP3 and BCP4 have functional properties as human MDC1". In the Abstract, the authors state that "we identified BCP4 as a candidate ortholog of human MDC1". I am confused about the conclusion: are both BCP3 and BCP4 counterparts of MDC1, or only BCP4? In addition, only the BRCT domains of BCP3 and BCP4 share homology with MDC1; they lack the other domains of MDC1. Therefore, "ortholog" may not be an appropriate term. I think "functional counterpart" may be a better term.

      Response: Our analysis emphasizes that human MDC1 is highly derived from an ancestral form of MDC1 that lacked most of the domains found in mammalian MDC1. Because it is still difficult to establish with certainty what the ancestral MDC1 was, we agree that "functional counterpart" is a more correct term, and we changed this accordingly throughout the manuscript.

      All of BCP1-4 contain tandem BRCT domains. I am wondering whether sequence analysis could reveal why only BCP3 and BCP4 bind γH2A.X. Are there any key residues essential for γH2A.X binding?

      Response: We generated AlphaFold models of the tBRCT domains of BCP1, BCP2, BCP3, and BCP4. While in the AlphaFold models the tBRCT domain of each BCP protein largely overlaps with the structure of the human MDC1 tBRCT domain, only the tBRCT domains of BCP3 and BCP4 are predicted to make contacts with γH2A.X similar to those of human MDC1. Although the residues involved are not fully conserved between BCP3/4 and human MDC1, we obtained in vitro data supporting that the interaction of BCP4 is mediated by a comparable pocket of three key residues that contact the phosphate group of γH2A.X. See also our answers to the comments of Referee #2, the new Figures 3 and 4, the corresponding description on pages 8-9, and Supporting Figure 3.

      Line 183: "On an unrooted phylogenetic tree, these two proteins clustered with MDC1 and PAXIP1 (Figure 1B)." In Figure 1B, MDC1 is closer to BCP3 and BCP4 than PAXIP1 is, and PAXIP1 is closer to BCP2 than MDC1 is. If the authors want to include PAXIP1 in Figure 1C, they should include BCP2 as well. In the γH2A.X binding assays, I do not understand why the authors tested BCP1 instead of BCP2. In Figure 2D, why was bcp2 not included?

      Response: We created a new alignment for Figure 1C that includes the BCP2 tBRCT domain, and the tree that includes all BCP BRCT domains (Figure 1D) supports a close relationship between MDC1 and BCP3/BCP4, and between PAXIP1 and BCP1. As stated on pages 5-6, lines 175-178, BCP2 also contains an acetyltransferase domain, which is unique to the plant BCP2 protein. Based on its domain organization, BCP2 was not considered a candidate MDC1 homolog, and we did not perform mutant complementation. This is why, after our initial analysis of bcp mutants (DNA damage sensitivity, formation of γH2A.X, and phylogeny), and based on similarities with MDC1 and PAXIP1, we focused on the bcp1, bcp3, and bcp4 mutants and the corresponding proteins. The function of BCP2 remains to be investigated, but this is beyond the scope of this manuscript, which is primarily dedicated to finding the functional counterparts of MDC1 and PAXIP1.

      The expression levels of BCP1-4 in the mutants need to be examined using qRT-PCR, especially for the bcp3 mutant, which is a weak allele.

      Response: We did not perform this experiment because it was done in Vladejic et al., 2022, and expression data are available from various genomic datasets on TAIR.

      The authors used "bleomycin" or "zeocin" in different parts. Please be consistent.

      Response: We consistently use bleomycin for treatment of seedlings followed by western blotting, and Zeocin for the true leaf assay. These two agents produce DNA double-strand breaks in similar ways, and we previously showed that levels of γH2A.X and γH2A.W.7 are similar when using these two agents (Rosa M, Mittelsten Scheid O Bio. Protoc. 4:e1093. doi: 10.21769/BioProtoc.1093; Lorkovic et al., Curr Biol. 2017, doi: 10.1016/j.cub.2017.03.002). Zeocin was chosen for true leaf assays because we observe lower variation between batches and biological repeats compared with bleomycin.

      1. Figure 3E and 3F, please indicate the treatments of the upper and lower panels.

      Response: Thank you for pointing this out. This has been indicated in the corresponding legend of the new Figure 3 A - C.

      Line 338, "bcp1 mutants show reduced homologous recombination rates (Fan et al., 2022; Vladejić et al., 2022; Yu et al., 2023)". The bcp1 mutant was not reported in Fan et al. paper.

      Response: This sentence has been changed to accurately describe data in each of the mentioned papers.

      Line 40, please add a comma after "In ". Line 331, please add a comma after "In mammals". animal

      Response: This has been corrected.

      Line 123, "only BRCA1 and BARD1 were described in plant lineage". Additional BRCT proteins were described in plants, including XIP1 (Nat. Commun. 13:7942), BCP1/DDRM2 (New Phytol. 238:1073-1084; Front. Plant Sci. 13:1023358), and DDRM1 (PNAS, 119: e2202970119).

      Response: This sentence refers to known BRCT domain mediator/effector proteins. From the published data on XIP1, BCP1/DDRM2, and DDRM1, it is not possible to assign these functions to the proteins in question. Nevertheless, we changed this sentence to avoid ambiguous interpretation, and later in the text we introduce the XIP1, BCP1/DDRM2, and DDRM1 proteins as needed.

      Reviewer #1 (Significance (Required)):

      This study identified BCP4 as a functional counterpart of MDC1, which fills a gap in our understanding of plant DDR signaling.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Frédéric Berger and colleagues identified BCP4 in Arabidopsis thaliana as a potential plant orthologue of vertebrate MDC1. The conclusions are based on both in silico analysis (phylogenetic analysis) and in vitro biochemical and cell biological experiments. BCP4 loss causes sensitivity to DNA damage. Moreover, BCP4 binds to a phosphopeptide derived from the C-terminus of H2AX via its C-terminal BRCT domains, and forms foci in cells exposed to DNA damage, which co-localize with gammaH2AX foci.

      Major comments: The conclusions are generally supported by the data, but the evidence presented is still quite limited. For example, it is still possible that BCP4 recruitment to sites of DNA damage is mediated by another protein and not by direct interaction with gammaH2AX. To firmly conclude that BCP4 is an MDC1 orthologue, it is in my opinion essential to perform a (limited) mutagenesis analysis. The key amino acids in the BRCT domains that recognize gammaH2AX need to be mutated, and it has to be shown that these mutants are defective for H2AX phosphopeptide binding and are not recruited to sites of DNA damage. Such residues may be tricky to identify, but one obvious candidate would be the Ser residue in beta1 (VLFS motif). In vertebrates, this is a Thr that directly interacts with the phosphate in gammaH2AX. Another possible critical site may be shortly before alpha2 (RTRN motif). In vertebrates, it is RTVK, and the K makes direct contacts with the phosphate in gammaH2AX. This function is perhaps carried out by an R. Structure prediction with AlphaFold may help to identify the most critical residues.

      Response: We thank the reviewer for these suggestions. We used AlphaFold to predict the structures of the tBRCT domains of all BCP proteins and compared them with the structure of human MDC1 in complex with a γH2A.X peptide. Based on these analyses, we mutated critical amino acids in BCP4, selected on the basis of their predicted interaction and their conservation. We showed that mutations of critical residues reduced or almost completely abolished binding of BCP4 to γH2A.X. These data are now part of the new Figure 4; see also the corresponding description on pages 8-9. In addition, we provide genetic data showing that foci formation of BCP4 depends on H2A.X (new Fig 3B and C). We did not attempt genetic complementation experiments with these mutants because it would take nine months to obtain stable transgenic plant lines expressing the various mutant versions of BCP4, and the limitations of Arabidopsis transgenesis do not allow precise control of transgene expression, which could make interpretation difficult in this particular case.

      Another critical issue is the introduction of the study. This needs to be revised, because the literature is not correctly cited in several places. For example, the cited paper by Salguero et al., 2019 did not show that the PST repeats of MDC1 constitute a docking site of TP53BP1, but instead, that the PST repeats can bind to chromatin independently of gammaH2AX.

      Response: We thank the reviewer for spotting this mistake. We carefully checked all references and corrected all wrongly associated ones or used original reports instead of reviews.

      We also rewrote some parts of the Introduction, as Referee #1 also asked for some clarification.

      The data are generally well presented and convincing. The only thing that needs to be added is a quantification of the microscopic analysis (e.g. number of foci per cell, or similar).

      Response: We quantified the number of foci in all mutants reported in Figure 2C. These data are now included in the new Figure 2D.

      Optional: It would be interesting to address why plants seem to have two MDC1 orthologues: the longer BCP4 and the shorter BCP3. What is the functional difference between them? Do they perhaps distribute over two different proteins the functions that are combined in one protein in vertebrate MDC1?

      Response: Thank you for prompting us to address this outstanding question. We now provide evidence supporting that only BCP4 is a functional counterpart of MDC1. We show that a specific region of BCP4, but not of BCP3, is able to interact with NBS1 of the MRN complex (see new Figure 6). Also, BCP3 lacks the N-terminal TQxϕ repeats present in BCP4. Although the function of these repeats is unknown at this point, these data together suggest some functional diversification between BCP3 and BCP4. We mention this on page 11, lines 372-374.

      Reviewer #2 (Significance (Required)):

      The strength of the study is the detailed phylogenetic analysis. Also, the biochemistry and cell biology is well done.

      Limitations are the lack of evidence that BCP4 carries out functions in the cell (beyond recognising gammaH2AX) that are carried out by MDC1 in vertebrate cells.

      Response: We thank the reviewer for raising this important point. To address it, we performed pull-down assays of the TQxϕ and SQ/DWD regions of BCP4 with NBS1 and found that Arabidopsis NBS1 interacts with the SQ/DWD region, and that this interaction is mediated by the FHA+tBRCT domains of NBS1. Based on AlphaFold prediction, we performed further deletion and point mutation analysis of the SQ/DWD region and determined that binding of NBS1 requires a sequence comprising an alpha-helix that is not conserved in BCP3. We therefore concluded that a BCP4-specific sequence (absent from BCP3) is capable of recruiting the MRN subunit NBS1.

      At this point we could not demonstrate this in vivo by analyzing NBS1 foci in BCP4 mutant background. Unfortunately, commercial antibodies for plant NBS1 or other subunits of the MRN complex are not available, and to get transgenic plants expressing fluorescent protein tagged NBS1 would require a period much longer than the time for reasonable revisions of a manuscript. Nevertheless, our in vitro interaction data strongly argue for BCP4 having function in binding MRN complex as human MDC1, although the mode of interaction of BCP4 with NBS1 is different from that of human MDC1 and NBS1.

      Please see the new Figure 6 and the corresponding description on pages 11-12.

      The study is of great interest to readers working on chromatin responses to DNA damage in plants.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors set out to find BRCT domain-containing proteins in order to identify the readers of phosphorylated H2A.X in plants. Using a systematic analysis of the BRCT domain proteome, they identified 21 proteins. Further analysis showed that BCP3 and BCP4 are orthologs of animal MDC1 and that BCP1 is the ortholog of animal PAXIP1. They also extended their work to an evolutionary perspective, finding that BCP1 and BCP4 in plants and PAXIP1 and MDC1 in metazoans evolved independently from a common ancestor. However, this manuscript raises some concerns; see the comments and questions below.

      Major comments: 1) If you think that BCP3 and BCP4 work as mediators of the DDR, can you show that these mutants have a defect in the DDR? The authors only assessed true leaf development. Leaf development is affected not only by DNA damage but also by other factors. Therefore, the authors should provide additional data showing that the bcp mutant lines are defective in the DNA damage response.

      Response: The “true leaf assay” is a classical assay for testing plant mutants for DNA damage sensitivity (Rosa M, Mittelsten Scheid O Bio. Protoc. 4:e1093. doi: 10.21769/BioProtoc.1093). If DNA damage occurs and is not efficiently repaired, meristematic cells in shoot meristem are arrested and do not divide, hence plants do not produce the first pair of “true” leaves after cotyledons expand. In this assay cotyledons open and grow normally as they are already fully determined and do not undergo any cell division after seed germination.

      In this assay the treated WT seedlings also show a reduction of the number of plants with true leaves as compared with untreated WT (100%). Furthermore, WT and mutant seedlings develop normally and comparably without Zeocin induced DNA damage.

      2) Do you have DNA damage sensitivity data for bcp3 bcp4 double mutants?

      Response: We obtained the bcp3 bcp4 double mutant and tested it for DNA damage sensitivity. The double mutant is slightly more sensitive than the bcp4 single mutant, but not as sensitive as the H2A.X mutant. The reason for this is presumably the nature of the available bcp3 mutant allele, which carries a T-DNA insertion in the 5’-UTR with some residual expression of BCP3 protein, as reported by Vladejic et al., 2022. We did not feel that these data would improve the manuscript, so we did not include them. Obtaining a new mutant allele would take time and work beyond what is reasonable for a revision. In addition, since we show that the functional counterpart of MDC1 is BCP4, we did not think it relevant to pursue further characterization of BCP3 function in the context of this manuscript.

      3) Some red algae have H2A.X but don't have BCP4 and BCP1 (Figure 4). In this case, how do they read the phosphorylated H2A.X? Can you discuss the point?

      Response: Actually, most red algae do not even have H2A.X. At this point we do not have data that could answer this question and it is difficult to make any prediction about this. Analysis of DDR system in red algae is totally beyond the scope of the current manuscript. See also answer to comment #5.

      4) L307-L312: From Figure 4, I thought that the timing of the appearance of the SQEF motif in H2A.X differs from that of the appearance of BCP4. Why do you say that the evolution of BCP4 and H2A.X coincides?

      Response: We thank the Reviewer for pointing out the need for clarification.

      Histone H2A carrying a C-terminal SQEF/Y motif is categorized as H2A.X; this motif distinguishes the variant from H2A.Z (not discussed here) and from canonical H2A. In Archaeplastida, many algal species possess either H2A or H2A.X. Only in streptophytes did the ancestral gene duplicate, leading to neofunctionalization of both H2A and H2A.X; in this lineage, H2A.X forms a monophyletic clade. BCP1 and BCP4 evolved slightly after, or concurrently with, this neofunctionalized H2A.X variant, suggesting co-evolution in streptophytes.
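      As a rough illustration of the sequence criterion described above, the check for a C-terminal SQEF/Y motif can be written as a short regular expression. This is a simplified sketch, assuming the motif sits at the very C-terminus of the protein; in practice the classification also relies on phylogeny, as described above. The peptide strings below are short example fragments for illustration only, not sequences analyzed in the manuscript.

```python
import re

# SQE followed by F or Y, anchored at the C-terminus (end of the string).
H2AX_C_TERMINAL_MOTIF = re.compile(r"SQE[FY]$")


def has_h2ax_motif(protein_seq: str) -> bool:
    """Return True if the sequence ends with the SQEF or SQEY motif."""
    # Normalize case and drop a trailing stop symbol, if present.
    cleaned = protein_seq.strip().upper().rstrip("*")
    return H2AX_C_TERMINAL_MOTIF.search(cleaned) is not None


# Example C-terminal fragments (illustrative only).
for name, seq in [("fragment_a", "GKKATQASQEY"),
                  ("fragment_b", "KSAKSPKKASQEF"),
                  ("fragment_c", "KTESHHKAKGK")]:
    print(name, has_h2ax_motif(seq))
```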

      5) Some red algae lack BCP1, BCP4 and H2A.X. How do they transfer the signal downstream? Do you have any idea about this?

      Response: To address this interesting question, we re-analyzed the BRCT-domain proteome of red algae and again could not find any protein containing the features of BCP4 present in green algae and land plants, or of Opisthokont MDC1.

      We did find that red algae without MDC1 encode MRE11 and RAD50 but not NBS1. Components of the non-homologous end joining (NHEJ) DNA repair pathway, Ku70 and Ku80, are also conserved in these organisms. How some red algae cope with DNA damage therefore remains enigmatic. Similarly, unicellular red algae lack the classical autophagy pathway. This is the result of very strong genome reduction.

      Response: Thanks for this comment. We changed the title of the manuscript to avoid ambiguity.

      Minor comments:

      6) I think you should show us a schematic representation of BAP1 and PAXIP1 to compare the features of both proteins.

      Response: We added a schematic representation of PAXIP1 to Supporting Figure 2B.

      7) L176-L178: Which data support this sentence?

      The sentence in question: “BCP1 has two tBRCT domains positioned at the N- and C-terminus and a so far unrecognized C-terminal PHD finger which is present in all plant lineages except Brassicaceae (Supporting Figure S1A and S2A).”

      Response: Our data in Supporting Figure S1A (schematic representation of the BCP1 protein with the PHD finger consensus sequence indicated), Supporting Figure S2A and the Source data (alignment of BCP1 PHD fingers from flowering plants, non-flowering land plants and multicellular green algae) clearly demonstrate the presence of a C-terminal PHD finger in BCP1 in all lineages except Brassicaceae. The PHD finger can also be seen in the full set of BCP1 sequences available in the Source data.

      8) L271-L279: There are unreadable characters at "TQx_".

      Response: This most likely occurred during conversion to PDF. We have now fixed it.

      Reviewer #3 (Significance (Required)):

      Significance: General assessment: This study gives us an idea of how organisms have evolved the upstream system of the DDR.

      Advances: This study extends our knowledge of the DNA damage response in plants.

      Audience: broad and basic research

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      MDC1 is a key regulator of DNA damage responses (DDR) in animals. MDC1 has multiple protein domains, among which the BRCT domain binds γH2A.X. However, plants lack an MDC1 homolog. In this study, the authors found that BCP4 binds γH2A.X and proposed that BCP4 is a functional counterpart of MDC1, which will greatly enhance our understanding of the plant DDR pathway. I have the following concerns.

      1. The relationship between BCP3 and BCP4 needs to be clarified. In line 255, the authors state that "we conclude that BCP3 and BCP4 have functional properties as human MDC1". In the Abstract, the authors state that "we identified BCP4 as a candidate ortholog of human MDC1". I am confused about the conclusion: are both BCP3 and BCP4 the MDC1 counterparts, or only BCP4? In addition, only the BRCT domains of BCP3 and BCP4 share homology with MDC1; they lack the other domains of MDC1. Therefore, "ortholog" may not be an appropriate term. I think "functional counterpart" may be a better term.
      2. BCP1-4 all contain tandem BRCT domains. Is it possible to determine, through sequence analysis, why only BCP3 and BCP4 bind γH2A.X? Are there any key residues essential for γH2A.X binding?
      3. Line 183: "On an unrooted phylogenetic tree, these two proteins clustered with MDC1 and PAXIP1 (Figure 1B)." In Figure 1B, MDC1 is closer to BCP3 and BCP4 than PAXIP1 is, and PAXIP1 is closer to BCP2 than MDC1 is. If the authors want to include PAXIP1 in Figure 1C, they should include BCP2 as well. In the γH2A.X binding assays, I do not understand why the authors tested BCP1 instead of BCP2.
      4. The expression levels of BCP1-4 in the mutants need to be examined using qRT-PCR, especially in the bcp3 mutant, which is a weak allele.
      5. The authors use "bleomycin" and "zeocin" in different parts. Please be consistent.
      6. In Figure 2D, why was bcp2 not included?
      7. Figure 3E and 3F, please indicate the treatments of the upper and lower panels.
      8. Line 338, "bcp1 mutants show reduced homologous recombination rates (Fan et al., 2022; Vladejić et al., 2022; Yu et al., 2023)". The bcp1 mutant was not reported in Fan et al. paper.
      9. Line 40, please add a comma after "In animal". Line 331, please add a comma after "In mammals".
      10. Line 123, "only BRCA1 and BARD1 were described in plant lineage". Additional BRCT proteins were described in plants, including XIP1 (Nat. Commun. 13:7942), BCP1/DDRM2 (New Phytol. 238:1073-1084; Front. Plant Sci. 13:1023358), and DDRM1 (PNAS, 119: e2202970119).

      Significance

      This study identified BCP4 as a functional counterpart of MDC1, filling a gap in our understanding of plant DDR signaling.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors show that cofilin severs actin filaments slowly when fascin is present, and that this is due to slower nucleation of cofilin clusters on fascin-induced actin bundles. Interestingly, the authors show that cofilin binding promotes helicity in actin filament bundles, which in turn promotes fascin exclusion and more cofilin clustering in adjacent filament bundles, thus inducing local transmission of structural changes.

      The authors use an elegant approach, and the data are nicely presented. Overall, I consider this manuscript to be in good shape for publication. It might benefit from language editing, though.

      We thank the reviewer for their positive comments. We have edited the manuscript to improve its readability (changes are in blue in the manuscript).

      Reviewer #1 (Significance (Required)):

      In my view, the significance of this manuscript is that it elegantly shows the molecular details of how cofilin severs fascin-induced actin filament bundles. The authors show that cofilin binding promotes helicity in actin filament bundles, which in turn promotes fascin exclusion and more cofilin clustering in adjacent filament bundles, thus inducing local transmission of structural changes.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this study, Chikireddy et al. perform a series of experiments in which they compare the efficiency of cofilin-mediated severing and actin filament disassembly on individual filaments versus bundles of different sizes formed by the actin-bundling protein fascin. The key outcome, quite distinct from conclusions previously published by the authors themselves and by others, is that fascin bundling actually reduces cofilin-mediated severing, mostly because of much slower "nucleation" of cofilin clusters on fascin-bound filament bundles. Cofilin cluster formation is followed by local fascin removal, and the nucleation of a cofilin cluster on an adjacent filament in the absence of fascin is strongly enhanced. The reason for the latter, surprising observation is not entirely clear, but it is proposed to arise from cofilin-mediated changes in the helicity of neighboring filaments. As I understand it, the main reason why fascin protects against cofilin severing here, rather than enhancing it (as reported previously), is that the cofilin-induced twist is not constrained: if this twist is constrained, e.g. by anchoring the bundles to the chamber surface, severing by cofilin is accelerated.

      We thank the reviewer for their positive feedback on the manuscript. We have substantially edited the manuscript in light of the insightful comments of the reviewer (changes are in blue).

      Major comments:

      I think the study is very well done, most experiments are super-elegant and controlled; I really don't have any objections against the conclusions drawn, as most of what I have seen is totally justified and reasonable. So from a scientific point of view, I can easily agree with all the major conclusions drawn, and so in my view, this should be published fast.

      Minor comments:

      There are two minor points that could be addressed:

      1) I am not entirely convinced by the conclusions drawn from the EM images shown in Figure 6A, in particular by the filaments in two-filament bundles locally twisting around each other (without breaking) at sites lacking fascin and decorated by cofilin. This is hard for me to imagine, and the evidence for it is not very strong, as only larger bundles could be observed by EM. In addition, I am not sure that the braiding of filaments seen in the presence of cofilin really occurs only locally on cofilin-decorated bundle segments and thus coincides with the loss of fascin, as proposed in the scheme in Fig. 6B.

      Can the authors exclude that the braiding is caused by an experimental artefact, induced perhaps by sample preparation for negative staining?

      We thank the reviewer for raising this point. We have repeated the negative staining EM experiments several times and now show new images and quantification (new Supp Fig. 13). In our new series of experiments, the braiding that was previously shown in Fig. 6 proved difficult to reproduce and to quantify. We therefore decided to remove EM observations from the main Fig. 6, and we no longer present them as evidence supporting the mechanism that we propose for inter-filament cooperativity.

      From EM images, we now quantify the frequency of fragmentation of large actin filament bundles. We observed that bundles often terminate with the ends of their filaments in close proximity, consistent with sharp breaks due to co-localized cofilin clusters.

      We have rewritten this part of the results section of the manuscript, which now reads: ‘To further investigate larger bundles, we imaged them using negative staining electron microscopy. In the absence of cofilin, filaments in bundles are arranged in a parallel manner, as previously reported in vitro (Jansen et al, 2011). Compared with the control, filament bundles exposed to cofilin show numerous sharp breaks (65 breaks per 122 µm of bundles, versus 4 breaks per 68 µm in the control; Supp. Fig. 13). This is consistent with bundle fragmentation occurring at boundaries of co-localized cofilin clusters.’
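      To make the comparison in the quoted passage explicit, the break counts can be normalized by the cumulative bundle length examined. This is simple arithmetic on the numbers quoted above, assuming breaks were counted over the stated total lengths; the helper name is ours, for illustration.

```python
def breaks_per_um(n_breaks: int, total_length_um: float) -> float:
    """Number of bundle breaks per micrometre of bundle examined."""
    return n_breaks / total_length_um


with_cofilin = breaks_per_um(65, 122.0)  # breaks per µm with cofilin (counts quoted above)
control = breaks_per_um(4, 68.0)         # breaks per µm in the control (counts quoted above)
fold_change = with_cofilin / control

print(f"cofilin: {with_cofilin:.2f} breaks/µm")
print(f"control: {control:.3f} breaks/µm")
print(f"ratio:   {fold_change:.1f}-fold")
```

      That is, roughly 0.53 versus 0.059 breaks per µm, or about a 9-fold higher break density in the presence of cofilin.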

      Did the authors quantify the occurrence of such braided bundle segments with and without cofilin?

      How large are these braided segments on average when you quantify them? Would you also see them if you prepared the bundles for an alternative EM-technique, such as Cryo-EM, for instance?

      As mentioned in the answer to the previous point, the braided segments proved difficult to reproduce and quantify, and we have removed EM experiments from the main figure 6. Instead of the braided segments, we now quantify the severing of the bundles, and the distribution of filament ends at the extremities of the bundles (new Supp. Fig. 13).

      We have not tried Cryo-EM due to limited access to such experimental tools within the timeframe of the study.

      This may admittedly all be experimentally challenging, but would it be possible to combine the negative staining of filaments with immunogold labeling of cofilin and/or fascin, to prove that the braided segments do indeed correlate with high cofilin and low fascin concentrations? In the absence of such data, and in particular in the absence of a clear quantification, the proposal is too strong in my view. Finally, it would be nice (albeit not essential, I guess) to also look at two-filament bundles. The authors stated that these cannot be easily generated due to the tendency of fascin to promote the formation of larger bundles, but could this not be titrated by lowering fascin concentrations, to come closer to what is proposed to occur in the scheme in Figure 6B? In any case, the way the data are presented right now leaves a rather large gap between the experimental evidence and the theoretical model.

      We agree with the reviewer that EM observations are limited and, alone, do not provide strong evidence in favor of braiding/super-twisting being the mechanism responsible for inter-filament cooperativity (please see our answers to the points above). We performed the negative staining EM assays at a higher cofilin-1 concentration (500 nM) than in the microfluidics assays, so that cofilin would bind filaments quickly, even in large bundles, and we would have a high chance of capturing bundles targeted by cofilin.

      Nevertheless, both microfluidics and EM observations point in the same direction: bundle fragmentation by cofilin is caused by the co-localized, cooperative nucleation of cofilin clusters.

      2) I think that the proposal that cofilin-decorated filaments "transfer" the resulting cofilin-induced changes in filament helicity onto neighboring filaments in the bundle, which is proposed to occur locally and in the absence of fascin, is a bit vague and difficult to understand mechanistically. Can the authors speculate, at least, how they think this would occur? Are there no alternative explanations for the obtained results? Maybe I am missing something here, but considering that cofilin is monomeric and harbors only one actin-binding site, this proposal of helicity transfer onto neighboring filaments seems inconclusive.

      On single actin filaments, the change of helicity induced by cofilin binding has been observed by many groups using EM and cryo-EM (e.g. McGough et al, JCB 1997 10.1083/jcb.138.4.771; Egelman et al, PNAS 2011 10.1073/pnas.1110109108; Huehn et al, JBC 2018 10.1074/jbc.AC118.001843). These studies have revealed that actin subunits become ‘tilted’ relative to their original orientation along the filament long axis. This leads to a shortening of the helical pitch in cofilin-saturated actin filament segments.

      In our assays, the progressive binding of cofilin along a single filament creates a cluster where all actin subunits are tilted and the helical pitch of the filaments within the cluster is shortened (from a half pitch of 36 nm down to 27 nm). This change of helicity in a cluster induces the rotation of one end of the filament relative to the other (as we have shown previously in Wioland et al, PNAS 2019). Therefore, if two parallel filaments are stapled together, the local twisting of one filament causes the twisting of the other in the overlapping region.
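      The geometric argument above can be turned into a back-of-the-envelope estimate. Assuming, for simplicity, that the cluster length along the filament axis is unchanged by cofilin binding, and using the half-pitch values quoted above (36 nm for bare actin, 27 nm within a cluster), each half pitch is a 180° turn, so the extra rotation of one cluster end relative to the other is 180° × L × (1/27 − 1/36) nm⁻¹, i.e. about 5/3 degrees per nm of cluster. This is an idealized sketch, not a measurement.

```python
BARE_HALF_PITCH_NM = 36.0     # half helical pitch of bare actin (value quoted above)
COFILIN_HALF_PITCH_NM = 27.0  # half helical pitch within a cofilin cluster (value quoted above)


def extra_rotation_deg(cluster_length_nm: float) -> float:
    """Extra rotation (degrees) of one cluster end relative to the other.

    Each half pitch corresponds to a 180-degree turn, so the twist per nm is
    180 / half_pitch. Simplifying assumption: cluster length is unchanged.
    """
    twist_bare = 180.0 / BARE_HALF_PITCH_NM
    twist_cofilin = 180.0 / COFILIN_HALF_PITCH_NM
    return cluster_length_nm * (twist_cofilin - twist_bare)


for length in (27.0, 108.0, 500.0):
    print(f"{length:6.0f} nm cluster -> {extra_rotation_deg(length):6.1f} deg")
```

      Under these assumptions, a cluster of about 100 nm already rotates its two ends by roughly half a turn relative to each other, which is the kind of local twist that, in the model described here, an adjacent stapled filament would have to accommodate.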

      We have rephrased this point to more clearly explain this in the last paragraph of the results section:

      “From our kinetic analysis, we propose the following model that recapitulates the binding of cofilin to fascin-induced 2-filament bundles (Fig. 6D). Initially, actin filaments in fascin-induced bundles are in conformations that are less favorable for cofilin binding than isolated actin filaments. Once a cofilin cluster has nucleated, its expansion locally triggers fascin unbinding and prevents it from rebinding. The increase of filament helicity induced by cofilin causes a local twisting of the entire bundle, thereby changing the helicity of the adjacent filament in the fascin-free region facing the cofilin cluster. In this region, the increase in filament helicity enhances cofilin affinity, and thus locally promotes the nucleation of a cofilin cluster (inter-filament cooperativity).”

      We have tried to think of other alternative scenarios that might explain our observations, but none appeared to be valid.

      Reviewer #2 (Significance (Required)):

      General assessment:

      The strength of this study is that, owing at least in part to the microfluidics devices employed and the careful biochemistry, the experimental setups are super-controlled and clean, and they are used in a highly innovative and elegant fashion. The simulations are also nice! A limitation is that it is not entirely clear how precisely the main observations translate to what happens in vivo. As I understand it, the results depend largely on the bundles not being constrained, so to what extent would bundles be unconstrained in vivo? Perhaps this is not so important, because the experimental setup allows the authors to dissect specific biochemical behaviors and inter-dependencies between distinct actin binding proteins, but the latter view (if correct) could be stated more clearly!

      We thank the reviewer for their remarks. We have updated the part where we discuss the biological implications of our in vitro observations to better explain how the twist-constraints expected for fascin bundles in cells would accelerate cofilin bundle disassembly.

      Advance:

      As stated above, the results are opposite to the synergistic activities of fascin and cofilin previously proposed for bundles, perhaps because those bundles were not constrained. Although this is touched upon, in part and in a very polite fashion, in the discussion, the authors could specify more clearly what the differences between the studies are, and which of the distinct activities observed either here or in previous literature will be dominant or more relevant to consider in the future. As it stands, this is hard to discern, in particular for non-experts.

      We agree with the reviewer that the manuscript will benefit from a more in-depth discussion of the plausible reasons why our experimental observations disagree with the earlier interpretation by Breitsprecher and colleagues. We have extended our discussion of this point, which now reads: “Previously, using pyrene-actin bulk experiments, Breitsprecher and colleagues observed a diminished cofilin binding to fascin-induced filament bundles (Breitsprecher et al, 2011). In spite of this, their observation of fluorescently labeled actin filament bundles seemed to indicate an efficient severing activity. Since cofilin was not fluorescently labeled, they could not observe cofilin clusters, and they proposed that severing was enhanced because fascin served as anchors along filaments and impeded cofilin-induced changes in filament helicity”.

      Audience:

      This manuscript will be most influential for a specialized audience interested in the complexities of biochemical activities of specific actin binding proteins when looking at them in combination. Although specialized, this is still a quite relevant audience though, since prominent actin binding proteins like cofilin are highly important in virtually any cell type and various actin structures, hence of broad relevance again in this respect.

      Expertise:

      I am a cell biologist and geneticist interested in actin dynamics and actin-based, motile processes.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      My only major concern is that, although the authors provide data that strongly support inter-filament cooperativity of cofilin binding in two-filament bundles, the evidence that this involves induced twist in the opposing filament is not strong enough to conclusively establish it as the mechanism for the observed inter-filament cooperativity. This is presented as a proposed model in the results section, but stated with more certainty than the data support in the discussion. Based on the data presented, it might be better to state this as one possible mechanism for the observed cooperativity.

      We thank the reviewer for their remark. We have edited our discussion section to state clearly that inter-filament cooperativity arising from cofilin-induced filament twisting is a proposed model, the one that would best account for our observations: “Indeed, we report here the exclusion of fascin from within cofilin clusters, and a strong increase in the nucleation of cofilin clusters on adjacent filaments. This inter-filament cooperativity mechanism leads to the co-localized nucleation of cofilin clusters, and permits bundle fragmentation faster than if the nucleation of cofilin clusters on adjacent filaments were purely random. To our knowledge, this is the first time such inter-filament cooperativity is ever reported. To explain this mechanism, we propose that the cofilin-induced change of helicity produced locally on one filament can be transmitted to the adjacent filaments within the bundle (Fig. 6D).”

      So far, we have been unable to propose alternative mechanisms that could explain our observations in light of what is known for cofilin at the single filament level (a similar point was raised by reviewer #2, please see above).

      Addressing the following areas of the paper would improve the arguments presented as well as its readability.

      (1) The authors use both the terms cofilin binding (in section I of the results) as well as cofilin nucleation (in section III of the results). It is unclear if these terms are meant to indicate the same, or different, processes. The manuscript would benefit from a clear explanation of the steps of cofilin-mediated disassembly measured and quantified in the experiments, namely nucleation (or binding), cluster growth, and filament or bundle fragmentation. A clear description of these steps would also allow the reader to follow the logic of the experiments from Figure 3 to Figure 5.

      We have edited the introduction to better describe the different steps of cofilin activity, and to remove any ambiguity as to whether we are referring to cofilin binding or to cofilin nucleation.

      (2) Throughout the paper, the authors move from single filaments, to 2-filament bundles, to multifilament bundles, using different concentrations of fascin and cofilin. Given the biphasic behavior of cofilin, namely that low concentrations favor severing while high concentrations can favor coating and filament stabilization, I think it is important that the concentrations of the components are consistent across experiments and that, if the concentrations of important components (such as cofilin and fascin) are changed, a clear explanation of why is included.

      As explained at the beginning of the results section, most of our experiments and quantifications of cofilin activity with the microfluidics assay used 200 nM fascin and 200 nM cofilin as standard conditions. This is the case, in particular, for all the data shown in Figs. 2, 3 and 4, where we compare the behavior of single filaments, 2-filament bundles, and larger bundles exposed to the same protein concentrations.

      We have also explored higher fascin and cofilin concentrations to document their respective impact, always mentioning any change in concentration. We agree with the reviewer that cofilin activity is biphasic at the single filament level (in the range of 0 to 1 µM for mammalian ADF/cofilin, at physiological pH 7.4). In the case of fascin-induced bundles (already for two-filament bundles), filament saturation by cofilin, and thus their stabilization, will occur at higher cofilin concentration. This is mainly due to the lower nucleation activity of cofilin on fascin-induced bundles, preventing the nucleation of numerous cofilin clusters that will eventually fuse together, thus preventing saturation of filament bundles by cofilin before bundle fragmentation.

      (3) In Figure 2, it is mentioned that for the spectrin seeds with the microfluidics, the filaments consisting of larger bundles were not analyzed along with the single filament and 2-filament bundles. Instead, a different experiment with seeds attached to beads is used to assess larger filament bundles. Why were larger bundles not analyzed in the microfluidic experiment?

      We appreciate the insightful observation by the reviewer. When elongating actin filaments from spectrin-actin seeds, the seeds are randomly located on the glass coverslip of the microfluidics chamber. Upon exposure to fascin, only a subsection of any filament will be in contact with one or multiple filaments, ultimately forming a bundle due to the presence of fascin. In the case of high filament densities leading to large bundles, it is very difficult to identify the exact subsection of each filament which is engaged in a bundle or not. Despite our attempts to image individual filaments before and after exposure to fascin for enhanced clarity, the inherent difficulty persisted.

      This limitation hindered our ability to quantify cofilin activity on large bundles when using spectrin-actin seeds randomly distributed on glass. To address this, we opted for an alternative approach involving micron-sized beads coated with spectrin-actin seeds. This modification not only circumvents the aforementioned limitation but also aids in the formation of larger bundles (up to 10 filaments per bundle). This adjustment significantly enhances our ability to study and quantify cofilin activity on larger bundles, contributing to a more robust and comprehensive understanding of cofilin activity on bundles.

      And conversely, why were 2 filament bundles not assessed with the beads? Comparing the findings on two filament bundles with the findings on multifilament bundles would be easier for the reader if the small and large bundles were evaluated in the same experiments. If this is not experimentally feasible, the authors need to provide clearer explanation as to why this analysis is not included.

      Actually, we did assess 2-filament bundles in the bead assay. Cofilin activity on 2-filament bundles from beads is reported, along with larger bundles, in Figure 3E-F for nucleation and in Figure 4C for cofilin cluster growth rates.

      (4) The authors indicate that, at increased fascin concentration (1 µM), the nucleation rate of cofilin clusters on single filaments decreases. The authors should comment on the mechanism by which fascin (at 1 µM) affects cofilin binding.

      We thank the reviewer for this comment. We now comment on this mechanism in the results section:

      “This observation is consistent with the low affinity of fascin for the side of single actin filaments. Furthermore, this indicates that cofilin and fascin may have overlapping binding sites, or that a more complex competition may exist between the two proteins, where the binding of one protein would induce conformational changes on neighboring actin subunits affecting the binding of the other protein.”

      (5) The authors should determine and include the dissociation rate for the labeled cofilin used in this study, especially given the proposed mechanism for cofilin excluding fascin within the bundles.

      • If the reviewer means that we need to characterize the behavior of the labeled cofilin: in Wioland et al. 2017, we previously reported that cofilin dissociates slowly from cluster boundaries (at 0.7 s⁻¹ for cofilin-1 on alpha-skeletal rabbit actin, as used in the present study) and extremely slowly from inside a cofilin cluster (~2 × 10⁻⁵ s⁻¹).

      • If the reviewer means that we should investigate the competition between fascin and cofilin along bundles: we agree that this is indeed an interesting question. However, it is quite complex because many unknown parameters are involved. In addition to the on- and off-rates of each protein, and how they are affected by the presence or proximity of the other protein, we would need to consider that fascin has fewer binding sites than cofilin, and that their accessibility changes as the helicity of the filament evolves upon cofilin binding. Investigating this question would require many experiments, whose results we would need to compare with a model. We believe that this is beyond the scope of this manuscript.
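      For intuition, the two dissociation rates quoted in the first point above (0.7 s⁻¹ at cluster boundaries and about 2 × 10⁻⁵ s⁻¹ inside a cluster) translate into very different mean residence times (1/k_off) and half-lives (ln 2/k_off). This short sketch assumes simple first-order dissociation; the helper names are ours.

```python
import math

K_OFF_BOUNDARY = 0.7   # s^-1, dissociation from cluster boundaries (quoted above)
K_OFF_INSIDE = 2e-5    # s^-1, dissociation from inside a cluster (quoted above)


def mean_residence_time_s(k_off: float) -> float:
    """Mean lifetime of a bound cofilin for first-order dissociation."""
    return 1.0 / k_off


def half_life_s(k_off: float) -> float:
    """Time for half of the bound molecules to dissociate."""
    return math.log(2) / k_off


for label, k in [("boundary", K_OFF_BOUNDARY), ("inside", K_OFF_INSIDE)]:
    tau = mean_residence_time_s(k)
    print(f"{label:8s}: tau = {tau:.3g} s ({tau / 3600:.3g} h), "
          f"t1/2 = {half_life_s(k):.3g} s")
```

      That is, about 1.4 s at cluster boundaries versus roughly 14 h inside a cluster.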

      (6) For Figure 4, D and E, what do the dynamics of fascin and cofilin signal look like on a larger filament bundle? It would be informative to provide the cofilin cluster nucleation rate on larger filament bundles with a range of fascin concentrations (as in 3D for a two filament bundle).

      It would be interesting indeed to investigate the dynamics of fascin and cofilin on larger bundles. However, this experiment is quite challenging due to the fluorescence background of fluorescently-labeled fascin in our microfluidics assay (regardless of bundle size). We have been unable to perform this assay with success on large bundles. Moreover, it is difficult for us to carry out more of these experiments now that the first author of the study has left the lab.

      However, based on our results, we would expect that, for large bundles, increasing the fascin concentration would also reduce cofilin nucleation only to a limited extent. Indeed, for 2-filament bundles, increasing the fascin concentration has a more limited impact on the nucleation of cofilin clusters (Fig. 3D, roughly a 2-fold decrease for fascin from 100 to 500 nM) than the number of filaments per bundle (Fig. 3F, a 10-fold decrease when increasing the bundle size from 2 to 10 filaments).

      (7) Additionally, it would be useful to report the cofilin severing rate at a range of cofilin concentrations, at least for the 2 filament bundles.

      Cofilin severing rate is not dependent on cofilin concentration in solution. This has been reported previously by several groups, including ours (e.g. Suarez et al, Current Biology 2011 ; Gressin et al, Current Biology 2015; Wioland et al, Current Biology 2017).

      Below is the comparison of cofilin cluster severing at 100 and 200 nM cofilin, on single actin filaments, which we added to supplementary figure 10.

      At 100 nM cofilin, we measured a similar cofilin cluster severing rate on 2-filament bundles, by measuring the survival fraction of overlapping cofilin clusters that lead to 2-filament bundle fragmentation over time. The figure pasted below is new Supp. Fig. 11.
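      For readers less familiar with survival analyses, one common way to turn a survival fraction into a rate is sketched below. It assumes that severing of overlapping cofilin clusters is a first-order (memoryless) process, so that the survival fraction decays as S(t) = exp(−k·t), and that clusters still intact when observation stops are treated as censored. The maximum-likelihood estimate of k is then the number of severing events divided by the total observation time. This is an illustrative sketch, not necessarily the fitting procedure used for Supp. Fig. 11; the function names and example times are hypothetical.

```python
import math
from typing import Sequence


def severing_rate_mle(times_s: Sequence[float], severed: Sequence[bool]) -> float:
    """Maximum-likelihood first-order rate from (possibly censored) lifetimes.

    times_s: time each cluster was observed (until severing or end of observation).
    severed: True if the cluster severed at that time, False if censored.
    """
    if len(times_s) != len(severed) or not times_s:
        raise ValueError("need matching, non-empty inputs")
    n_events = sum(1 for s in severed if s)
    total_exposure = sum(times_s)
    return n_events / total_exposure


def survival_fraction(k: float, t_s: float) -> float:
    """Fraction of clusters expected to remain unsevered after t_s seconds."""
    return math.exp(-k * t_s)


# Hypothetical example: four clusters severed, one still intact at 120 s.
times = [15.0, 30.0, 45.0, 60.0, 120.0]
events = [True, True, True, True, False]
k_hat = severing_rate_mle(times, events)
print(f"k = {k_hat:.4f} s^-1; S(60 s) = {survival_fraction(k_hat, 60.0):.2f}")
```

      Treating intact clusters as censored, rather than discarding them, avoids overestimating the severing rate when some clusters outlast the observation window.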

      When the severing occurs in the two filament bundles, does this severing occur mostly at boundaries with cofilin-actin and bare actin or does this severing occur at cofilin-actin/fascin-actin boundaries?

      This is an interesting point. In the presence of a saturating amount of fascin on 2-filament bundles, one fascin protein is bound every 13 actin subunits along each filament of a bundle. Most of the time, a cofilin cluster boundary will therefore not be in contact with a fascin-bound actin subunit. The limited spatial resolution of optical microscopy does not allow us to determine whether fascin was present at the boundary of a cofilin cluster when severing occurred. Nonetheless, we show that cofilin cluster severing is unaffected by fascin bundling (i.e. severing rates per cofilin cluster boundary are similar on single filaments and on 2-filament bundles). Overall, bundling by fascin probably does not change the way cofilin severs, i.e. severing occurs at the boundary between cofilin-decorated and bare actin regions.
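      The spatial-resolution argument can be made concrete with a rough estimate. Assuming an axial rise of about 2.75 nm per actin subunit (a standard approximate value, not given in the text) and a diffraction-limited spot of roughly 250 nm (an assumed, typical value for light microscopy), one fascin every 13 subunits corresponds to about one fascin per 36 nm, and a single diffraction-limited spot spans several fascin-binding positions along each filament.

```python
RISE_NM = 2.75               # approximate axial rise per actin subunit (assumption)
SUBUNITS_PER_FASCIN = 13     # one fascin every 13 subunits at saturation (quoted above)
DIFFRACTION_SPOT_NM = 250.0  # rough optical resolution (assumption)

fascin_spacing_nm = SUBUNITS_PER_FASCIN * RISE_NM
subunits_per_spot = DIFFRACTION_SPOT_NM / RISE_NM
fascins_per_spot = DIFFRACTION_SPOT_NM / fascin_spacing_nm

print(f"fascin spacing: {fascin_spacing_nm:.1f} nm")
print(f"subunits per spot: {subunits_per_spot:.0f}")
print(f"fascin positions per spot: {fascins_per_spot:.1f}")
```

      Under these assumptions, there is about one fascin per half helical pitch and on the order of seven fascin-binding positions per filament within one diffraction-limited spot, so the position of fascin relative to a cluster boundary cannot be resolved optically.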

      (8) For the images of large bundles appearing braided in figure 6A, the braided appearance is not obvious in the lower left panel. Additionally, what is the number of filaments in the bundles shown? Finally, given that Figure 3F indicates that cofilin cluster nucleation events are rare on large bundles, and that the cluster growth rate is reduced on large bundles (Figure 4C), the authors need to indicate how frequently this braided appearance is observed, as well as what the nucleation rate, growth rate and severing rate are for 500 nM cofilin on bundles.

      We have repeated the negative-stain EM experiments several times and now show new images and quantification (new Supp. Fig. 13). In our new series of experiments, the braiding previously shown in Fig. 6 proved difficult to reproduce and to quantify. We therefore removed the EM observations from the main Fig. 6, and we no longer present them as evidence supporting the mechanism that we propose for inter-filament cooperativity.

      As stated in point (7) above, the severing rate is independent of cofilin concentration. We used 500 nM cofilin, a rather high concentration, to investigate bundle fragmentation by EM. In solution we mostly form large bundles, which cofilin targets more slowly than single filaments or 2-filament bundles (figure 3F & 4C). For single filaments and 2-filament bundles, the nucleation of cofilin clusters is extremely fast at 500 nM cofilin (> 10⁻⁴ s⁻¹ per binding site).

      (9) The authors indicate that the rapid fragmentation of twist constrained 2-filament bundles prevented them from directly quantifying the nucleation rate of the subsequent cofilin clusters that overlapped the initial ones. I'm unclear why this is the case, and if this is the case, I don't understand how the authors can be sure that a second nucleation event occurred in the twist constrained bundles. From the experimental data in 7C, it appears that the fragmentation rate for two filament bundles is similar to the fragmentation rate for twist constrained single filaments. The authors need to clearly state what they were able to observe and quantify as well as include the timing for this severing. If the authors could not observe a second nucleation event prior to severing, this should be clearly stated.

      Fragmentation of a 2-filament bundle requires the severing of two co-localized cofilin clusters, one on each filament. When 2-filament bundles are twist-constrained, the sequence of events leading to bundle fragmentation is fast, so it is difficult to separate the individual events within the resolution of our experiment. In this case, cofilin clusters sever quickly and therefore remain small, which translates into a low fluorescence intensity. As a result, quantifying the increase of cofilin fluorescence intensity along a bundle did not allow us to unambiguously identify the ‘cooperative’ nucleation of two overlapping cofilin clusters before the bundle was fragmented. Apart from the nucleation of cofilin clusters, which we show is unaffected by twist-constraining the bundles, we were unable to measure either the growth rate or the severing rate of cofilin clusters.

      Numerical simulations, using similar severing rates for cofilin clusters on both twist-constrained single filaments and 2-filament bundles, satisfactorily reproduce our experimental observations (dashed lines in Fig. 3C).

      We have edited the ‘Twist-constrained bundle fragmentation’ section to state clearly what we measured and what could not be measured: “We observed that the nucleation rate of cofilin clusters was similar for both twist-constrained and twist-unconstrained fascin bundles (Supp. Fig. 15), in agreement with observations on single actin filaments (Wioland et al, 2019b).

      The rapid fragmentation of twist-constrained 2-filament bundles prevented us from directly quantifying the nucleation rate of the subsequent cofilin clusters that overlapped with the initial ones, as well as cluster growth and severing rates.”

      This could be due to the rapid fragmentation, but it could also be due to severing occurring in the absence of a second cofilin nucleation event. It would be informative to compare the time from cofilin nucleation to severing event for two filament bundles in twist constrained and unconstrained. Clarification of the dynamics of nucleation and spreading of cofilin and the timing of fragmentation of the twist constrained filament bundles is needed.

      As explained in the previous point, cofilin-induced severing occurs significantly faster on twist-constrained single actin filaments compared to unconstrained filaments.

      For twist-unconstrained filament bundles, we never observed bundle fragmentation that originated from only one cofilin cluster. For twist-constrained bundles, while our observation is limited by the rapid fragmentation of the bundles, it is hard to imagine that a single cofilin cluster on one filament would induce the fragmentation of the neighboring filament. Recently, Bibeau et al, PNAS 2023, using magnetic tweezers to twist single actin filaments, showed that, without cofilin, applying up to 1 rotation/µm to an actin filament does not cause its fragmentation. It is thus reasonable to say that cofilin binding is required to fragment twist-constrained filaments.

      Moreover, in our numerical simulations (without inter-filament cooperativity, and faithfully reproducing the kinetics of 2-filament bundle fragmentation observed in microfluidics), 75% of bundle fragmentation events resulted from a sequential nucleation of cofilin clusters. In these events, the second cofilin cluster nucleated after the first cofilin cluster had already severed one filament of the bundle.
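      The competition between severing and nucleation can be illustrated with a toy model. This is not the authors' simulation. After a first cluster nucleates, suppose it severs its filament at a hypothetical first-order rate k_sever. Meanwhile, an overlapping cluster nucleates on the partner filament at a hypothetical rate k_overlap. With no inter-filament cooperativity, the two events are independent. Nucleation is then 'sequential' when the overlapping cluster appears only after the first severing. For two independent exponential waiting times, this happens with probability k_sever / (k_sever + k_overlap). The Python sketch checks this by Monte Carlo; the rates are arbitrary.

```python
import random


def sequential_fraction_mc(k_sever, k_overlap, n=200_000, seed=1):
    """Fraction of trials in which the overlapping cluster nucleates after
    the first cluster has severed its filament (toy model: independent
    exponential waiting times, measured from the first nucleation)."""
    rng = random.Random(seed)
    sequential = 0
    for _ in range(n):
        t_sever = rng.expovariate(k_sever)      # first cluster severs filament 1
        t_overlap = rng.expovariate(k_overlap)  # overlapping cluster on filament 2
        if t_overlap > t_sever:
            sequential += 1
    return sequential / n


def sequential_fraction_exact(k_sever, k_overlap):
    """Closed form for two competing exponential clocks."""
    return k_sever / (k_sever + k_overlap)


# Arbitrary illustrative rates (not fitted to any data).
print(round(sequential_fraction_exact(2.0, 1.0), 3))  # 0.667
print(round(sequential_fraction_mc(2.0, 1.0), 3))
```

In this toy model the sequential fraction depends only on the ratio of the two rates. A realistic simulation would also have to account for cluster growth and for the requirement that the two clusters overlap along the bundle.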

      (10) Discussion of how twist constrained fragmentation dynamics might affect the dynamics of larger bundles in structures such as filopodia would be useful.

      We have substantially edited the discussion section of the manuscript to better discuss the physiological implications of our in vitro observations (bundle size & twist constraints).

      Minor changes that would improve the paper:

      (11) In Figure 1C, Figure 2B and Figure 2E, the indication, on the graph, of the fold-change between the rates is confusing as it is not clear from the labeling on the graph that the x15 is referring to the slope of the lines, keeping this information in the legend is appropriate, but if it is to be included on the graph, perhaps adding in the linear fit on the graph is also needed.

      We have edited the figures accordingly, and included fit lines in figure 1.

      (12) Figure 7A, lining up the diagram with the kymographs below would help improve interpretation of the diagram and simulation. Alternatively, if the diagram (upper) in A does not diagram the kymographs below, this needs to be clearly stated, and it would be preferable that the diagram above matches the kymographs below.

      We have edited the figure layout accordingly.

      (13) Despite referencing the Breitsprecher, 2011 paper in the introduction, the authors do not explain how their results, showing that cofilin fragments filament bundles more slowly than single actin filaments, correspond with the Breitsprecher findings that fascin bundles favor cofilin filament severing. While the authors do not need to explain the Breitsprecher data, if they reference these findings that run counter to their results, an explanation for the discrepancy would be reasonable to include in the discussion.

      We agree with the reviewer; reviewer #2 made a similar comment. We now discuss more directly the possible discrepancies between the Breitsprecher study and ours: “Previously, using pyrene-actin bulk experiments, Breitsprecher and colleagues reported diminished cofilin binding to fascin-induced filament bundles (Breitsprecher et al, 2011). In spite of this, their observation of fluorescently labeled actin filament bundles seemed to indicate an efficient severing activity. Since cofilin was not fluorescently labeled, they could not observe cofilin clusters, and they proposed that severing was enhanced because fascin served as anchors along filaments and impeded cofilin-induced changes in filament helicity. This proposed mechanism bears resemblance to our previously reported findings for artificially twist-constrained single actin filaments (Wioland et al, 2019b). Here, we show that this mechanism does not occur in fascin-induced bundles.”

      Reviewer #3 (Significance (Required)):

      The research presented in "Fascin-induced bundling protects actin filament from disassembly by cofilin" is relevant and of interest to the field as it directly addresses a limitation in our understanding of how cofilin-induced severing occurs in F-actin bundles. Bundled F-actin may constitute the majority of linear F-actin within the cell and is specifically important in F-actin-based structures such as filopodia and stress-fibers. The data supports a model for interfilament cooperativity that provides a molecular mechanism for cofilin-mediated severing of fascin-bundled filaments.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Planned Revisions based on comments from Reviewer #1

      • The introductory material and the title of the paper emphasize the ring canal scaling question. This problem is somewhat obscured in the text by the side problem of nuclear scaling, which comes up frequently even though the results are not as thoroughly explored. Could the authors think about moving these data into a different, single figure for the sake of coherence? This is not a required revision. Just a thought.
      • We have moved the nuclear scaling data from Fig. 5 into Fig. S3. Once we have analyzed the data from the planned experiments (over-expressing either HtsRC or the active form of myosin), we will have a better idea of how to handle the rest of the nuclear scaling data. We will decide whether to move it out of the main part of the paper, consolidate it into a single figure (as Reviewer #1 suggests), or keep some of it in the main figures.

      Planned Revisions based on comments from Reviewer #3

      • I cannot see differences in RC size in the panel A images. More importantly, this method altering ring canal size is limited. A more direct way is overexpression of HtsRC (https://doi.org/10.1534/genetics.120.303629).
      • We have requested and recently received the line to over-express HtsRC in the germline. We plan to cross this UAS line to the mataTub-GAL4 driver, which expresses GAL4 beginning around stage 3 of oogenesis. Because crossing this UAS line with this GAL4 line produced egg chambers with larger ring canals in the original study [2], we do not anticipate any technical issues with this experiment. We will incorporate the results from the analysis of these egg chambers in the revised manuscript.
      • To further explore the effect of ring canal size on scaling, we will also test a condition that we hope will have the opposite effect on ring canal size: expression of a phosphomimetic version of the non-muscle myosin II regulatory light chain, encoded by spaghetti squash (Sqh) (UAS-sqhE20E21). We plan to cross this UAS line to two different GAL4 drivers. The first, nos-GAL4, expresses GAL4 in a pulse during early oogenesis and then in another pulse in mid-oogenesis. The second, mataTub-GAL4, expresses GAL4 beginning around stage 3 of oogenesis. We know that expression of sqhE20E21 will reduce the size of the ring canals that connect the nurse cells to each other, but it is possible that the posterior ring canals will not show a strong phenotype. One study examined egg chambers homozygous for a mutation in the myosin binding subunit of the myosin phosphatase, DMYPT, which should also increase sqh phosphorylation. It showed that the posterior ring canals were larger than those connecting nurse cells [1]. Therefore, this condition may not allow us to consistently reduce the size of all ring canal types. However, if we do see a significant reduction in posterior ring canal size in these egg chambers, we will include these data in the revised manuscript.

      • In panel 2E, it would be helpful to plot the y-intercepts separately, too.

      • Based on the analysis of the data from the proposed experiments, we will consider plotting the y-intercepts separately for the various conditions.

      1. Description of the revisions that have already been incorporated in the transferred manuscript

      Revisions made based on comments from Reviewer #1

      • One way to think about the dhc-64C experiments presented in Figure 2 is that they are meant to test the hypothesis that ring canal size impacts scaling in such a way that transport across the four ring canals tends towards equilibrium over time. One possibility would therefore be that ring canals aren't programmed to grow to a particular final size but rather they grow at different rates until their diameters are the same. This seems to me an important distinction. It might be made by analysis of the arpC2-RNAi cells, since those ring canals are meant to be initially larger. Unfortunately, I can't see the answer.
      • Reviewer #3 suggested determining the ratio of the diameter of the M1 ring canal to that of the M4 ring canal. If ring canals grow toward equilibrium (to achieve a similar final size), then we would expect this ratio to approach 1. When we performed this analysis, we saw that the ratio did decrease as the egg chambers increased in volume, but it never quite reached 1. We have added a supplemental figure (Fig. S1) showing these data and incorporated this idea into the results and discussion sections.
      • Although it would be informative to determine whether ring canals that all started with a similar diameter would grow at the same rate, we have not found a condition that would provide the opportunity to test this hypothesis. We hope that the planned experiments will provide us with a way to test this hypothesis. We will determine the M1/M4 ratio in egg chambers over-expressing either HtsRC or sqhE20E21 and see whether this ratio still decreases as egg chamber volume increases.
      • Once we perform the planned experiments to either increase or decrease ring canal size, we will decide on the best presentation. We may further modify Fig. 3 to highlight the size differences between ring canals in the arpC2-RNAi egg chambers, or we may instead focus more on the results of the planned experiments.

      • The authors write that arpC2-RNAi "ring canals tended to be larger than those in similarly-sized control egg chambers," but that conclusion isn't obvious to me from the data in Figure 3B. The only difference I can see is that the M4 ring canals look to be consistently smaller in the experimental versus control egg chambers, especially at the final timepoint.

      • To further clarify the difference in ring canal size between the control and the arpC2-RNAi egg chambers, we have added explanation to the results section. It highlights that the y-intercepts of the lines of best fit are significantly higher in the arpC2-RNAi egg chambers at each stage. This demonstrates that, for a given egg chamber volume, the ring canals are larger in egg chambers depleted of ArpC2 than in controls.

      • The authors write that "there was a consistent, but not significant decrease in the scaling exponents for the arpC2-RNAi egg chambers compared to controls," but I don't see this in the M1 (identical) or M2 (almost the same) ring canals. The scaling decrease is most pronounced at M4. All the other ring canals seem to reach a final size that's equivalent to controls. What does this tell us about scaling? Is the M4 more sensitive to the effect of arpC2-RNAi? I note and appreciate that the data for M4 show a wide distribution and might have been impacted by outliers, which could be discussed.

      • We have separated the arpC2-RNAi ring canal scaling data by lineage (Fig. S2), and we have color-coded the data in Fig. 3B (as suggested by Reviewer #3).
      • We have expanded the discussion of these results and their implications, and we have added a line in the results section to address this wide distribution of the M4 ring canal sizes.

      • The possibility that ring canal scaling "could generate eggs of different sizes" could use some elaboration (at least) as it does not seem to be especially well supported:

      • Only one of the small egg lines had lower scaling exponents than the big egg lines, and it's a struggle for me to understand the extent of that difference based on the data shown. (Is it significant?).
      • We have restructured this section of the results and modified Fig. 5 to highlight similarities and differences between the four lines. In the results section (and in the figure legend), we state that when we compared the slopes of the regression lines for all four lines, there was a significant difference for M1, M2, and M4 (Fig. 5C, D, and F). We have also modified the results section to highlight one further point. Although the slope for line 9.31.4 was not different from those of the two big egg lines, the intercepts were significantly different for the M1, M2, M3, and M4 ring canals. We moved the nuclear scaling data to Fig. S3 to simplify the figure.

      • The authors conclude that "the effect of lineage on ring canal scaling is conserved, and it suggests that at least in one line, reducing posterior ring canal scaling could provide a mechanism to produce a smaller mature egg." The first part of this sentence is confusing for me since I don't know what is meant here by "conserved." The second part of the sentence is technically correct but disguises what I would consider the more meaningful and exciting finding. The 9.31.4 line produces the smallest eggs but does not demonstrate scaling differences in comparison to the big egg lines examined (1.40.1 and 3.34.1). The authors have therefore avoided/solved a "chicken and egg" ("fruit fly and egg"?) problem by showing that scaling and egg size can be decoupled!

      • We have modified the first part of the sentence to clarify our point. We appreciate this suggestion and have modified the text in the results section to further elaborate on the results.

      • This point is not made very clearly in the discussion, which concludes with the suggestion that scaling could help explain why some insects produce much larger or much smaller eggs than fruit flies. I can only understand this to be the case if, as the authors point out, scaling "affect the directed transfer of materials into the oocyte." That argument seems predicated on the possibility that these insects make the same amount of initial material and then regulate how much is transferred. Seems like a costly way to go about it.

      • We have modified this section of the discussion.

      • I really had to look very closely to distinguish the little blue boxes from the little blue circles in panels 2C and especially 2D. I suggest using a different color instead of a different shape, or maybe splitting the graphs up.

      • We have made the shapes larger in Fig. 2C (nuclear sizes), and we have split the ring canal size data into Fig. 2D, E and made the shapes larger. The legend has been modified to reflect this change.

      • "Depletion of the linker protein, Short stop (Shot), or dynein heavy chain (Dhc64C), significantly reduced the biased transport at the posterior, which reduced oocyte size (Lu et al., 2021)." I suggest this sentence might be clearer if it was rewritten as "Depletion of either dynein heavy chain (Dhc64C) or the linker protein Short stop (Shot) significantly reduces biased transport at the posterior, in turn reducing oocyte size (Lu et al., 2021)."

      • We have made this change.

      • "Because nuclear growth has been shown to be tightly coupled to cell growth (Diegmiller et al., 2021), we can use nuclear size as a proxy for nurse cell size." I think it would help the reader to know that the Diegmiller study was performed using germline cysts in the Drosophila ovary; I paused when I got to this sentence because I initially read it as overly broad. I suggest "Recent work in demonstrates that nuclear growth is tightly coupled to cell growth in this system (Diegmiller et al., 2021), and we can therefore use nuclear size as a proxy for nurse cell size" or similar. This is certainly not a required revision, just a suggestion.

      • We have made this change.

      Revisions made based on comments from Reviewer #3

      • Reviewer #3 asked: Does the ratio of the diameter of M1 to M4 stay the same?
      • We have performed this analysis in the control egg chambers (from Fig. 1). We found that the ratio does not stay the same but tends to decrease as the egg chamber increases in volume. We plotted the log of egg chamber volume versus this ratio; the equation for the regression line was y = -0.166x + 2.32, and its slope was significantly different from 0 (included in Fig. S1).

      • It would be helpful to explain that the log-log plots were used to derive a line equation (y=mx + b) and why that is useful in this context. In the case of a log-log plot, what does the y-intercept mean biologically? Is it simply a way to compare two things or does it indicate real measurements such as volume or ring canal size? Also, the slope of the line is being used as a scaling value. Be careful to define the terms "scaling" and "scaling exponent".

      • We have added additional explanation in the results section.
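      A power law, size = a·V^m, becomes a straight line on log-log axes: log(size) = m·log(V) + log(a). The slope m is the scaling exponent. The y-intercept b = log(a) is the log of the predicted size at V = 1 in whatever volume units are used. Intercepts are therefore most easily compared between conditions whose slopes are similar. The Python sketch below fits such a line by ordinary least squares. The function name and the synthetic data are hypothetical, and base-10 logs are assumed; changing the log base leaves m unchanged but rescales b.

```python
import math


def fit_power_law(volumes, sizes):
    """OLS fit of log10(size) = m * log10(volume) + b.

    Returns (m, b): m is the scaling exponent and 10**b the prefactor a
    in size = a * volume**m. Illustrative sketch only.
    """
    xs = [math.log10(v) for v in volumes]
    ys = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Synthetic data following size = 2 * V**0.25 exactly.
volumes = [1e3, 1e4, 1e5, 1e6]
sizes = [2.0 * v ** 0.25 for v in volumes]
m, b = fit_power_law(volumes, sizes)
print(round(m, 3), round(10 ** b, 3))  # 0.25 2.0
```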

      • Are four significant digits called for in calculating slope? The figure has 4 significant digits, the text has three.

      • We have modified all figures and text to include only 3 significant digits.

      • Why is isometric scaling 0.66 - is that microns squared over microns cubed? Please explain.

      • We have added additional explanation to the results section.
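      One way to see where 0.66 comes from assumes, as the reviewer's question suggests, that the size measure has units of µm² while egg chamber volume has units of µm³. Under isometric (shape-preserving) growth, every length L scales together, so:

```latex
V \propto L^{3}, \qquad A \propto L^{2}
\quad\Longrightarrow\quad
A \propto V^{2/3}, \qquad
\log A = \tfrac{2}{3}\,\log V + \text{const}, \qquad \tfrac{2}{3} \approx 0.667 .
```

More generally, a quantity with units of µm^k scales isometrically as V^(k/3). A purely linear measure such as a diameter would therefore give an isometric exponent of 1/3 (about 0.33).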

      • Were all four posterior nuclei measured? The figure indicates just M1 and M4.

      • We apologize that it was not clear that all four posterior nuclei were measured in Fig. 1. For the sake of space, we only showed images of the M1, M4, and Anterior ring canals and nuclei (in Fig. 1A), but all four nuclear measurements were included in the graph in Fig. 1B. We have added M1-M4 to the legend to clarify and revised the text of the legend.

      • It is hard to explain why all four posterior nuclei are bigger than anterior when one of the four is the same age as the anterior nucleus.

      • The posterior nuclei are larger than the anterior nuclei due to their proximity to the oocyte. Multiple recent studies have described this hierarchical nurse cell size relationship, in which the nurse cells closest to the oocyte are larger than those separated from the oocyte by additional intercellular bridges [3–5].

      • In panel D, a conclusion is, "Further, the scaling exponent [slope] for the anterior ring canals, which are also formed during the fourth mitotic division, was not significantly different from that of the posterior M4 ring canals". Anterior is 0.23, M4 is 0.25. These seem different to me. How is significance determined? Were any of the scaling exponents in M1, M2, M3, M4 or Anterior significantly different?

      • Significance was determined within the Prism software using a method equivalent to an ANCOVA. When the slopes are compared, M1 is significantly different from M2, M3, and M4, and M2 is significantly different from M4. The M4 slope is not significantly different from the slope for the anterior ring canals, which supports the correlation between scaling and lineage.
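      Prism's exact implementation is not described here. The textbook ANCOVA test for whether two regression lines share a slope compares two models: one with separate slopes, and one with a common slope and separate intercepts. Below is a minimal Python sketch for two groups; the function names and data are hypothetical. It returns only the F statistic with (1, n − 4) degrees of freedom. The p-value would then come from the F distribution, for example via a statistics package.

```python
def _centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy) about each variable's mean."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def equal_slopes_f(x1, y1, x2, y2):
    """ANCOVA F statistic for H0: both groups share one slope.

    Full model: separate slope and intercept per group (4 parameters).
    Reduced model: common slope, separate intercepts (3 parameters).
    """
    sxx1, sxy1, syy1 = _centered_sums(x1, y1)
    sxx2, sxy2, syy2 = _centered_sums(x2, y2)
    rss_full = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    common_slope = (sxy1 + sxy2) / (sxx1 + sxx2)
    rss_reduced = (syy1 + syy2) - common_slope * (sxy1 + sxy2)
    df_resid = len(x1) + len(x2) - 4
    f_stat = (rss_reduced - rss_full) / (rss_full / df_resid)
    return f_stat, (1, df_resid)


# Hypothetical log-log data: same scatter added to lines of slope 0.25 and 0.5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.01, -0.02, 0.015, -0.005, 0.0]
y_a = [0.25 * xi + e for xi, e in zip(x, noise)]
y_b = [0.25 * xi + 1.0 + e for xi, e in zip(x, noise)]  # same slope, shifted
y_c = [0.50 * xi + e for xi, e in zip(x, noise)]        # steeper slope
print(equal_slopes_f(x, y_a, x, y_b))
print(equal_slopes_f(x, y_a, x, y_c))
```

With more than two groups, the same idea extends to comparing k separate slopes against one common slope, giving k − 1 numerator degrees of freedom.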

      • References are needed for the statements about biased transport to the oocyte.

      • There was a reference to the Lu (2021) paper in that paragraph, but we have added an additional reference to that paper to this part of the results section.

      • In panel 2C, why are the scaling exponents (slopes) of the controls bigger than in Figure 1B? The controls look hyper allometric in Fig. 2.

      • This experiment was done with a different GAL4 driver, so it is possible that there are some differences in scaling based on genetic background.

      • In panel 2D it is impossible to pick out the control posterior vs anterior lines - use different colors as in Figure 1. Why do the control lines for posterior and anterior merge?

      • We have split the ring canal scaling data from Fig. 2D into separate panels (Fig. 2D, E), as suggested by Reviewer #1.
      • These lines likely approach each other because the slope of the line for the anterior ring canals (M4 type) is always larger than the slope for the combined posterior ring canals.

      • Re: Fig. 3: Scaling of what? RC size?

      • We assume that this comment relates to the heading for this section of the results, so we have added “ring canal” to the end of this title, which now reads: “Increasing initial ring canal size does not dramatically alter ring canal scaling”

      • Since there was no effect, "dramatically" should be deleted from the section title.

      • This change has been made.

      • Clarify this sentence: If ring canal size inversely correlates with scaling, then increasing initial ring canal diameter should reduce the scaling exponent.

      • We have made this change in the text.

      • How does panel B show that RCs are larger in arpC2 KD? Fig. S1A has smaller y-intercept for control. Again, it is impossible to see which lines go with which M and which genotype.

      • As mentioned above, we have modified Fig. 3 to highlight these differences and added additional explanation to the results section.

      • Panels 4D & 4G are clear - should include significance indications.

      • We have added asterisks to indicate significant differences.

      • The conclusion from panels 5B and 5C that reducing RC scaling could lead to smaller mature eggs is a stretch. Without looking at the rest of the lines these data are preliminary and detract from the rest of the paper.

      • As suggested by Reviewer #1, we have modified the results and discussion sections, and we have added a statement about the need for analysis of additional lines.

      2. Description of analyses that authors prefer not to carry out

      Comment from Reviewer #2

      • I am surprised that the authors have not considered controlling for the impact of cell cycle regulation on this scaling process, especially as the work of Doherty et al. has shown that this type of regulation is essential for regulating the size of nurse cell nuclei. The authors should test the impact of at least dacapo and cyclin E in this process.
      • We have attempted to deplete Dacapo from the germline by crossing two different RNAi lines to multiple germline drivers. However, we have been unable to see a consistent effect on nurse cell nuclear size, which suggests that these RNAi lines may not effectively reduce Dacapo protein in the germline. We agree with the reviewer that this is an obvious mechanism that should be explored. However, we believe it need not be included in this manuscript, because altering Dacapo levels in the germline would not provide a mechanism for our model that ring canal lineage affects ring canal scaling. Dacapo has been shown to contribute to the hierarchical pattern of nurse cell size observed in the germline. Dacapo mRNA produced in the nurse cells is transported into the oocyte, where it is translated. The Dacapo protein then diffuses back into the nurse cells, producing a posterior-to-anterior gradient [4]. Doherty et al. (2021) showed that reducing the levels of the Dacapo protein using the deGradFP system eliminated the nurse cell size hierarchy. Our data might have supported a model in which proximity to the oocyte was a strong predictor of ring canal size and scaling, as shown for the nurse cells and their nuclei [3,5]. In that case, this would have been an excellent way to dig further into the mechanism. Instead, our data support a role for ring canal lineage in predicting ring canal growth, since the M4 ring canals at the posterior and anterior showed similar scaling with egg chamber volume.
      • We believe that performing the proposed experiments will allow us to determine how ring canal size affects scaling, which will provide additional mechanistic insight into this scaling behavior. These experiments are over-expressing HtsRC to increase ring canal size, and expressing the phosphomimetic form of the myosin regulatory light chain, sqhE20E21, to reduce ring canal size.


      Comment from Reviewer #3

      • Panel 3E is interesting and would fit better in Figure 1.
      • This panel is from a different genetic background than the data in Fig. 1. Therefore, we do not think it would be appropriate to move it to Fig. 1.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Major comments: 1. Mirc56_2 and 4 showed lower integration rates, and the authors suggest that this could be due to sgRNA pool imbalance. The authors should validate this by performing sequencing of the input sgRNA and cassettes.

      →Thank you very much for your comment, and we agree with your suggestion.

      We will test for sgRNA pool imbalance in the donor vector library by amplicon short-read NGS.

      In addition, to examine another possibility that we raised, we re-sequenced the sgRNA donor vectors for Mirc56_2 and 4, and will add the following sentences:

      “We first suspected that their low integration frequencies were caused by mutations in the PB transposon of the sgRNA donor vector, especially in the ITR or ID, which are important for integration efficiency [PMID: 15663772]. We therefore re-sequenced the PB transposons for Mirc56_2 and 4. However, we could not find any mutations in their PB transposons.” after “…efficiency or cell growth.” in the Discussion (page 14, line 346)

      2. Clonal analysis in Figure 5c is unclear: a. Figure 5c indicates that all changes were homozygous (e.g. both alleles were deleted). Was this the case in all clones? Or were some mutations heterozygous?

      →Thank you very much for your comment.

      We apologize for the unclear description.

      We targeted a single allele on the X chromosome in male mES cells, so all mutations should be hemizygous, as mentioned in the Results (page 11, lines 259-260).

      To emphasize that our study is a monoallelic assessment, we will add the following sentence:

      “This study targeted a single allele on the X chromosome in male mES cells, so all Mirc56 genotypes should be hemizygous and the induced mutations may be cis-mutations.” after “…owing to six tandem repeats [37]” in the Results (page 12, line 302)

      b. Many clones in Figure 5c show that the entire region was deleted (all black dots). Could this be due to some experimental error or misinterpretation of the sequencing data, or could it be validated using some orthogonal method? This is especially surprising for clones in which the final guide (Mirc56_13) was not detected yet the final site (Mirc56_13) was reported as "Regional deletion".

      →Thank you very much for your comment.

      We apologize for the unclear description.

      First, we only examined and sequenced the mature-miRNA genomic regions, by amplifying approximately 200 bp around each target site. We therefore defined unamplified regions as “miRNA deletion”. In addition, to make Figure 5C easier to understand, we have added “predicted regional deletion” and the name of each clone, as attached.

      In fact, only 4 clones carried deletions of all analysed Mirc56_X genomic regions (Mirc56_1 to 13). Moreover, the sgRNA cassettes and Sry on the Y chromosome could be PCR-amplified from these clones, indicating that their genomic DNA was successfully obtained and that at least the mature-miRNA genomic regions were deleted.

      Furthermore, Mirc56_13 deletions in clones lacking a Mirc56_13-targeting cassette always fall within predicted regional deletions spanning from an upstream to a downstream sgRNA target site. We therefore infer that these deletions were induced by cleavage at target sites on Mirc56_14, 15, 16, or 17 together with a target site upstream of Mirc56_13.

      To clarify these points, we will add the following sentences:

      • “Four clones (#2_066, #1_021, #1_029 and #1_046) harboured deletions of all analysed Mirc56_X genomic regions. Apart from these clones, only 3 pairs (#2_019 and #2_084, #2_038 and #1_023, #1_016 and #1_027) harboured the same combination of mutations.” following “…combinations of mutations (Figure 5C).” in the Results (page16, line 378-380)
      • “Meanwhile, regarding the relationship between mutations and the sites targeted by the sgRNA cassettes in each clone, all Mirc56_X genomic regions harbouring Indel mutations were targeted Mirc56_X sites. In addition, where consecutive Mirc56_X sites on the genome were deleted, the most upstream and most downstream deleted Mirc56_X sites were always targeted sites, except in clones #2_025 and #1_41.” following “…combinations of mutations (Figure 5C).” in the Results (page12, line 304)
      • “Genotyping PCR amplified approximately 200 bp around each mature-miRNA genomic region. Unamplified regions are defined as miRNA deletion (black circle), and amplified regions were classified as Indel mutation (gray circle) or Intact by short-read NGS. Where consecutive Mirc56_X sites on the genome were deleted, a black translucent square indicates a predicted regional deletion, assuming that the genomic region flanked by the miRNA deletions was also deleted. Besides, if miRNA deletion was induced in Mirc56_13 and the clone has a target Mirc56_X on Mirc56_14, 15, 16, or 17” following “…in each PB mES clone.” in the Figure legend (page23, line 575)

      Moreover, because we have defined “miRNA deletion”, we will change “regional deletion” to “miRNA deletion” wherever we mean “deletion of the mature-miRNA genomic regions”, in the Results (page13, line 312) and the Discussion (page14, line 363).
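      To make the calling rule in the proposed Figure 5C legend concrete, the following is a minimal sketch, not the pipeline used in the study. It assumes that, for each analysed Mirc56_X site, we already know whether the ~200 bp amplicon was produced and whether short-read NGS detected an indel, and that sites are listed in genomic order. The names SiteCall, call_site and predicted_regional_deletions are hypothetical, and the special Mirc56_13 case from the legend is not modelled.

```python
from enum import Enum
from typing import List, Optional, Tuple


class SiteCall(Enum):
    INTACT = "Intact"
    INDEL = "Indel mutation"           # gray circle in Figure 5C
    MIRNA_DELETION = "miRNA deletion"  # black circle in Figure 5C


def call_site(amplified: bool, indel_detected: bool) -> SiteCall:
    """Classify one ~200 bp genotyping amplicon around a mature-miRNA region."""
    if not amplified:
        # No PCR product: the mature-miRNA genomic region is scored as deleted.
        return SiteCall.MIRNA_DELETION
    # PCR product present: short-read NGS decides between Indel and Intact.
    return SiteCall.INDEL if indel_detected else SiteCall.INTACT


def predicted_regional_deletions(calls: List[SiteCall]) -> List[Tuple[int, int]]:
    """Return (first, last) indices of runs of >= 2 consecutive miRNA deletions.

    `calls` is assumed to be ordered by genomic position; each run is treated
    as a predicted regional deletion covering the region between its ends.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # Append a sentinel INTACT call so that a run at the end is also closed.
    for i, call in enumerate(calls + [SiteCall.INTACT]):
        if call is SiteCall.MIRNA_DELETION:
            if start is None:
                start = i
        elif start is not None:
            if i - 1 > start:  # keep only runs spanning two or more sites
                runs.append((start, i - 1))
            start = None
    return runs


if __name__ == "__main__":
    # Hypothetical clone genotyped at five consecutive sites (illustration only).
    observations = [(True, False), (False, False), (False, False), (True, True), (False, False)]
    calls = [call_site(amp, indel) for amp, indel in observations]
    print(predicted_regional_deletions(calls))  # [(1, 2)]
```

      A single isolated miRNA deletion is not reported as a regional deletion here, mirroring the legend's condition that the flanked region is only inferred when consecutive sites are deleted.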

      *3. Next-generation targeted sequencing of clones should be made publicly accessible. *

      →Thank you very much for your comment. We apologize for the inconvenience.

      We have already informed Review Commons that the data have been made publicly available.

      The BioProject ID (PRJNA996747) is given in the Data Availability section (page16, line 383-384).

      4. OPTIONAL - Cassette integration number is understudied. One important aspect of tiling mutagenesis is the control over how many guides are present in each cell. The authors report an average of 4.7 cassettes/cell. This could be modulated by the amount of donor vector added, and indeed the authors performed titration experiments, but only with a fluorescent reporter readout. It would be very useful to know how the concentration of donor vector corresponds to the number of cassettes/cell - perhaps genotyping of clones from one or two additional experiments would be sufficient.

      → Thank you very much for your suggestion.

      We agree that cassette integration number is one important aspect of tiling mutagenesis.

      To determine how many copies are integrated at the donor-vector concentration we used, we will measure the actual cassette copy number in several clones by qPCR.

      However, we think that establishing “how the concentration of donor vector corresponds to the number of cassettes/cell” is beyond the scope of this study. The relationship may not be linear because of overproduction inhibition (OPI) of the transposase, so confirming it would require a large number of additional experiments. Our study addresses how CTRL-Mutagenesis induces diverse mutations, not the properties of the PB system itself.
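      One common way to turn the planned qPCR measurement into a copy number is relative quantification by the 2^-ΔΔCt method. The sketch below is illustrative only and assumes a calibrator clone with a known number of integrated cassettes, a reference amplicon present at the same copy number in all clones, and roughly 100% PCR efficiency for both amplicons (amplification factor 2). The function name and all Ct values are hypothetical and are not data from this study.

```python
def cassette_copy_number(ct_cassette: float, ct_reference: float,
                         ct_cassette_cal: float, ct_reference_cal: float,
                         calibrator_copies: float = 1.0,
                         amplification_factor: float = 2.0) -> float:
    """Estimate integrated sgRNA-cassette copies with the 2^-ddCt method.

    ct_cassette, ct_reference: Ct values measured in the clone being tested.
    ct_cassette_cal, ct_reference_cal: Ct values of a calibrator clone whose
        cassette copy number (calibrator_copies) is known.
    amplification_factor: 2.0 assumes ~100% efficiency for both amplicons.
    """
    delta_ct_sample = ct_cassette - ct_reference
    delta_ct_calibrator = ct_cassette_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Each cycle earlier corresponds to one doubling of template copies.
    return calibrator_copies * amplification_factor ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical values: relative to the reference, the cassette amplicon
    # crosses threshold 2 cycles earlier than in a single-copy calibrator.
    print(cassette_copy_number(20.0, 22.0, 22.0, 22.0))  # 4.0
```

      Normalising to a calibrator clone, rather than reading copies directly from the reference gene, avoids having to assume the reference gene's copy number in male mES cells; it does, however, depend on an independently validated single-copy calibrator.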

      Minor comments: 1. The background fails to acknowledge the work of CRISPR-Cas tiling screens (e.g. https://doi.org/10.1038/nbt.3450) or CRISPR-Cas in creating mutagenesis in cell lines (e.g. https://doi.org/10.1007/978-1-0716-0247-8_29*) *

      → Thank you very much for your suggestion, and we agree with your suggestion.

      We will add the following sentences to acknowledge previous CRISPR-Cas tiling screens:

      • “Recently, targeted mutagenesis approaches that combine forward and reverse genetics, such as saturation mutagenesis and tiling mutagenesis, have been developed to induce random mutations within target gene(s) [PMID: 25141179, 31586052, 27260157, 28118392]. Such targeted mutagenesis can generate a mutant library harbouring subtly different mutations within the target gene(s), so that comparative analysis across the library can identify mutation(s) critical for biological processes. These random mutagenesis approaches have also revealed the functions of numerous coding genes” following “…list of coding genes [6–8].” in the Introduction (page3, line 55-56)
      • “In addition, saturation mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations from a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. By contrast, tiling mutagenesis can in principle extend the target length, because the length depends on the multiplex guide RNAs (gRNAs) designed against the target genomic region. Accordingly, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor, such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library of multiplex gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system.” following “…within a narrow region.” in the Introduction (page4, line 82)

      However, we do not agree that we need to acknowledge previous reports of KI or KO by single or double cuts in cell lines (such as https://doi.org/10.1007/978-1-0716-0247-8_29, as you suggested), because this is well-established knowledge. Therefore, we will not add this paper.

      2. Figure 1 left 'ROI random mutant PB mES cell' should be horizontally aligned so Mir_1, Mir_2 and MirX align with the upper figure.

      → Thank you very much for your kind comment, and we agree with your suggestion.

      We have changed this in Figure 1 accordingly.

      *3. It is interesting and unexpected that some guides never induce indels, even in the absence of a regional deletion (e.g. Mirc56_3, Mirc56_7). Why might this be? Was there perhaps an error in the assignment of these guides to these cells? *

      → Thank you very much for your comment.

      As you pointed out, Mirc56_3, 4 and 7 had no indels. We are grateful that your comment allowed us to correct our mistakes. We have corrected Figure 5D (attached). In addition, we will correct the average number of Mirc56_X sites from 22.7 to 22.6.

      These sgRNAs also induced miRNA deletions at low frequency (Mirc56_3: 38.9%, Mirc56_4: 25.0% and Mirc56_7: 68.0%, Figure 5D). Moreover, every deletion of Mirc56_3, 4 or 7 lay within a predicted regional deletion, except for the targeted Mirc56_3 site in PB mES clone #2_080 (revised Figure 5C).

      We therefore propose the following explanation for why some guides never induced indels even in the absence of a regional deletion: “In addition to their low mutation frequencies, any Indel mutations these sgRNAs induced may have been lost through regional deletions.”

      To clarify these points, we will add the following sentences:

      • “In particular, middle target sites such as Mirc56_3, 4 and 7 showed only miRNA deletion or remained Intact (Figure 5D).” following “…in our mutant library (Figure 5C, D).” in the Discussion (page14, line 364)
      • “In fact, every deletion of Mirc56_3, 4 or 7 lay within a predicted regional deletion, except for the targeted Mirc56_3 site in PB mES clone #2_080 (Figure 5C). In addition, these sgRNAs induced mutations at low frequency (Mirc56_3: 38.9%, Mirc56_4: 25.0% and Mirc56_7: 68.0%, Figure S6). We therefore suspect that regional deletions, together with the low mutation rates of these sgRNAs, led to the loss of Indel mutations.” following “…induced at target sites.” in the Discussion (page14, line 366)

        *4. Regarding Mirc56_2 and 4 integration, on line 34 the authors suggest that "We suspect this was caused by a technical error, such as an unequal amount of sgRNA donor vector or the sequence in sgRNA cassettes affecting integration efficiency or cell growth." sgRNA library imbalance would be a technical error, but integration affecting cell growth is not a technical error. This sentence should be reworded. *

      → Thank you very much for your comment.

      We apologize for the misleading sentence, even though the manuscript had already been checked by an English-language editor.

      We will reword this sentence as “We suspect this was caused by the sequence in sgRNA cassettes affecting integration efficiency or cell growth, or by a technical error such as an unequal amount of sgRNA donor vector.” following “…PB mES clones via FACS.” in the Discussion (page14, line 344)

      *5. Line 540 "ration" is the incorrect word - perhaps "ratio"? *

      → Thank you very much for your kind comment, and we are sorry for the typo.

      We will correct it to “ratio” in the Figure legend (page22, line 540).

      6. Plot 5b should be shown as a histogram rather than a swarm plot to show how many clones were in each category.

      → Thank you very much for your suggestion.

      Figure 5B is intended to show the number of distinct sgRNA cassettes in each clone, not the distribution of the number of integrated sgRNA cassettes. The distribution of integrated sgRNA cassettes across the clone library corresponds to the frequency of target sites shown in Figure 5D.

      These distribution data are already described in the Results (page13, line 312-315): “In addition, an average of 22.7 Mirc56_X sites … the same frequency except for the Mirc56_2- and 4-targeting cassettes.”

      *Reviewer #1 (Significance (Required)):

      1. General assessment: The authors are successful in creating clonal cell lines bearing a variety of mutations. Unfortunately, the cell lines also have transposase-mediated insertion events of the sgRNA cassettes at unknown positions in the genome, which will hamper the interpretability of any experiment using these cell lines. The authors fail to justify the use of the transposase and integration of the sgRNA, especially compared to lentiviral transfection or RNPs which would produce edits at the region of interest. Alternately, integrated sgRNA cassettes could have been excised with Flp recombinase as in https://doi.org/10.1007/978-1-0716-0247-8_29. *

      → Thank you very much for your suggestion.

      We agree that we did not explain why we chose the PiggyBac system over lentiviral delivery.

      Therefore, we will add the following sentences:

      • “In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system. Identifying combinations of critical regions embedded in target regions would require diverse combinations of mutations or inactivated sites, which in turn requires the integration of multiple sgRNA cassettes. However, multiple integration of sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis [PMID: 23435812].” in the Introduction.
      • “Here, we propose that a DNA transposon system is more suitable than a retrotransposon system for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons, which move by a "cut and paste" mechanism in which the transposon sequence is excised directly from the genome, and (2) retrotransposons, which move by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA, reverse-transcribed and then integrated [PMID: 21958341]. Consequently, retrotransposons are never removed from the genome. DNA transposons such as PiggyBac, Sleeping Beauty and Tol2 are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at the integration site after the transposon is excised, whereas other DNA transposon systems leave small insertions [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can excise transposons but not integrate them, has been developed [PMID: 27929521]. Thus, only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library without leaving a footprint. We therefore aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.

      However, we will not discuss a comparison with RNPs, because random sgRNA expression is clearly key to random mutagenesis, and designing random sgRNA treatments with RNPs is difficult: the target region would likely be cleaved by almost all of the sgRNAs taken up by the cells. By contrast, with a delivery system based on chromosomal integration it is easier to control the variety of sgRNAs expressed, because only integrated sgRNAs are expressed.

      In addition, we do not agree that “integrated sgRNA cassettes could have been excised with Flp recombinase as in https://doi.org/10.1007/978-1-0716-0247-8_29.”

      That paper describes a single EM7>neoR selection cassette flanked by FRT sites within a KI allele, which allows selection of the intended KI clone and subsequent removal of the cassette by Flp recombinase. However, this approach is not suitable for our method, because Flp-mediated recombination between multiple FRT-flanked cassettes integrated into nearby genomic regions could cause structural mutations. Therefore, we will not mention it.

      *2. Additionally, the genotyping analysis is unclear, and seems to indicate that each clone bears homozygous mutations, with several clones showing deletions of the entire region. *

      → Thank you very much for your suggestion.

      We address these points in our responses to Reviewer #1 Major comments 2a and 2b.

      3. Advance: The authors are motivated to create clones using tiling mutagenesis. Tiling mutagenesis has already been performed without transposases (e.g. https://doi.org/10.1038/nbt.3450, https://doi.org/10.1371/journal.pone.0170445, https://doi.org/10.1038/s41467-019-12489-8*) in the context of a screen, and clones have already been created using CRISPR/Cas9 mutagenesis so the advance presented in this manuscript over previous published work is unclear. *

      →Thank you very much for your suggestion, and we agree with your suggestion.

      We will acknowledge previous studies of CRISPR-Cas tiling screens.

      We will add the following sentences:

      • “Recently, targeted mutagenesis approaches that combine forward and reverse genetics, such as saturation mutagenesis and tiling mutagenesis, have been developed to induce random mutations within target gene(s) [PMID: 25141179, 31586052, 27260157, 28118392]. Such targeted mutagenesis can generate a mutant library harbouring subtly different mutations within the target gene(s), so that comparative analysis across the library can identify mutation(s) critical for biological processes. These random mutagenesis approaches have also revealed the functions of numerous coding genes” following “…list of coding genes [6–8].” in the Introduction (page3, line 55-56)
      • “In addition, saturation mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations from a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. By contrast, tiling mutagenesis can in principle extend the target length, because the length depends on the multiplex guide RNAs (gRNAs) designed against the target genomic region. Accordingly, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor, such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library of multiplex gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system.” following “…within a narrow region.” in the Introduction (page4, line 82)

      The paper you cited (https://doi.org/10.1038/nbt.3450) applied CRISPRko tiling mutagenesis, with lentiviral delivery of sgRNA cassettes, to identify critical regions within a 2 kb p53-binding enhancer region. Our method instead employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Our method is therefore novel in that it generates a mutant library without the risk of disrupting non-targeted endogenous genes. In addition, we extend the target region to more than 50 kb, which is a further advance.

      The paper you cited (https://doi.org/10.1371/journal.pone.0170445) applied CRISPRko tiling mutagenesis, with lentiviral delivery of sgRNA cassettes, to identify critical mutations in the MAP2K1 and BRAF protein-coding sequences. Our method instead employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Our method is therefore novel in that it generates a mutant library without the risk of disrupting non-targeted endogenous genes.

      The paper you cited (https://doi.org/10.1038/s41467-019-12489-8) applied CRISPRko tiling mutagenesis, with lentiviral delivery of sgRNA cassettes, to identify critical domains in protein-coding sequences. Our method instead employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Our method is therefore novel in that it generates a mutant library without the risk of disrupting non-targeted endogenous genes.

      To clarify the advantages, we will add the following sentences:

      • “Here, we propose that a DNA transposon system is more suitable than a retrotransposon system for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons, which move by a "cut and paste" mechanism in which the transposon sequence is excised directly from the genome, and (2) retrotransposons, which move by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA, reverse-transcribed and then integrated [PMID: 21958341]. Consequently, retrotransposons are never removed from the genome. DNA transposons such as PiggyBac, Sleeping Beauty and Tol2 are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at the integration site after the transposon is excised, whereas other DNA transposon systems leave small insertions [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can excise transposons but not integrate them, has been developed [PMID: 27929521]. Thus, only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library without leaving a footprint. We therefore aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      • “To date, CRISPRko tiling mutagenesis has been applied to target genomic regions of less than 15 kb [PMID: 26375006, 30612741], whereas CRISPRi tiling mutagenesis has been reported to target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be extended.” in the Introduction.

      In addition, as a proof of concept that CTRL-Mutagenesis can generate a mutant library free of sgRNA cassettes, we will remove the transposons from several PB mES clones by treatment with excision-only PBase.

      *4. Audience: The manuscript is written for the basic research audience, and the method could be applied to the study of regions of interest in many diseases. However, the unexcised use of transposases make the method less desirable than other methods. *

      → Thank you very much for your suggestion.

      We do not agree that the use of PiggyBac makes the method less desirable than other methods.

      As mentioned in our response to Reviewer #1 Significance 3, only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library without leaving a footprint. Multiple integration of sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis, and sgRNA cassettes delivered by lentivirus are never removed from the genome. In addition, alternative approaches such as the Flp recombinase system that Reviewer #1 proposed in Significance 1 are not better than PiggyBac, because Flp-mediated recombination between multiple FRT-flanked cassettes integrated into nearby genomic regions can cause structural mutations.

      To clarify these points, we will add the following sentences:

      • “However, multiple integration of single guide RNA (sgRNA) cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis.” in the Abstract.
      • “However, multiple integration of sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis [PMID: 23435812]. To eliminate the risk that integrated sgRNA cassettes disrupt non-targeted endogenous genes, the best approach is to remove the sgRNA cassettes from the chromosome.” in the Introduction.
      • “Here, we propose that a DNA transposon system is more suitable than a retrotransposon system for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons, which move by a "cut and paste" mechanism in which the transposon sequence is excised directly from the genome, and (2) retrotransposons, which move by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA, reverse-transcribed and then integrated [PMID: 21958341]. Consequently, retrotransposons are never removed from the genome. DNA transposons such as PiggyBac, Sleeping Beauty and Tol2 are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at the integration site after the transposon is excised, whereas other DNA transposon systems leave small insertions [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can excise transposons but not integrate them, has been developed [PMID: 27929521]. Thus, only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library without leaving a footprint. We therefore aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.

      In addition, as a proof of concept that CTRL-Mutagenesis can generate a mutant library free of sgRNA cassettes, we will remove the transposons from several PB mES clones by treatment with excision-only PBase.

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Major concerns:

      1) Concern about the Novelty of Functional Analysis Platforms: The authors claim that there are no established platforms for the study of cis-elements or microRNA clusters. This assertion seems inaccurate, as previous studies have utilized Cas9 tiling screens to investigate cis-regulatory elements (CREs) and large-scale screens to probe microRNA functions, as exemplified by the works of Canver et al. in Nature 2015, Gasperini et al. in Cell 2019, and others. *

      → Thank you very much for your suggestion, and we agree with your suggestion.

      We apologize for this inaccurate claim and will delete the following sentences:

      • “In contrast, no functional analysis platforms have been established for the study of cis-elements or microRNA cluster regions consisting of multiple microRNAs with functional overlap” in the Abstract (page2, line 28-30)
      • “While loss-of-function analysis has been conducted for numerous coding genes, very limited progress has been made on non-coding genes and cis-elements.” in the Introduction (page3, line 47-49)

      The paper you cited (https://doi.org/10.1038/nature15521; Canver et al. in Nature 2015) applied CRISPRko tiling mutagenesis, with lentiviral delivery of sgRNA cassettes, to identify critical regions within a 12 kb BCL11A enhancer region. Our method instead employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Our method is therefore novel in that it generates a mutant library without the risk of disrupting non-targeted endogenous genes. In addition, we extend the target region to more than 50 kb, which is a further advance.

      The paper you cited (https://doi.org/10.1016/j.cell.2018.11.029; Gasperini et al. in Cell 2019) used lentiviral delivery of sgRNA cassettes and applied CRISPRko tiling mutagenesis to identify critical regions within enhancer candidates of up to 12 kb, in addition to CRISPRi screening of candidate enhancers with one sgRNA per candidate. Our method instead employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Our method is therefore novel in that it generates a mutant library without the risk of disrupting non-targeted endogenous genes. In addition, we extend the target region to more than 50 kb, which is a further advance. Moreover, identifying combinations of critical regions embedded in target regions would require diverse combinations of mutations or inactivated sites, which in turn requires the integration of multiple sgRNA cassettes. However, multiple integration of sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis. To eliminate this risk, the best approach is to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical regions without artifacts, by removing the sgRNA cassettes without a footprint, CRISPRko tiling mutagenesis is preferable to CRISPRi, because CRISPRi requires integrated cassettes that stably express the sgRNA and the epigenetic modifier fused to dCas9. Our method is therefore novel in that it generates a mutant library, without the risk of disrupting non-targeted endogenous genes, for identifying combinations of critical regions embedded in target regions.

      Therefore, to acknowledge previous studies and clarify the advantages of our method, we will add the following sentences:

      • “In addition, saturation mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations from a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. By contrast, tiling mutagenesis can in principle extend the target length, because the length depends on the multiplex guide RNAs (gRNAs) designed against the target genomic region. Accordingly, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor, such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library of multiplex gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system.” following “…within a narrow region.” in the Introduction (page4, line 82)
      • “Identifying combinations of critical regions embedded in target regions would require diverse combinations of mutations or inactivated sites, which in turn requires the integration of multiple sgRNA cassettes. However, multiple integration of sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis [PMID: 23435812]. To eliminate the risk that integrated sgRNA cassettes disrupt non-targeted endogenous genes, the best approach is to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical regions embedded in target regions without artifacts, by removing the sgRNA cassettes without a footprint, CRISPRko tiling mutagenesis is preferable to CRISPRi, because CRISPRi requires integrated cassettes that stably express the sgRNA and the epigenetic modifier fused to dCas9.” in the Introduction.
      • “Here, we propose that a DNA transposon system is more suitable than a retrotransposon system for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons, which move by a "cut and paste" mechanism in which the transposon sequence is excised directly from the genome, and (2) retrotransposons, which move by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA, reverse-transcribed and then integrated [PMID: 21958341]. Consequently, retrotransposons are never removed from the genome. DNA transposons such as PiggyBac, Sleeping Beauty and Tol2 are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at the integration site after the transposon is excised, whereas other DNA transposon systems leave small insertions [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can excise transposons but not integrate them, has been developed [PMID: 27929521]. Thus, only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library without leaving a footprint. We therefore aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      • “To date, CRISPRko tiling mutagenesis has been applied to target genomic regions of less than 15 kb [PMID: 26375006, 30612741], whereas CRISPRi tiling mutagenesis has been reported to target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be extended.” in the Introduction.

      In addition, we could not find any previous study employing Cas9 tiling mutagenesis to investigate miRNA functions; the application to a miRNA cluster is therefore a further advance.

      2) Advantages of PiggyBack System Over Lentiviral Integration: The paper does not clearly articulate the advantages of their proposed PiggyBack-based system for sgRNA integration over traditional lentiviral integration. Both methods facilitate the random integration of multiple gRNAs, but the paper lacks a comparative analysis or justification for choosing the PiggyBack system.

      → Thank you very much for your suggestion, and we agree with your suggestion.

      We agree that we did not explain why we chose the PiggyBac system over lentiviral delivery.

      Therefore, we will add the following sentences:

      • “In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by a retrotransposon system. Identifying combinations of critical regions embedded in target regions would require diverse combinations of mutations or inactivated sites, which in turn requires the integration of multiple sgRNA cassettes. However, multiple integration of sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis [PMID: 23435812].” in the Introduction.
      • “Here, we proposed that DNA transposon system rather than retrotransposon system is more suitable to remove sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites and there are two types of transposons: (1) DNA transposon is transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposon is transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated by reverse transcribed [PMID: 21958341]. Therefore, retrotransposon is never removed from the genome. DNA transposon such as PiggyBac, Sleeping Beauty and Tol2 systems are also used as gene transfer tools in vertebrates [PMID: 26481584]. Especially, PiggyBac leaves no footprint on integrated sites after transposons relocated while other DNA transposon system leaves small insertion on integrated sites [PMID: 34064900]. In addition, excision-only-PiggyBac transposase that can remove transposons but not integrate them, is developed [PMID: 27929521]. Only PiggyBac system can remove transposons carrying sgRNA cassettes from mutant library with no footprint. Therefore, we aimed to validate PiggyBac system for CRISPRko tilling mutagenesis.” in the Introduction.

        *3) Lack of Comparative Analysis with Alternative Methods: The authors did not provide a comparison of CTRL-Mutagenesis with other existing screening methods. Such a comparison is crucial for understanding the effectiveness and efficiency of the new method in relation to established techniques. *

      → Thank you very much for your suggestion.

      We agree that such a comparison is an important experiment.

      However, our main claim is the validation of tiling mutagenesis using PiggyBac, the only integration system that can be removed without leaving a footprint. Therefore, we present the novelty of our method without this comparison and do not claim higher or lower efficiency of CTRL-Mutagenesis relative to existing methods.

      *4) Limitations in Library Resolution: The paper acknowledges the limited resolution of their proposed library. The authors might have explored the use of base editors for enhanced resolution in such screens, as base editing could potentially offer more precise and controlled mutagenesis as briefly mentioned in the discussion. *

      → Thank you very much for your suggestion.

      We agree with your suggestion.

      Base editing occurs only within a narrow editing window. In addition, a major limitation of prime editing is its low efficiency (https://doi.org/10.1016/j.tibtech.2023.03.004). Therefore, designing sgRNAs for base editors, or pegRNAs, and evaluating their editing efficiency would require a large number of experiments.

      Our study is a proof of concept validating the PiggyBac system for CRISPRko tiling mutagenesis and expanding the length of target regions. Thus, we only discussed the limited resolution of our mutant library and proposed the use of base editors for enhanced resolution in the Discussion (page 14, lines 366-370).

      5) Absence of Functional Data Post-Mutagenesis: A significant limitation of the study is the absence of functional data following the creation of cells with different mutations. While the authors speculate about using differentiation systems or organoids for practical applications, they do not provide empirical data to demonstrate the utility of the CTRL-Mutagenesis approach. This lack of functional validation raises questions about the practical applicability of the method.

      → Thank you very much for your suggestion.

      We agree with your suggestion.

      We will address functional analysis in future research.

      In this paper, we only validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of target regions.

      To tone down our claim regarding the usability of our method for functional analysis, we will make the following changes:

      • Change “to identify functionally important elements in non-coding regions” to “to induce diverse combinations and varieties of mutations within a non-coding region of more than 50 kb” in the Title.
      • Add “However, few loss-of-function screens of non-coding regulatory elements have been conducted because their annotations are ambiguous compared with those of protein-coding genes. Tiling mutagenesis has been employed to identify critical regions embedded in non-coding regulatory elements by comparative analysis of a mutant library harbouring subtly different mutated regions within a region of less than 15 kb. Conventional tiling mutagenesis constructs a mutant library by integrating multiple sgRNA cassettes via retroviral delivery. However, multiple integration of single guide RNA (sgRNA) cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis. Herein, by combining tiling mutagenesis with the PiggyBac transposon, which can be removed without leaving a footprint at integration sites, we established an expanded tiling mutagenesis method named CRISPR- & Transposase-based RegionaL Mutagenesis (CTRL-Mutagenesis). We demonstrated that the PiggyBac system could integrate diverse combinations and varieties of sgRNA cassettes, and that CTRL-Mutagenesis randomly induces diverse combinations and varieties of mutations within a non-coding region of more than 50 kb in murine embryonic stem cells. CTRL-Mutagenesis could be applied to wider non-coding regulatory elements without the risk of disrupting non-targeted endogenous genes.” in the Abstract.
      • Delete “Comparative analysis of mutants harbouring subtly different mutations within the same region would facilitate the further study of cis-element and microRNA clusters.” in the Abstract (page 2, lines 38-40).
      • Change “The generated random mutant mES clone library could facilitate further functional analyses of non-coding regulatory elements within the genome.” to “The generated random mutant mES clone library could be used to investigate critical regions of non-coding regulatory elements within the genome.” in the Introduction (page 4, lines 88-90).

        *Reviewer #2 (Significance (Required)):

        1. In summary, while the idea to integrate sgRNA in the genome by the PiggyBac system is interesting the claim of novelty is questionable due to existing methods in the field. The advantages of their system over existing technologies are not clearly articulated, and a lack of comparative analysis with other methods leaves the efficiency of CTRL-Mutagenesis uncertain. *

      → Thank you very much for your suggestion.

      Previous studies of CRISPRko and CRISPRi tiling mutagenesis employed lentiviral delivery of sgRNA cassettes into the genome. However, integration of multiple sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes. The best way to eliminate this risk is to remove the sgRNA cassettes from the chromosome. However, lentiviral vectors, which integrate by a retroviral mechanism, cannot be removed from the chromosome. By contrast, the PiggyBac transposon is the only system that can be removed without leaving a footprint. Therefore, we aimed to validate the PiggyBac system for tiling mutagenesis. Moreover, there has been no report of CRISPRko tiling mutagenesis applied to a genomic region of more than 15 kb. Therefore, we also aimed to expand the length of the target region.

      Accordingly, we will revise our claim to state that our method expands CRISPRko tiling mutagenesis to regions of more than 50 kb without the risk of disrupting non-targeted endogenous genes.

      We will add the following text describing the novelty and advantages of our method:

      • “Here, we propose that a DNA transposon system is more suitable than a retrotransposon system for removing sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites, and there are two types: (1) DNA transposons, which move by a "cut and paste" mechanism in which the transposon sequence is excised directly from the genome, and (2) retrotransposons, which move by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA, reverse-transcribed into DNA and then integrated [PMID: 21958341]. Therefore, a retrotransposon is never removed from the genome. DNA transposons such as PiggyBac, Sleeping Beauty and Tol2 are also used as gene transfer tools in vertebrates [PMID: 26481584]. In particular, PiggyBac leaves no footprint at the integration site after the transposon is excised, whereas other DNA transposon systems leave small insertions [PMID: 34064900]. In addition, an excision-only PiggyBac transposase, which can remove transposons but cannot integrate them, has been developed [PMID: 27929521]. Thus, only the PiggyBac system can remove transposons carrying sgRNA cassettes from a mutant library without leaving a footprint. Therefore, we aimed to validate the PiggyBac system for CRISPRko tiling mutagenesis.” in the Introduction.
      • “CRISPRko tiling mutagenesis has so far been conducted on target genomic regions of less than 15 kb [PMID: 26375006, 30612741], whereas CRISPRi tiling mutagenesis has been reported to target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be expanded.” in the Introduction.

      Our main claim is the validation of tiling mutagenesis using PiggyBac, the only integration system that can be removed without leaving a footprint. Therefore, we will present the novelty of our method without a comparison and will not claim higher or lower efficiency of CTRL-Mutagenesis relative to existing methods.

      In addition, we are going to perform transposon removal by excision-only PBase treatment in several PB mES clones, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      2. Moreover, the limited resolution of their library and the absence of functional data post-mutagenesis are significant drawbacks that need to be addressed in future research to ascertain the method's practical utility.

      → Thank you very much for your suggestion.

      We agree with your suggestion.

      We will address functional analysis in future research.

      Base editing occurs only within a narrow editing window. In addition, a major limitation of prime editing is its low efficiency (https://doi.org/10.1016/j.tibtech.2023.03.004). Therefore, designing sgRNAs for base editors, or pegRNAs, and evaluating their editing efficiency would require a large number of experiments.

      Our study is a proof of concept validating the PiggyBac system for CRISPRko tiling mutagenesis and expanding the length of target regions. Thus, we only discussed the limited resolution of our mutant library and proposed the use of base editors for enhanced resolution in the Discussion (page 14, lines 366-370).

      Therefore, we only claim that we validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of target regions.

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Major comments: 1. Authors claim that "CTRL-mutagenesis randomly induces diverse mutations only within the targeted regions in murine embryonic stem (mES) cells.", however, the outcome of mutations is not entirely random since most of the mutations are regional deletions. For example, despite the random distribution of gRNAs per cell, the inner regions like Mirc56_5 or Mirc56_8 are mutated with >80% efficiency.*

      → Thank you very much for your comment.

      We agree that middle regions tend to be deleted and that the types of mutation induced are not entirely random. However, we do not agree that “the outcome of mutations is not entirely random since most of the mutations are regional deletions.” Focusing on the combinations of mutations, as described in the Results (page 12, lines 302-304), CTRL-Mutagenesis induced diverse mutation combinations with a moderate degree of randomness. In fact, 79.2% of clones harbouring multiple mutations carried different combinations of mutations. In addition, to confirm how mutations occurred within Mirc56 by CTRL-Mutagenesis, we established only 87 mutant clones through single-cell cloning. Therefore, the mutation profile is not yet fully characterised, given the smaller number of clones compared with conventional CRISPRko tiling mutant libraries. We acknowledge that the randomness of mutation combinations should be improved; we have already discussed this and proposed solutions in the Discussion (page 14, lines 366-370).

      Admittedly, it would be difficult to use CTRL-Mutagenesis to identify necessary and sufficient genomic regions because of its incomplete randomness. Nevertheless, there has been no report of inducing diverse combinations and varieties of mutations within a genomic region of more than 50 kb. Hence, CTRL-Mutagenesis should be useful for screening critical regions within regions of more than 50 kb.

      To clarify these points, we will add the following sentences:

      • “CRISPRko tiling mutagenesis has so far been conducted on target genomic regions of less than 15 kb [PMID: 26375006, 30612741], whereas CRISPRi tiling mutagenesis has been reported to target more than 70 kb [PMID: 27708057]. Hence, it remains unclear how far the target length of CRISPRko tiling mutagenesis can be expanded.” in the Introduction.
      • “Four clones (#2_066, #1_021, #1_029 and #1_046) harboured deletions of all analysed Mirc56_X genomic regions. Apart from these clones, only 3 pairs (#2_019 and #2_084, #2_038 and #1_023, #1_016 and #1_027) carried the same combination of mutations. In addition, 26 clones had only one mutation from Mirc56_1 to Mirc56_13. On the other hand, 11 clones had no mutation in Mirc56_1 to 13, including 5 clones (#2_012, #2_015, #2_054, #2_092 and #2_102) that carried no sgRNA cassette for Mirc56_1 to 13 and 6 clones (#2_017, #2_053, #2_098, #1_003, #1_012 and #1_044) that were unmutated despite carrying at least one sgRNA cassette for Mirc56_1 to 13. Among the 48 clones carrying multiple mutations (excluding clones carrying only one mutation and intact clones), 38 clones (79.2%) harboured different combinations of mutations. These results suggest that CTRL-Mutagenesis can induce diverse combinations of mutations.” following “…different combinations of mutations (Figure 5C).” in the Results (page 12, line 304).
      • “Note that it would be difficult to use CTRL-Mutagenesis to identify necessary and sufficient genomic regions because of its incomplete randomness. Nevertheless, CTRL-Mutagenesis should be useful for screening critical regions within regions of more than 50 kb.” following “…to induce regional deletions.” in the Discussion (page 15, line 378).
      • Change “diverse mutations” to “diverse combinations and varieties of mutations” in the Title, Abstract (page 2, line 37), Introduction (page 4, line 87), Results (page 13, line 318) and Discussion (page 13, line 325; page 14, line 363).

      Additionally, we do not agree with the statement that “the inner regions like Mirc56_5 or Mirc56_8 are mutated with >80% efficiency”. We apologise for the misleading presentation: these high mutation rates were calculated only on the target sites. In fact, across all Mirc56_X genomic regions, the maximum mutation rate is 44.8% (Mirc56_10), the minimum is 14.9% (Mirc56_2) and the average is 30.9% (attached Figure).

      We thank the reviewer for pointing out this misleading presentation. An analysis covering all Mirc56_X genomic regions is more informative than one restricted to the targeted Mirc56_X sites. Therefore, we made a new figure showing event occurrence in all Mirc56_X genomic regions (attached Figure) as Figure 5D, and moved the previous Figure 5D (event occurrence in targeted Mirc56_X) to Supplemental Figure S6.

      To clarify these points, we will add the following sentences:

      • “As for event occurrence in each Mirc56_X genomic region, miRNA deletions were dominant, and on average 26.7 of the 87 clones carried mutations in a given Mirc56_X genomic region (Figure 5D). The maximum mutation rate across Mirc56_X genomic regions was 44.8% (39/87) in Mirc56_10, and the minimum was 14.9% (13/87) in Mirc56_2.” following “…on the same strand” in the Results (page 12, line 309).
      • “__D,__ Mutations in 87 Mirc56 random mutant clones. The target sites do not include Mirc56_14, 15, 16, and The vertical axis and bar graphs show event occurrence in each Mirc56 genomic region across the 87 Mirc56 random mutant clones. The bar colour indicates each event (Black: regional deletion; Gray: indel mutation; White: intact).” in the Figure legend.
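      As a quick arithmetic check, the sketch below recomputes the rates quoted above from the clone counts given in this response, using all 87 genotyped clones as the denominator for the per-region mutation rates and the 48 multiply-mutated clones for the combination rate. The helper name `percent` and the dictionary layout are illustrative only, not part of the authors' analysis pipeline.

```python
def percent(part: int, whole: int, ndigits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal as in the text."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, ndigits)


# Denominator for per-region rates: every genotyped Mirc56 random mutant clone.
TOTAL_CLONES = 87

# Clones carrying a mutation in the given Mirc56_X genomic region (from the text).
mutated_clones = {"Mirc56_10": 39, "Mirc56_2": 13}

rates = {region: percent(n, TOTAL_CLONES) for region, n in mutated_clones.items()}

# Clones with a distinct mutation combination among clones with multiple mutations.
distinct_combination_rate = percent(38, 48)

print(rates, distinct_combination_rate)
```

      Keeping the denominator explicit makes it clear that these per-region rates are not the same quantity as rates computed only on the target sites.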

        2. Also, although the authors discuss that the lower mutation frequency observed for Mirc56_2 and 4 may be due to a technical error, confirming this by repeating the experiment would be important to prove the usability of this method.

      → Thank you very much for your comment; we agree with your suggestion.

      We had already constructed the bulk PB mES cells twice, and Figure 4B showed these experimental replicates combined.

      To clarify that we constructed the bulk PB mES cells twice, we have revised Figure 4B (attached) and will add the following sentences:

      • “Even though these bulk PB mES cells were constructed twice, the sgRNA cassettes for Mirc56_2 and 4 appeared difficult to integrate into the genome.” following “…were rarely detected” in the Results (page 11, line 273).
      • “In addition, suspecting technical errors, we constructed the bulk PB mES cells twice. Unfortunately, their low integration frequencies did not improve.” following “…efficiency or cell growth.” in the Discussion (page 14, line 346).
      • “Bulk1 and Bulk2 indicate experimental replicates.” following “…next-generation sequencing (NGS).” in the Figure legend (page 22, line 563).

      In addition, we re-sequenced the sgRNA donor vectors for Mirc56_2 and 4 and will add the following sentences:

      “We first suspected that their low integration frequencies were caused by mutations in the PB transposon of the sgRNA donor vector, especially in the ITR or ID, which are important for integration efficiency [PMID: 15663772]. Therefore, we re-sequenced the PB transposons for Mirc56_2 and 4. However, we could not find any mutations in these PB transposons.” following “…efficiency or cell growth.” in the Discussion (page 14, line 346).

      Moreover, to check for technical errors, we are going to assess sgRNA pool imbalance in the donor vector library by amplicon short-read NGS.

      *3. Additionally, the experiments were performed on the haploid X chromosome of a male cell line. It is questionable whether this method can be generalized to other regions located in the other chromosomes. Clarifying these points would be essential especially because the focus of this manuscript is to describe the efficiency of this novel methodology. *

      → Thank you very much for your comment.

      We expect that CTRL-Mutagenesis could also be valid for other biallelic loci.

      Therefore, we raised an anticipated issue, namely complex genotyping, and proposed one solution.

      When targeting other biallelic loci, we must determine whether the induced combinations of mutations are in cis or in trans. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele using the SNP markers on each read [PMID: 35710642]. Therefore, applying CTRL-Mutagenesis to heterozygous alleles in cells such as human cells or cells from hybrid mice, combined with haplotype phasing, might simplify genotyping.

      We will add the following sentences:

      “In this study, CTRL-Mutagenesis was validated by genotyping a single allele in male mES cells, to avoid having to determine whether the induced combinations of mutations are in cis or in trans. All genotypes at Mirc56 should be hemizygous, so the induced mutations should be in cis; we therefore determined the genotypes by amplifying approximately 200 bp around each target site. However, we did not examine large mutations such as deletions of the genomic region between target sites or inversions. Long-read sequencing might capture such large mutations. We also expect that CTRL-Mutagenesis could be valid for ROIs on autosomes and on the X chromosome in female cells, where two alleles are present. In that case, it is necessary to determine whether the induced combinations of mutations are in cis or in trans. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele using the SNP markers on each read [PMID: 35710642]. Therefore, applying CTRL-Mutagenesis to heterozygous alleles in cells such as human cells or cells from hybrid mice, combined with haplotype phasing, might simplify genotyping.” in the Discussion.
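      To make the phasing idea above concrete, here is a toy Python sketch; it is not the pipeline of the cited study (PMID: 35710642). It assumes long reads are already aligned, that the maternal and paternal bases at each heterozygous SNP are known, and that mutations have already been called on each read. A read is assigned to a parental haplotype only when at least two informative SNPs agree unanimously, and two mutations are called cis if they are observed on the same haplotype. All function names, positions and mutation labels are hypothetical.

```python
from collections import Counter


def assign_haplotype(read_alleles, snp_table, min_support=2):
    """Assign one read to 'maternal' or 'paternal' from its bases at known SNPs.

    read_alleles: {position: base observed on the read}
    snp_table:    {position: (maternal_base, paternal_base)}
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos in snp_table:
            maternal, paternal = snp_table[pos]
            if base == maternal:
                votes["maternal"] += 1
            elif base == paternal:
                votes["paternal"] += 1
    # Require unanimous, sufficiently supported evidence; otherwise leave unphased.
    if votes["maternal"] >= min_support and votes["paternal"] == 0:
        return "maternal"
    if votes["paternal"] >= min_support and votes["maternal"] == 0:
        return "paternal"
    return "unassigned"


def phase_mutations(reads, snp_table, min_support=2):
    """Collect the mutations seen on each haplotype. reads: [(read_alleles, mutations)]."""
    per_hap = {"maternal": set(), "paternal": set()}
    for read_alleles, mutations in reads:
        hap = assign_haplotype(read_alleles, snp_table, min_support)
        if hap in per_hap:
            per_hap[hap] |= set(mutations)
    return per_hap


def cis_or_trans(per_hap, mut_a, mut_b):
    """'cis' if both mutations share a haplotype, 'trans' if only on different ones."""
    haps_a = {hap for hap, muts in per_hap.items() if mut_a in muts}
    haps_b = {hap for hap, muts in per_hap.items() if mut_b in muts}
    if haps_a & haps_b:
        return "cis"
    if haps_a and haps_b:
        return "trans"
    return "undetermined"
```

      Requiring unanimous SNP support is a deliberately conservative choice for this sketch: reads with conflicting SNP evidence are left unphased rather than risking a false cis or trans call.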

      *4. The limitations of the methods seem not to be fully described in the manuscript and must be clarified. Compared to the previous studies (see "significance" section for details), this method is inferior in that (1) it is time-consuming because it requires clonal expansion of single cells and (2) it has low throughput because it requires genome sequencing due to the occurrence of deletions. These points should be described for the potential users of this methodology. For example, it may be useful to detail the time consumption in each experimental step in Fig. 4A. *

      → Thank you very much for your comment.

      We do not agree with point (1), that the method is time-consuming because it requires clonal expansion of single cells.

      In this study, whose aim was to confirm the mutations induced by CTRL-Mutagenesis, we did not conduct phenotypic screening such as dropout screening. For higher-throughput screening, bulk mutant mES cells generated by CTRL-Mutagenesis (Cas9-treated, EGFP-positive cells) could be used directly for phenotypic screening without clonal expansion.

      We also do not agree with point (2), that the method has low throughput because it requires genome sequencing owing to the occurrence of deletions. In this study, to prove our concept that CTRL-Mutagenesis can induce diverse combinations and varieties of mutations such as indels and regional deletions, we genotyped all random mutant clones. However, there are alternative comparative approaches that improve throughput without genotyping every clone. Combining phenotypic screening with expression assays for the target miRNAs, or for transcripts regulated by the target cis-element, would help us obtain clones harbouring mutations in functionally critical regions within the target region. Genotyping would then be performed at the final step to identify the critical regions embedded in non-coding regulatory elements.

      Nevertheless, we will add the time required for each step to Figure 4A (attached), because this information may be useful for potential users, as you mentioned.

      *Minor comments: 1. Data and methods are well-presented for reproducibility. The EGFP-positive ratio may be added to Fig. 4C for clarity. *

      → Thank you very much for your kind comment.

      We have added the EGFP-positive ratio to Figure 4C and will add the following sentence:

      “The percentage above each box indicates the EGFP-positive ratio.” following “…the gates of the EGFP filter.” in the Figure legends (page 23, line 567).

      2. Enhance referencing accuracy, rectify DOI format in ref 21, and ensure consistency in citation formatting, e.g., ref 32.

      → Thank you very much for your kind comment.

      Along with the transfer, we will modify the reference style, and we have already confirmed the accuracy of the references.

      3. It seems that the experimental condition (e.g. The amount of vectors used for transfection) should be re-considered every time the researcher wants to set up an experiment changing target genomic regions, cell types etc. If so, this also should be described in the text for potential users of this method.

      → Thank you very much for your comment; we agree with your suggestion.

      We will add the following sentences:

      “This study validated CTRL-Mutagenesis only for 17 target sites in mES cells. Therefore, it may be necessary to adjust the number of integrated sgRNA cassettes according to the number of target sites and the cell type.” following “…sgRNA cassettes to be integrated.” in the Discussion (page 14, line 355).

      *Reviewer #3 (Significance (Required)):

      There were various methods described in the late 2010's which aimed to screen for the functional non-coding regions using approaches such as KO-based, HDR-based, and epigenetic silencing using dCas9 (for example, PMID: 25141179, 26751173, 27708057, 28416141, 31784727). The authors should summarize what would be the strength of their method compared to these previously described methodologies. The strength of this methodology seems to be moderate complexity and cost-effectiveness compared to these previous techniques. It may be difficult for this methodology to become a state-of-the-art method to evaluate cis-element combinations, but it can be beneficial to researchers wanting to set up a low-cost system that can produce moderately complex cell libraries.*

      →Thank you very much for your suggestion.

      We will add text acknowledging previous studies on CRISPR-Cas tiling screens:

      • “Recently, targeted mutagenesis approaches combining forward and reverse genetics, such as saturation mutagenesis and tiling mutagenesis, have been developed to induce random mutations within target gene(s) [PMID: 25141179, 31586052, 27260157, 28118392]. Such targeted mutagenesis can construct a mutant library harbouring subtly different mutations within target gene(s), so that comparative analysis of the mutant library can identify mutations critical for biological processes. These random mutagenesis approaches have also revealed the functions of numerous coding genes” following “…list of coding genes [6–8].” in the Introduction (page 3, lines 55-56).
      • “In addition, saturation mutagenesis is limited in the length of the target region because it relies on homology-directed repair, although it can introduce random mutations from a donor template library harbouring any combination and variety of mutations [PMID: 25141179]. By contrast, the target length of tiling mutagenesis can in principle be expanded, because it depends on the multiplex guide RNAs (gRNAs) designed across the target genomic region. Therefore, tiling mutagenesis has been employed to identify critical regions embedded in cis-elements [PMID: 26375006, 30612741, 26751173, 27708057, 28416141, 31784727]. Tiling mutagenesis requires an editor, such as Cas9 or an epigenetic modifier fused to catalytically dead Cas9 (e.g. KRAB-dCas9), and a library containing multiplex gRNAs tiling across the target genomic region. In general, random single guide RNA (sgRNA) expression cassettes are integrated into the chromosomes of host cells by retroviral delivery.” following “…within a narrow region.” in the Introduction (page 4, line 82).

      The paper you cited (DOI: https://doi.org/10.1038/nature13695; PMID: 25141179) applied saturation mutagenesis to identify critical mutations in the BRCA1 and DBR1 protein-coding sequences using an HDR-based strategy with a donor template library. Because this method relies on homologous recombination repair, the length of the target region is limited. Our method employs tiling mutagenesis, whose target length depends on the sgRNAs designed. We expanded the length of the target region from less than 15 kb, as previously reported, to more than 50 kb. This is our strength compared with that report.

      The paper you cited (DOI: https://doi.org/10.1038/nbt.3450; PMID: 26751173) applied CRISPRko tiling mutagenesis to identify critical regions within a 2 kb p53-bound enhancer region using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Therefore, our method should be a novel approach that generates a mutant library without the risk of disrupting non-targeted endogenous genes. In addition, we expanded the length of the target region to more than 50 kb. This is another advance.

      The paper you cited (DOI: https://doi.org/10.1126/science.aag2445; PMID: 27708057) applied CRISPRi tiling mutagenesis to identify critical regions within 74 kb genomic regions around GATA1 and MYC using lentiviral delivery of sgRNA cassettes. Our method employs CRISPRko and the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Identifying combinations of critical regions embedded in target regions would require diverse combinations of mutated or inactivated sites, which in turn requires the integration of multiple sgRNA cassettes. However, multiple integration of sgRNA cassettes carries a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis. The best way to eliminate this risk is to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical regions without artifacts, by removing the sgRNA cassettes without a footprint, CRISPRko tiling mutagenesis is better suited than CRISPRi, because CRISPRi requires integrated cassettes that stably express the sgRNA and an epigenetic modifier fused to dCas9. Therefore, our method should be a novel approach that generates a mutant library without the risk of disrupting non-targeted endogenous genes, enabling the identification of combinations of critical regions embedded in target regions.

      The paper you cited (DOI: https://doi.org/10.1016/j.molcel.2017.03.007; PMID: 28416141) applied CRISPRi tiling mutagenesis to identify critical regions at TAD scale (about 200 kb) at low resolution using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Therefore, our method should be a novel approach that generates a mutant library without the risk of disrupting non-targeted endogenous genes.

      The paper you cited (DOI: https://doi.org/10.1038/s41588-019-0538-0; PMID: 31784727) applied CRISPRi tiling mutagenesis to develop a method for identifying novel regulatory elements around protein-coding genes using lentiviral delivery of sgRNA cassettes. Our method employs the PiggyBac system, which can remove the sgRNA cassettes from the chromosome without leaving a footprint. Therefore, our method should be a novel approach that generates a mutant library without the risk of disrupting non-targeted endogenous genes.

      To clarify the advantages, we will add the following sentences:

      • “To identify combinations of critical region embedded in target regions, it would require diverse combinations of mutations or inactivation sites. To induce multiple mutations or inactivated sites, it requires multiple sgRNA cassettes integration. However, multiple integration of sgRNA cassettes have higher risk of non-targeted endogenous gene disruptions and may impair functional analysis [PMID: 23435812]. To eliminate the risk that integrated sgRNA cassettes disrupt non-targeted endogenous genes, it is best way to remove the sgRNA cassettes from the chromosome. Thus, to identify combinations of critical region embedded in target regions with no artifact owing to no footprint by removal of sgRNA cassettes, CRISPRko tiling mutagenesis rather than CRISPRi is better method because CRISPRi requires integrated cassettes that stably expressed sgRNA and epigenetic modifier fused to dCas9.” in the Introduction.
      • “Here, we proposed that DNA transposon system rather than retrotransposon system is more suitable to remove sgRNA cassettes from a mutant library. Transposons are genetic elements that can relocate between genomic sites and there are two types of transposons: (1) DNA transposon is transferred by a "cut and paste" mechanism in which the transposon sequence is cut directly from the genome, and (2) retrotransposon is transferred by a "copy and paste" mechanism in which the transposon sequence is transcribed into RNA and then integrated by reverse transcribed [PMID: 21958341]. Therefore, retrotransposon is never removed from the genome. DNA transposon such as PiggyBac, Sleeping Beauty and Tol2 systems are also used as gene transfer tools in vertebrates [PMID: 26481584]. Especially, PiggyBac leaves no footprint on integrated sites after transposons relocated while other DNA transposon system leaves small insertion on integrated sites [PMID: 34064900]. In addition, excision-only-PiggyBac transposase that can remove transposons but not integrate them, is developed [PMID: 27929521]. Only PiggyBac system can remove transposons carrying sgRNA cassettes from mutant library with no footprint. Therefore, we aimed to validate PiggyBac system for CRISPRko tilling mutagenesis.” in the Introduction.
      • “To date, CRISPRko tiling mutagenesis has been conducted on target genomic regions of less than 15 kb [PMID: 26375006, 30612741], whereas CRISPRi tiling mutagenesis has been reported to target more than 70 kb [PMID: 27708057]. Hence, it remains unclear to what length CRISPRko tiling mutagenesis could be expanded.” in the Introduction. In addition, we are going to perform transposon removal by excision-only-PBase treatment on several PB mES clones, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Major concerns, 1, The authors claim "to identify functionally important elements in non-coding regions" in the title, but there is no evidence of any functional analysis in the manuscript.

      → Thank you very much for your suggestion; we agree.

      In this paper, we only validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of the target region.

      To tone down our claim that our method can be used for functional analysis, we will make the following changes:

      • Change “to identify functionally important elements in non-coding regions” to “to induce diverse combinations and varieties of mutations within a non-coding region of more than 50 kb” in the Title.
      • Add “However, few loss-of-function screens of non-coding regulatory elements have been conducted, owing to their ambiguous annotation compared with protein-coding genes. Tiling mutagenesis has been employed to identify critical regions embedded in non-coding regulatory elements through comparative analysis of a mutant library harbouring subtly different mutated regions within a region of less than 15 kb. Conventional tiling mutagenesis constructs a mutant library by integrating multiple single guide RNA (sgRNA) cassettes via retroviral delivery. However, multiple integrations of sgRNA cassettes carry a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis. Herein, by combining tiling mutagenesis with the PiggyBac transposon, which can be removed without leaving a footprint at integration sites, we established an expanded tiling mutagenesis method named CRISPR- & Transposase-based RegionaL Mutagenesis (CTRL-Mutagenesis). We demonstrated that the PiggyBac system could integrate diverse combinations and varieties of sgRNA cassettes, and that CTRL-Mutagenesis randomly induces diverse combinations and varieties of mutations within a non-coding region of more than 50 kb in murine embryonic stem cells. CTRL-Mutagenesis could be applied to larger non-coding regulatory elements without the risk of disrupting non-targeted endogenous genes.” in the Abstract.
      • Delete “Comparative analysis of mutants harbouring subtly different mutations within the same region would facilitate the further study of cis-element and microRNA clusters.” in the Abstract (page 2, lines 38–40).
      • Change “The generated random mutant mES clone library could facilitate further functional analyses of non-coding regulatory elements within the genome.” to “The generated random mutant mES clone library could be developed to investigate critical regions of non-coding regulatory elements within the genome.” in the Introduction (page 4, lines 88–90).

      2, Genotypes of the mutant library, especially at Mirc56_14, 15, 16 and 17, were not determined due to six tandem repeats. Thus, analysis of the relationship between genotype and biological function is not possible. Moreover, the authors did not show any phenotypic analysis.

      → Thank you very much for your suggestion.

      Mutations within the six tandem repeats, each approximately 3.3 kb long, are hard to determine, and such repeats are uncommon.

      Therefore, we did not genotype Mirc56_14, 15, 16 and 17.

      Admittedly, it is a drawback that we did not determine all mutations induced by CTRL-Mutagenesis.

      Even so, we could determine the properties of the mutant library within the 37 kb genomic region from Mirc56_1 to Mirc56_13.

      Therefore, we concluded that CTRL-Mutagenesis could induce diverse combinations and varieties of mutations within a region of more than 50 kb.

      3, Multiple gRNAs may cause deletions and inversions at targeted loci. With local PCR-based amplification, detection of large deletions and inversions can be very difficult. I think the authors should examine and address this possibility more carefully. The definition of indel in Fig 5C should be explained in more detail.

      → Thank you very much for your comment; we agree.

      We did not check for inversions or large deletions.

      To determine whether inversions occurred, we are going to perform PCR walking and long-read sequencing on several clones.

      4, Although the authors showed a variety of PB cassettes (max. 17), it would be more important to determine the actual copy number of PB cassettes. The difference between the highest and the lowest EGFP intensities in Fig 2C (Donor 300ng Effector 350ng) is approximately 100-fold, thus the ES clone bearing the most PB vectors may contain ~100 copies of PB vector. PB transposon prefers insertion in active genes compared to other transposon system such as Sleeping Beauty and Tol2 transposon. (Yoshida J et al Sci Rep. 2017 Mar 2;7:43613. doi: 10.1038/srep43613.). Higher integration rates of PB vectors carry a higher chance of endogenous gene disruption and may impair functional analysis.

      → Thank you very much for your suggestion.

      We agree that the number of integrated cassettes is one important aspect of tiling mutagenesis. The actual copy number of the PB transposon would be useful information for potential users optimizing our method for their own target regions. However, to examine the relationship between the mutations induced and the sgRNA cassettes integrated, the number of integrated cassette varieties is more important, because the diversity of the sgRNAs expressed relates more directly to the diversity of the mutations induced. Therefore, we determined the number of integrated cassette varieties.

      To clarify this point, we will add the following text:

      “rather than the copy number of sgRNA cassettes, because the diversity of the sgRNAs expressed relates more directly to the diversity of the mutations induced” after “…the number of sgRNA cassette varieties” in the Results (page 12, line 297).

      We apologize that the statement “EGFP signal intensity correlated with the copy number of EGFP cassettes integrated into genomes[23]” in the Results (page 11, lines 249–250) is not accurate. Because EGFP expression levels are affected by the cell cycle, the cited paper reported that “Median EGFP intensities correlated with the copy number of EGFP cassettes integrated into genomes”.

      Therefore, we will delete the following sentence:

      “EGFP signal intensity correlated with the copy number of EGFP cassettes integrated into genomes[23]” in the Results (page 11, lines 249–250).

      To investigate how many copies are integrated at our donor vector concentration, we are going to measure the actual copy numbers in several clones by qPCR.
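      As an illustration of how such qPCR data could be turned into copy numbers, the sketch below uses the comparative Ct (2^-ΔΔCt) method. It is one common approach, not necessarily the assay that will be used. It assumes close to 100% amplification efficiency for both the PB-cassette and reference amplicons, and a hypothetical calibrator clone whose cassette copy number is already known. The function and parameter names are illustrative only.

```python
def relative_copy_number(ct_target, ct_reference,
                         ct_target_cal, ct_reference_cal,
                         calibrator_copies=1.0):
    """Estimate cassette copy number with the comparative Ct (2^-ddCt) method.

    ct_target / ct_reference: Ct values of the PB-cassette amplicon and of a
    reference genomic amplicon in the clone of interest.
    ct_target_cal / ct_reference_cal: the same two amplicons measured in a
    calibrator clone assumed to carry `calibrator_copies` cassettes.
    Assumes ~100% efficiency, i.e. product doubles every cycle for both amplicons.
    """
    # Normalise the cassette signal to the reference locus in each sample.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    # Each cycle of difference corresponds to a two-fold difference in template.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2 ** (-delta_delta_ct)


# The cassette amplicon comes up 4 cycles earlier (relative to the reference)
# than in a single-copy calibrator, i.e. 2^4 = 16 copies.
print(relative_copy_number(22.0, 25.0, 26.0, 25.0))  # 16.0
```

      Because the estimate scales exponentially with Ct differences, small deviations from the assumed efficiency can shift copy-number calls noticeably. A per-amplicon standard curve is a common alternative.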

      In addition, we agree with your point that “PB transposon prefers insertion in active genes compared to other transposon system such as Sleeping Beauty and Tol2 transposon. (Yoshida J et al Sci Rep. 2017 Mar 2;7:43613. doi: 10.1038/srep43613.)”.

      Therefore, we will change the following sentence:

      “random TTAA sites across genomes [24]” to “random TTAA sites across genomes, preferentially in transcribed rather than intergenic regions [PMID: 28252665]” in the Discussion (page 14, line 357).

      However, the Sleeping Beauty and Tol2 transposons leave a footprint at the former integration site when they excise [PMID: 15133768, 23143102]. In particular, the SB transposon leaves a canonical 5 bp insertion, which within a coding sequence causes a frameshift and could therefore frequently disrupt the function of the endogenous protein. By contrast, the PB transposon leaves no footprint, so excision-only-PBase can remove the PB transposon cleanly from the mutant library. Thus, once the PB mutant library is treated with excision-only-PBase, there is no concern that the PB transposon disrupts non-targeted endogenous genes and impairs functional analysis.

      In addition, we are going to perform transposon removal by excision-only-PBase treatment on several PB mES clones, as a proof of concept that CTRL-Mutagenesis can generate a mutant library with no sgRNA cassettes.

      5, Most non-coding regions are located on autosomes. Genotyping would be very difficult or even impossible with the current PCR-based strategy.

      → Thank you very much for your comment; we agree.

      This is one of the limitations of our current approach.

      We expect that CTRL-Mutagenesis could also be applied to other, biallelic loci.

      Therefore, we will address anticipated issues such as complex genotyping and propose one solution.

      When targeting other biallelic loci, we must determine whether combinations of induced mutations are in cis or in trans. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele using the SNP markers on each read [PMID: 35710642]. Therefore, applying CTRL-Mutagenesis to heterozygous alleles in cells such as human cells or cells derived from murine hybrids, in combination with haplotype phasing, might simplify genotyping.
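      To make the phasing logic concrete, here is a toy sketch for illustration only, not a pipeline used in the manuscript. Each long read is assigned to the maternal or paternal allele by majority vote over heterozygous SNP markers within the ROI. Mutations found on reads of the same haplotype are then called in cis. The SNP panel, the read representation and the mutation labels are hypothetical. The sketch assumes a clonal population with exactly two alleles and error-free base calls at SNP positions.

```python
from collections import Counter

# Hypothetical input format: a read is (alleles, mutations), where `alleles`
# maps a SNP position to the base observed on that read and `mutations` is a
# set of labels for mutations detected on the same read.


def assign_haplotype(read_alleles, snp_panel):
    """Return 'maternal', 'paternal', or None (tie / no informative SNPs)."""
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos not in snp_panel:
            continue  # not a known heterozygous SNP position
        maternal_base, paternal_base = snp_panel[pos]
        if base == maternal_base:
            votes["maternal"] += 1
        elif base == paternal_base:
            votes["paternal"] += 1
    if votes["maternal"] == votes["paternal"]:
        return None  # ambiguous read; leave it unphased
    return "maternal" if votes["maternal"] > votes["paternal"] else "paternal"


def phase_mutations(reads, snp_panel):
    """Collect the mutations seen on reads assigned to each haplotype."""
    phased = {"maternal": set(), "paternal": set()}
    for alleles, mutations in reads:
        haplotype = assign_haplotype(alleles, snp_panel)
        if haplotype is not None:
            phased[haplotype] |= set(mutations)
    return phased


def in_cis(mut_a, mut_b, phased):
    """True if both mutations were phased to the same haplotype (allele)."""
    return any(mut_a in muts and mut_b in muts for muts in phased.values())


# Toy example: two heterozygous SNPs within the ROI.
snp_panel = {100: ("A", "G"), 5000: ("C", "T")}
reads = [
    ({100: "A", 5000: "C"}, {"del_site1"}),
    ({100: "A"}, {"del_site3"}),
    ({100: "G", 5000: "T"}, {"ins_site2"}),
]
phased = phase_mutations(reads, snp_panel)
print(in_cis("del_site1", "del_site3", phased))  # True: both on the maternal allele
print(in_cis("del_site1", "ins_site2", phased))  # False: different alleles (trans)
```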

      We will add the following sentences:

      “In this study, CTRL-Mutagenesis was validated by genotyping a single allele in male mES cells, which avoided having to determine whether combinations of induced mutations are in cis or in trans. All genotypes at Mirc56 should be hemizygous, so the induced mutations are expected to be in cis; we therefore determined the genotypes by amplifying approximately 200 bp around each target site. However, we did not check for large mutations such as deletions of the genomic region between target sites or inversions. Long-read sequencing might capture such large mutations. We also expect that CTRL-Mutagenesis could be applied to ROIs on autosomes and on the X chromosome in female cells, which are biallelic. In that case, it will be necessary to determine whether combinations of induced mutations are in cis or in trans. Haplotype phasing, which combines long-read sequencing with SNP markers within the ROI on the maternal and paternal chromosomes, assembles each allele using the SNP markers on each read [PMID: 35710642]. Therefore, applying CTRL-Mutagenesis to heterozygous alleles in cells such as human cells or cells derived from murine hybrids, in combination with haplotype phasing, might simplify genotyping.” in the Discussion.

      Moreover, genome-wide NGS and nanopore Cas9-targeted sequencing (nCATS) could also help us to read the mutations without PCR amplification. However, both methods yield reads of the target regions at low frequency, which makes it difficult to multiplex samples from a mutant library.

      6, Fig 4C, large amounts of Cas9-independent EGFP-positive cells suggest the current system is not efficient.

      → Thank you very much for your comment.

      We respectfully disagree.

      In fact, using the cutoff set with Cas9-untreated cells, the EGxxFP system successfully selected at least 76 mutant clones (87.4%) harboring mutations within Mirc56_1 to Mirc56_13. Moreover, we could seed 180 single cells for single-cell cloning in one FACS run.

      To emphasize this point, we added the following sentences:

      “Moreover, at least 76 out of 87 PB mES clones have mutations within all analysed Mirc56_Xs (Figure 5C). Therefore, the EGxxFP system could select ROI mutant mES clones efficiently.” after “…depended on integrated sgRNA cassettes.” in the Discussion (page 13, line 355).

      Reviewer #4 (Significance (Required)):

      The authors claim "Functional analysis" in the manuscript title, but there is no evidence of functional analysis in the manuscript.

      → Thank you very much for your suggestion; we agree.

      In this paper, we only validated the PiggyBac system for CRISPRko tiling mutagenesis and expanded the length of the target region.

      To tone down our claim that our method can be used for functional analysis, we will make the following changes:

      • Change “to identify functionally important elements in non-coding regions” to “to induce diverse combinations and varieties of mutations within a non-coding region of more than 50 kb” in the Title.
      • Add “However, few loss-of-function screens of non-coding regulatory elements have been conducted, owing to their ambiguous annotation compared with protein-coding genes. Tiling mutagenesis has been employed to identify critical regions embedded in non-coding regulatory elements through comparative analysis of a mutant library harbouring subtly different mutated regions within a region of less than 15 kb. Conventional tiling mutagenesis constructs a mutant library by integrating multiple single guide RNA (sgRNA) cassettes via retroviral delivery. However, multiple integrations of sgRNA cassettes carry a higher risk of disrupting non-targeted endogenous genes and may impair functional analysis. Herein, by combining tiling mutagenesis with the PiggyBac transposon, which can be removed without leaving a footprint at integration sites, we established an expanded tiling mutagenesis method named CRISPR- & Transposase-based RegionaL Mutagenesis (CTRL-Mutagenesis). We demonstrated that the PiggyBac system could integrate diverse combinations and varieties of sgRNA cassettes, and that CTRL-Mutagenesis randomly induces diverse combinations and varieties of mutations within a non-coding region of more than 50 kb in murine embryonic stem cells. CTRL-Mutagenesis could be applied to larger non-coding regulatory elements without the risk of disrupting non-targeted endogenous genes.” in the Abstract.
      • Delete “Comparative analysis of mutants harbouring subtly different mutations within the same region would facilitate the further study of cis-element and microRNA clusters.” in the Abstract (page 2, lines 38–40).
      • Change “The generated random mutant mES clone library could facilitate further functional analyses of non-coding regulatory elements within the genome.” to “The generated random mutant mES clone library could be developed to investigate critical regions of non-coding regulatory elements within the genome.” in the Introduction (page 4, lines 88–90).
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Review Commons - Revision Plan

      Manuscript number: RC-2023-02228

      Corresponding author(s): Gatfield, David

      1. General Statements

      We are grateful to the three Reviewers for their detailed assessment of our manuscript and are delighted about their very constructive and positive evaluations, highlighting the study’s novelty and rigor.

      Briefly, the main points raised by Reviewers 1 and 3 do not involve additional experiments and are mostly about rethinking manuscript structure (e.g. moving data/analyses to the supplement or removing them altogether, as they distract from the main thrust of the story) and making the text overall less dense and more readable.

      Reviewer 3 also raises a number of additional interesting points that we should discuss in our manuscript, which would allow us to place our findings more effectively into the context of the existing literature.

      All these points are very well taken and will be implemented (see below, under 2).

      Reviewer 2 is overall also rather positive – speaking of “a very careful and detailed study that addresses an important issue” and the study being “really rigorous and the logic […] very well explained”; moreover, this Reviewer also shares the view of both other Reviewers that parts of the manuscript (i.e., in particular its beginning) should be shortened.

      Importantly, this Reviewer remarks in addition under “Significance”: “Without additional mechanistic insights suggesting that there is something particular different about the regulation of these mRNAs the manuscript is not of extremely high significance.” – an important point of criticism that we wish to address in our revision, as detailed below.

      2. Description of the planned revisions

      In the following, we detail how we plan to address the points raised by the Reviewers. The order in which we treat the points follows their – in our view – relative importance according to the Reviewers’ feedback. In particular the first item below, under (A), is the main point of criticism that we feel we should address carefully for the future revised version.

      (A) Major point raised by Reviewer 2: “However, the study falls short on addressing the mechanism of this regulation and if it is different of other feeding regulated mRNA oscillations. This diminishes the significance of the study unless additional mechanistic details are provided.” , which is cross-commented both by Reviewer 1: “More importantly, clues to the mechanism (e.g. iron, heme) regulating the rhythmic translation of IRP1 and IRP2 IRE-mRNAs in liver would increase the significance of the work.” as well as by Reviewer 3: “Reading the comment from Reviewer #2 over the lack of a mechanism to explain why only four transcripts with IREs amongst a larger pool are subject to circadian regulation by IRPs somehow reduces the significance of the study, one has to agree that a discovery - likely another component in the system - is wanting. I remain of the view that the present work exposes this "weakness" of the entire field in a global as opposed to a partial manner and in doing so, makes a significant contribution, especially by further sub-classifying the IRE-containing transcripts according to their responsiveness in the diurnal occupancy of their IREs.”

      Our response and revision plan: Indeed, in the original version of our manuscript we established the link to feeding, yet we did not pinpoint the precise molecular cue that could underlie the rhythmic regulation observed on certain IRE-containing mRNAs. We did discuss the molecular candidates quite extensively in the Discussion section of the manuscript (Fe2+; oxygen; reactive oxygen species), and it remains quite obviously the main question whether the observed diurnal control could be mediated directly by changes in intracellular iron availability.

      Of note, the preprint by Bennett et al., for which we cite the initial bioRxiv version in our manuscript, was updated very recently (https://doi.org/10.1101/2023.05.07.539729 – see version submitted December 18, 2023). It now includes new data analysing around-the-clock iron levels, also in liver. Briefly, the preprint shows, first, that serum iron is rhythmic with a peak during the dark phase at ZT16 (Figure 1D in Bennett et al.) yet loses rhythmicity when feeding is restricted to the light phase (Bennett et al., Figure 2E), indicating both feeding-dependence and circadian gating. Moreover, liver total non-heme iron – quantified using a method that measures both ferrous Fe(II) and ferric Fe(III) – shows low-amplitude diurnal variations which, however, do not meet the threshold for rhythmicity significance (Bennett et al., Figure 3G). Still, the difference between timepoints ZT4 (lower iron; light phase) and ZT16 (higher iron; dark phase) is reported as significant, although the fold-change is not very pronounced and is not compatible with the observed direction of regulation of Tfrc mRNA, whose higher abundance in the dark phase would rather be in line with lower cytoplasmic iron levels, as pointed out by the authors.

      Thus, at first sight the analyses by Bennett et al. would appear to answer part of the Reviewer’s question and point towards other mechanisms of regulation than iron levels themselves. However, it should be pointed out that the particular methodology for iron measurements used by the authors includes the use of reducing reagents and hence quantifies the sum of Fe2+ and Fe3+ iron. Large amounts of iron are stored in the liver in the form of ferritin-bound Fe3+, yet the bioactive, low-complexity iron that is considered relevant for IRP regulation is in the Fe2+ form. Therefore, the question whether bioactive ferrous iron levels follow a daily rhythm, compatible with the observed IRP/IRE rhythms described in our manuscript, still remains an open question and warrants a dedicated set of experiments that we are proposing to conduct in response to the Reviewers’ comments.

      Briefly, for the revision we propose to use liver pieces from the two relevant timepoints of our study (i.e., ZT5 and ZT12) and apply a method that allows the separate quantification of Fe2+ and Fe3+ (Abcam iron assay ab83366; this assay can be adapted to liver iron measurements, see e.g. PMID31610175, Fig. 4A). This experiment will provide novel and decisive data on the molecular mechanism that may regulate the IRP/IRE system in a rhythmic fashion and therefore add to the significance of our findings, as requested by the reviewers.

      Moreover, we believe that the outcome of the experiment would be very interesting either way, i.e. if we find rhythms in Fe2+ that are compatible with rhythmic IRP/IRE regulation, we would be able to provide excellent evidence in terms of the likely molecular mechanism and rhythmicity cue. If, by contrast, we find that Fe2+ is not rhythmic, it will point towards a mechanism that is distinct from simple Fe2+ concentrations.

      In the latter case, collecting additional evidence on relevant alternative molecular cues would be beyond our capabilities for this particular manuscript, as it would require quite sophisticated methodological setup and preparation. For example, one could imagine that measuring around-the-clock liver oxygen levels in vivo – another candidate cue – would be highly interesting, yet we would not be able to conduct these experiments in a reasonable time frame (to start with, we would first need to request ethics authorisation from the Swiss veterinary authorities, which would in itself take ca. 4-6 months before we could even start an experiment). Thus, in the case of non-rhythmic iron levels, we would leave the question of other responsible cues open, but still think that with a balanced discussion of the resulting hypotheses we could provide significant added value to our work.

      (B) Major comment raised by Reviewer 1: “Alas2 is expressed mainly in erythroid cells and not liver, whereas Alas1 is ubiquitously expressed. Therefore, it is possible that Alas2 in this study may originate from red cells/reticulocytes in the liver, and not from hepatocytes.”

      Our response and revision plan: We would like to thank the Reviewer for the comment that is indeed pertinent. It is well established that Alas1 is the main transcript encoding delta-aminolevulinate synthase activity in hepatocytes, and Alas2 is about 10-fold less abundant in total liver RNA-seq data (quantified from our own RNA-seq data, not shown).

      We are nevertheless relatively sure that the Alas2 signal comes from low expression in hepatocytes; the best argument in support of this hypothesis is the analysis of single-cell RNA-seq data, as shown in the following Revision Plan Figure 1, which we would be happy to include in a revised version of the manuscript if the reviewers wish:

      (C) Minor comment raised by Reviewer 1: “The paper is dense and not easy to read. For example, the section on Tfrc regulation and NMD regulation is lengthy and perhaps not necessary for the paper and the section on "Previous observations in IRE-IRP regulation...." could be included in the discussion rather than in the Results section. Some figures could be included in a supplement.” continued in Referee cross-commenting “I agree with Reviewer 2 that the first sections in the manuscript are lengthy and not needed.”; moreover, Reviewer 2: “Also, the manuscript first sections (which mainly describe negative results) seem too long and descriptive.”

      Our response and revision plan: We shall reorganize the paper accordingly, with the aim of making it an easier, shorter, clearer read. Many thanks for the input.


      (D) Minor comment raised by Reviewer 1: “A description of the new anti-IREB2 antibody is needed. What IRP2 sequence was used to generate antibodies?”

      Our response and revision plan: The following information will be included in the manuscript: “Rat monoclonal antibodies against ACO1/IRP1 and IREB2/IRP2 were generated at the Antibodies Core Facility of the DKFZ. Briefly, full-length murine ACO1/IRP1 and IREB2/IRP2 proteins, fused to a poly-histidine tag, were expressed in E. coli and purified on Ni-NTA columns using standard protocols. Purified His-tagged proteins were used to immunize rats and generate hybridomas. Hybridoma supernatants were first screened by ELISA against His-tagged ACO1/IRP1 and His-tagged IREB2/IRP2. As an additional control, supernatants were tested against full-length His-tagged murine ACO2 (mitochondrial aconitase), which shares 27 and 26% identity with ACO1/IRP1 and IREB2/IRP2, respectively. Supernatants reacting specifically with ACO1 or IREB2 were validated by western blotting using extracts from wild-type versus ACO1- or IREB2-null mice.”

      (E) Minor comment raised by Reviewer 1: “A model summarizing the data would be useful.”

      Our response and revision plan: Thank you for the suggestion – this will be done.

      (F) “Optional” idea raised by Reviewer 3: “One nuance in the field of circadian biology is that a rhythm is deemed to be genuinely "circadian" when it continues in the absence of zeitgebers. In this sense, although all experiments are valuable, the "collapse" of the rhythm in the paradigms where dietary rhythms have been disrupted makes the phenomenology a candidate "epiphenomenon" rather than being closer related to the biological clock(s). Likewise, in the manuscript we never learn how the liver IRE-binding activity behaves in constant darkness.”

      Our response and revision plan: This is an important aspect that we can clarify more specifically in our manuscript. It is true that constant (darkness) conditions are used to define a phenomenon as circadian. We would nevertheless argue that for a rhythmic feature that is specifically found in liver, the constant darkness definition to distinguish circadian from non-circadian is not fully valid because even in constant darkness, the liver clocks are not in a free-running state but continue to be entrained by the SCN clock (it is only the latter that is free-running under these conditions).

      In our manuscript, we actually suggest that the observed rhythms are not a core output of the circadian machinery (Fig. 6 of our manuscript), but indirectly engendered through feeding rhythms, which are coupled to sleep-wake cycles and thus connect in an indirect way to the central circadian clock activity in the SCN.

      In wild-type mice we would therefore expect that irrespective of constant darkness or light-dark entrainment (and assuming ad libitum feeding), the hepatic rhythms of the relevant IRE-containing transcripts would persist in a similar fashion.

      (G) “Optional” idea raised by Reviewer 3: “Where the authors mention in a parenthesis "moreover, there are documented links between iron and the circadian timekeeping mechanism itself", I invite them to take a closer look to the paper Konstantinos Mandilaras and I coauthored in 2012 "Genes for iron metabolism influence circadian rhythms in Drosophila melanogaster". In that work, we showed that RNA interference of genes that are required for iron sulfur cluster formation (including on IRP1) in the central clock neurons of the fly result in loss of the circadian rhythm when flies were kept at constant darkness (not so when they were kept under light:dark oscillation). So this point should probably remain open..”

      Our response and revision plan: We would like to thank the Reviewer for pointing out this interesting connection that would fit well into the context of our manuscript. It should be cited in the context of our current Figure 3, where we measure in vivo and in tissue explants whether IRP-deficiency affects the clock itself.

      To follow Reviewer 3’s idea, we have gone a little further in our analyses of around-the-clock expression data to see if any of the components of the Fe-S assembly machinery is rhythmic itself, which could have the potential to add novel information.

      For this purpose, we used our around-the-clock RNA-seq and ribo-seq data from PMID 26486724. In summary, we find that the expression at RNA and/or footprint level is non-rhythmic for the vast majority of genes involved in FeS biogenesis, assembly or transport, with the exception of low-amplitude rhythms for Glrx5 and Iba57 (Revision Plan Figure 2).

      By contrast, all of the following other genes are non-rhythmic throughout (list of Fe-S-relevant genes from PMID 34660592): cytoplasmic/nuclear (all non-rhythmic): Cfd1=Nubp2, Nbp35=Nubp1, Ciapin1, Ndor1, Iop1=Ciao3=Narfl, Ciao1, Ciao2b=Fam96b, Mms19, Ciao2a=Fam96a; mitochondrial (all non-rhythmic): Iscu, Nfs1, Isd11=Lyrm4, Acpm=Ndufab1, Fdx1, Fdx2=Fdx1l, Fxn, Hspa9, Hsc20=Hscb, Abcb7, Alr=Gfer, Isca1, Isca2, Nfu1.

      As these are mainly “negative results”, and as we are also unable to propose a solid possible mechanistic connection between the Glrx5 and/or Iba57 rhythms and the rest of the story of our manuscript, we do not intend to include such data in our manuscript, but are only putting it for the record into this rebuttal.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      NONE

      4. Description of analyses that authors prefer not to carry out

      NONE – we think we can address all points as described above.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2023-02224R

      Corresponding author(s): Austin Smith

      1. General Statements

      We thank the reviewers for constructive comments and helpful suggestions which we have adopted to clarify and improve the manuscript. In addition, we have added a link to a web portal that will allow readers to visualise gene expression profiles and create their own plots using our early human embryo UMAP embedding (https://bioinformatics.crick.ac.uk/shiny/users/boeings/radley2024umap_app/). Stefan Boeing created this tool and has been added to the author list with the agreement of the other authors.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary In this manuscript, Arthur Radley and Austin Smith designed a new feature selection method for scRNA-Seq, which is a successor to ESFW previously proposed by the same authors. As an evolution of this earlier framework, cESFW is also based on the idea that informative genes share information with other genes, whereas non-informative genes have a more random relative expression. The authors emphasize the key importance of feature selection in the scRNA-Seq workflow and assess the current state of the art for this step. They also propose that better feature selection leads to less data transformation. They show that cESFW outperforms Scran and Seurat feature selection in most cases of synthetic datasets. cESFW is then used in the context of early human development, re-analysing data from several published datasets where they show that they do not require batch correction. They also further strengthen the conclusion that a "2-step" model for TE-ICM and EPI-Hyp differentiation is also present in human embryos. Finally, they map several types of in vitro pluripotent stem cells, in particular primed and naive, to their manifold and study the evolution of the gene signatures during early human development. Overall, the manuscript is well written and presents a solid methodology. The re-analysis of human early development is convincing and justified. The main criticism is that the quality of figures can be greatly improved: their resolution is too low and they are hard to read. For instance, more contrasted color schemes could be used to improve clarity, and given the high number of clusters for some UMAPs, indicating the name of some cluster near their centroids should improve clarity.

      We agree that the resolution of the figures should be improved. We had to compress the images to satisfy the size limit for uploaded documents to bioRxiv. Our final submission will be of higher quality (original figures are at 900dpi). With regard to colour schemes, this is a surprisingly difficult problem. We tried multiple colour palettes but could not achieve greater contrast. The suggestion to add key cluster names near their centroids on the UMAPs is an excellent idea, which we have implemented.

      Comments: Page 2: I think the criticism of PCA is unfair because it is not a true feature selection method, and it is mainly used for computational purposes. I believe that for most workflows, between 30 and 50 PCs are retained, which do not significantly change the results in the downstream analyses. The citation (Yeung and Ruzzo 2001) does not seem appropriate, as they examine cases where only a small number of PCs are retained, outside the context of scRNA-seq.

      We agree that the criticism of PCA is insufficiently justified by the citation. We thank the reviewer for pointing this out and have removed the comment.

      "Furthermore, HVG selection has been found to be biased toward selecting highly expressed genes over low expressed genes." Could the author justify or remove this statement, as the Seurat and Scran methods are specifically designed to consider average expression to determine HVG? The cited article (Yip, Sham, and Wang 2019) raises this issue for methods other than Seurat and scran.

      The reviewer is correct that the provided citation highlights Seurat and Scran HVG selection as relatively insensitive to the average gene expression levels compared with other HVG selection methods. We again thank the reviewer and have deleted the comment.

      More generally, we have shortened the introduction, focusing on cESFW as a new approach to feature selection rather than critiquing alternative methods.

      Page 6: I might have missed it, but I do not understand the number of cells in the early human development dataset also shown in Figure S2B. The Petropoulos et al. dataset alone is larger than the sum of cells from different cell types. Is there some filtering step that is not described?

      We have added text in the data availability section to clarify the cells used in our analysis:

      “The pre-implantation raw counts scRNA-seq data from Yan et al. 2013, Petropoulos et al. 2016, Fogarty et al. 2017, and Meistermann et al. 2021, were compiled into a single gene expression matrix by Meistermann et al. 2021. For information regarding quality control and cell filtering of these 4 datasets, please refer to Meistermann et al. 2021.”

      The unsupervised clustering used to annotate cell types is unconventional (especially with the high number of clusters chosen), which is not a problem, but it should be clarified. Improving Figure 3D to make it clearer and providing a cell cluster correlation plot might help readers better appreciate the relationship between cell types.

      We agree that the gene expression heatmap in figure 3D contributed little to the interpretation of the data/results. As suggested, we have replaced this heatmap with a cell cluster correlation plot to help appreciate cell state similarities. (Changes in figure 3.)

      It could be emphasized that the ICM/TE branch cell type is a major difference with the mouse topology, as the readers might not be aware that the ICM/TE is an unspecified blastocyst state that only exists in humans.

      There appears to be some misunderstanding around the use of “ICM/TE branch”. The cluster comprises an uncommitted population at the branching point from morula to either ICM or TE, as also described in the mouse embryo. We have adjusted the discussion to make more clear that the two branching point clusters are heterogeneous populations, not unitary cell types or states:

      “The branching populations reside at critical junctures in blastocyst formation, the partitioning of extraembryonic and embryonic lineages. These branchpoint clusters do not define unitary states. On the contrary, cells in these clusters are heterogeneous and may become specified to alternative fates. For example, PDGFRA, a hypoblast marker (Corujo-Simon et al. 2023), and NANOG, an epiblast marker (Allegre et al. 2022), are heterogeneously distributed in the Epi/Hyp branching population. Furthermore, branch cluster boundaries extend beyond the topological bifurcation, potentially indicating that cells remain plastic and may be redirected. This would be consistent with the demonstration in mouse embryos that cells expressing ICM genes remain capable of generating TE up to the late 32-cell stage (Posfai et al. 2017).”

      Page 9: To further substantiate the stepwise ICM/TE and EPI/PrE specification events, the authors could project cells from each embryo onto the UMAP and analyze the co-occurrence of cell types within embryos (as performed, for instance, in Meistermann et al. 2021). This should show, as reported (and cited by the authors), that some GATA3-positive cells (TE-fated) start appearing from the late morula stage and that ICM cells almost never co-exist with EPI or Hyp cells in embryos.

      We appreciate this suggestion. We have generated the requested plots showing where cells from individual embryos at different developmental time points are positioned on our UMAP embedding (new supplemental Figure S6). We present a summary heatmap of cell co-occurrence in revised Figure 4. These results offer greater insight than the RNA velocity analysis, which we have moved to supplemental Figure S6. We have added discussion of these analyses in the “Lineage branching blastocyst development” Results section.

      Reviewer #1 (Significance (Required)):

      The presented methodology shows significant value especially in the field of scRNA-Seq, where the critical step of feature selection is often inadequately addressed. Furthermore, this field is characterized by a limited set of feature selection methodologies. cESFW appears to be an important alternative to HVG methods that could improve scRNA-Seq analysis in certain contexts.

      The new findings on early human development are somewhat incremental, but a welcome addition to solidify the two-step model and refine the concept of reject cells. The audience for this early development context is specialized, but cESFW will most likely have an impact on the entire field of scRNA-Seq analysis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Here, Radley and Austin present a novel approach for feature weighting in scRNAseq data based on entropy sorting. Feature selection is a central part of scRNAseq analysis, and it is most likely the case that there is no single approach that outperforms all others across all datasets. Hence, innovation in this space is needed for the field. The cESFW method presented here has several appealing properties from a theoretical point of view, and it also performs well on the synthetic and real datasets considered. Nevertheless, there are several major issues that need to be addressed before I can recommend the manuscript for publication:

      1. The original entropy sorting (Eq. 1 in SI 1) is based on only two discrete states. However, calculating entropy for continuous distributions can be trickier, and it is unclear to me what assumptions are made regarding the gene expression. Could the authors clarify what properties of the distribution are required for the updated ESE equation to be valid? Is the only assumption that values are drawn from the [0, 1] interval? What happens if values are highly skewed, i.e., forming a bimodal or power-law distribution rather than something close to a uniform distribution?

      We agree that it is beneficial to clarify these points. We have added a section titled “Assumed properties of underlying sample distributions” to the supplemental information. Briefly, we show that the ESS correlation metric is directly linked to the commonly used correlation metric, Mutual Information (MI). A desirable property of MI is that it is able to capture non-linear/skewed relationships between features. The ES framework and ESS share this property with MI, making the ES framework relatively robust to the presence of non-uniform distributions.

      The main assumption for applying ES is that the features can be meaningfully scaled between values of 0 and 1. For gene expression, an intuitive way of achieving this is to inspect each gene and designate 0 count values as having 0 expression activity, and the maximum counts as having activities of 1, and all values in between existing within the [0,1] interval. A useful property of ES is that we do not need to assume a particular shape or distribution of the samples within the [0, 1] interval. The ES framework is non-parametric and does not require an assumed distribution to calculate the conditional entropy (CE), even in the continuous form. This is possible because the ES framework is formulated by turning the probabilistic form of CE into an ordinary differential equation (ODE), where the only dependent variable, x, is the overlap between the minority state activities of each individual sample. This calculation is explicitly identifiable/calculable, and is permutation invariant, meaning the shape of the distributions of a reference feature (RF) and query feature (QF) does not need to be assumed/defined. In other words, the ES framework quantifies to what degree active expression states enrich/overlap with one another in a manner that is robust to different distribution shapes.
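
      As a minimal sketch of the per-gene scaling described above, assuming a cells × genes matrix of non-negative raw counts held as Python lists (the helper names `scale_gene` and `scale_matrix` are hypothetical, and this is not code from the cESFW repository), zero counts map to 0 and each gene's maximum count maps to 1:

```python
def scale_gene(counts):
    """Scale one gene's non-negative counts onto [0, 1].

    Zero counts map to 0 and the gene's maximum count maps to 1; all other
    values land proportionally in between. Hypothetical helper for illustration.
    """
    max_count = max(counts)
    if max_count == 0:
        # Gene never detected: no cell shows any expression activity.
        return [0.0] * len(counts)
    return [c / max_count for c in counts]


def scale_matrix(matrix):
    """Apply scale_gene independently to each column (gene) of a cells x genes matrix."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    scaled_cols = [scale_gene([matrix[i][j] for i in range(n_cells)]) for j in range(n_genes)]
    # Transpose back so that rows are cells again.
    return [[scaled_cols[j][i] for j in range(n_genes)] for i in range(n_cells)]


counts = [
    [0, 3, 0],
    [2, 6, 0],
    [4, 0, 0],
]
scaled = scale_matrix(counts)
print(scaled)
```

      Because each gene is scaled only by its own maximum, reordering the cells reorders the output in the same way without changing any value, in keeping with the permutation invariance noted above.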

      2. How robust is the procedure to the choice of percentile for normalizing the gene expression scores? Does one get roughly the same results for the 90th-99th percentiles, or is it sensitive to this choice?

      We have carried out a sensitivity analysis on the choice of percentile for each of the synthetic datasets and added it to the manuscript. (New figure, Figure S11). We find that on each of our 4 synthetic datasets the final results of cESFW are robust to a wide range of normalisation percentiles.
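
      The exact normalisation is not restated in this reply. The sketch below assumes one plausible form, in which each gene's counts are divided by that gene's q-th percentile and capped at 1, and loops over q from 90 to 99 in the spirit of the sensitivity analysis described. The helper names and the capping rule are assumptions for illustration, not cESFW's implementation.

```python
def percentile(values, q):
    """Percentile with linear interpolation (NumPy's default method), q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def percentile_normalise(counts, q):
    """Divide by the gene's q-th percentile and cap at 1 (assumed normalisation rule)."""
    cap = percentile(counts, q)
    if cap == 0:
        return [0.0] * len(counts)
    return [min(c / cap, 1.0) for c in counts]


# Toy counts for one gene with a long upper tail (invented for illustration).
gene = [0, 0, 1, 2, 2, 3, 5, 8, 40, 100]
for q in range(90, 100, 3):
    scaled = percentile_normalise(gene, q)
    n_saturated = sum(1 for v in scaled if v == 1.0)
    # Report the percentile cap and how many cells are clipped to 1.
    print(q, round(percentile(gene, q), 2), n_saturated)
```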

      3. Similarly, I am concerned about the procedure for choosing the number of significant genes. How robust is this process? Also, it is not altogether clear how to generalize the procedure outlined on p19. Most potential users would benefit from more quantitative guidelines. In particular, having to rely on interpretation of GO terms typically requires considerable understanding of the system at hand, which could make it challenging for others to apply the procedure. For most users it would be helpful to know how robust the procedure is at this step, and whether more stringent guidelines could be given for deciding which genes to include.

      We understand the reviewer's concern regarding the robustness of feature selection on real scRNA-seq datasets. We have now applied our cESFW workflow to peripheral blood mononuclear cell (PBMC) scRNA-seq data, and found cESFW feature selection to be comparable to, and by one metric more robust than, Seurat and Scran HVG selection (New Figure S2).

      As cESFW is applied to more scRNA-seq data, we will learn more about how results compare to highly variable gene selection, and how workflows may be adapted to optimise results in different scenarios. For example, we have found that supervising the selection of gene clusters using a small set of markers known to be important in the system of study can help identify which clusters of genes should be retained during gene selection. We have added this to the materials and methods with the following paragraph:

      “Furthermore, we suggest supervising the selection of gene clusters using a small set of markers known to be important in the system of study. In this work, we found that genes known to be important during early human embryo development (Fig. S4) are enriched in the dark blue cluster of genes, further suggesting that this cluster of genes is more likely to separate cell type identities in downstream analysis.”

      While gene cluster selection supervision in this manner requires a degree of domain expertise, we believe this is not unreasonable for most applications, and is the case for many scRNA-seq analysis pipelines.

      Our primary software contribution is the cESFW algorithm, which calculates the ESS and EP matrices. With this manuscript we provide 6 commented workflows for applying cESFW to different datasets (4 synthetic datasets, human embryo data, PBMC data). We believe these workflows provide a good balance of documented use cases and user flexibility for cESFW usage. This is important because it is advantageous to be able to adapt workflows easily to incorporate domain expertise and different methodologies. Although workflows such as Seurat and Scran are user-friendly, their rigidity can make it difficult to deviate from their standard workflows. In summary, we believe that our provided workflows are suitable for users to implement cESFW, while providing the flexibility to apply adapted pipelines.

      4. The comparison of the clusterings on p6 is not really fair, is it? If I understand it correctly, the 3,012 genes identified by cESFW were used to define clusters in Fig. 3C through unsupervised clustering. The authors then use HVG methods to identify 3,012 genes and carry out clustering based on those. To evaluate the methods, the silhouette score is used, but the labels from the cESFW clustering are used as ground truth. This does not sound like a fair way to compare. Could the authors please clarify and, if needed, come up with an approach that puts the three methods on a more level playing field?

      The reviewer raises a fair point regarding the comparison of cluster identities and ranked gene lists. This issue is a chicken and egg problem, in that we require a baseline to benchmark different methodologies but lack an explicitly defined ground truth. For that reason we used synthetic datasets for initial comparison.

      For the human embryo data, we have presented substantial evidence that our cluster annotations are biologically coherent and consistent with prior knowledge. We therefore consider it legitimate to compare the ranked lists of Seurat, Scran and cESFW. However, we acknowledge the potential bias and have mentioned this in the “Limitations of the study” section.

      In addition, we have now analysed the peripheral blood mononuclear cells (PBMC) scRNA-seq dataset that is used in the tutorial workflows of Seurat and Scran. This PBMC dataset is arguably better defined since it has more discrete populations of cells, and by using the Seurat generated cell type labels we bias the analysis towards Seurat rather than cESFW. The results show that cESFW performs comparably to Seurat and Scran, and that the cESFW ranked gene list may be more stable than Seurat and Scran. These results suggest that cESFW can be widely applicable as a suitable alternative for feature selection. We have included this analysis in the Results and as a supplemental figure (New figure, Figure S2).
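
      The fairness concern can be made concrete with a small example. The silhouette score evaluates a fixed set of labels in a given feature space, so when the labels come from one method's clustering, gene sets that reproduce that structure will tend to score higher. The pure-Python sketch below (Euclidean distance; points in singleton clusters scored 0, following scikit-learn's convention) scores the same hypothetical labels in two invented feature spaces. It is illustrative only and is not the analysis pipeline used in the manuscript.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of `points` (lists of floats) under `labels`.

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [k for k in clusters[lab] if k != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[k]) for k in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[k]) for k in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Same four cells and reference labels in two hypothetical feature spaces
# (e.g. expression profiles built from two different gene lists).
labels = [0, 0, 1, 1]
space_a = [[0.0], [1.0], [10.0], [11.0]]  # structure agrees with the labels
space_b = [[0.0], [10.0], [1.0], [11.0]]  # same labels, structure disagrees
print(silhouette_score(space_a, labels), silhouette_score(space_b, labels))
```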

      5. The main cESFW.py file in the GitHub repository is clearly well structured and commented. However, I would like to see much better documentation, so that one does not have to go through the source code to understand which functions exist and what they do. In particular, I would like to see a vignette to make it easier for others to incorporate cESFW into their workflows.

      We thank the reviewer for the positive comments regarding our cESFW.py commenting. We accept that our initial submission failed to point the reader directly towards our example workflows that provide step by step, well commented vignettes for using cESFW to analyse scRNA-seq data. In our initial submission we provided 5 workflows (4 synthetic data and the human embryo data), and in the re-submission we have added a workflow for analysing PBMC data. We have updated our cESFW Github to guide users to these example workflows (https://github.com/aradley/cESFW/tree/main).

      Please note, the embryo workflow will be easily accessible through GitHub, whereas the synthetic data and PBMC workflows will be provided through a Mendeley data link (referenced in the manuscript and on our GitHub). However, the content of the Mendeley link cannot be made public until the paper is finalised, as it cannot be changed after publication. We provide a temporary public Dropbox link for the reviewers so that they may access the additional workflows (https://www.dropbox.com/scl/fo/xr5o9xm6490ftjsa55wxg/h?rlkey=maindrxwdqnirsw1en3my5qsr&dl=0).

      Minor:

      Why are the figures not always in order? For example, fig S10 is mentioned before fig S2 on p 6

      Thank you for pointing this out; we have amended the text.

      I am not sure if the indexing in eq 1 (p 18) is correct. j is both on the LHS and it is also being summed over on the RHS. Should one of these be i instead?

      The indexing is correct. Each column j of the matrix on the RHS corresponds to a gene/feature; the calculation on the RHS takes the column averages, which yields a vector on the LHS that is still indexed by gene/feature j. We have clarified this in the text.
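
      For readers checking the indexing, a generic column average (assumed here for illustration; Eq. 1 itself is not reproduced in this reply) sums over the row (cell) index i while leaving the column (gene/feature) index j free on both sides: mean_j = (1/N) * sum over i of X[i][j]. A minimal sketch, with the hypothetical helper `column_means`:

```python
def column_means(matrix):
    """Return one mean per column: mean_j = (1/N) * sum_i matrix[i][j].

    The sum runs over rows (cells) i; the column (gene/feature) index j is
    free on both sides, so the result is a vector indexed by j.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]


X = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.5]]
print(column_means(X))
```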

      Reviewer #2 (Significance (Required)):

      The work presents a new method for feature selection in scRNAseq. Feature selection is a very important step and can have a big impact on findings. The method presented here is theoretically sound, and it seems to provide interesting results when applied to early embryo development. However, as cESFW is only tested on one dataset, it is unclear how well the method generalizes to other problems and datasets.

      Appreciation of the utility of cESFW will grow as it is applied to more datasets. However, we would like to highlight that the human embryo dataset consists of 6 independent scRNA-seq datasets from different laboratories, and that cESFW was able to identify common and differing structure between them without any batch correction, smoothing or feature extraction. We have added to our summary the proposal that cESFW may be best suited to analysis of transcriptome trajectories in time-course and developmental data. However, we have also now compared Seurat, Scran and cESFW feature selection in a different context, using a reference PBMC scRNA-seq dataset. The results demonstrate that cESFW is a viable alternative for feature selection in that static system also (New figure, Figure S2).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the three reviewers for their thoughtful and constructive comments. The changes to the text and figures made in response to the questions raised have made this a clearer and stronger manuscript. The additional citations suggested by the reviewers helped to further anchor our study within the growing literature on facultative parthenogenesis. Below we have responded to each comment in blue. We have added new data to the manuscript (Fig. 4C, Fig. S10B and Fig. S10D).

      Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      1. Summary: Here Ho et al. provide strong molecular evidence for the production of facultatively parthenogenetic whiptail lizards, through a gametic duplication. As evidenced through multiple routes, including microsatellites, WGS, RADseq, and RBC ploidy, and lines of evidence from multiple specimens, this study is timely in furthering our understanding of the mechanisms underlying FP. The findings are conclusive.

      That said, I have several comments that should be addressed prior to publication. The introduction, which addresses FP in other systems, fails to cite several key studies that provide strong molecular support for terminal fusion automixis. Similarly, the study pushes the idea that this is an adaptive trait; however, without proving that the parthenogens can themselves reproduce, this is a moot point at this stage.

      That said, my comments are minor. I found this to be an excellent study, well written, comprehensive in methodology, and one that I strongly advocate for publication.

      We thank reviewer 1 for referring to our manuscript as an excellent study and strongly advocating for its publication. We concur with his/her points that evidence for automixis in other systems was not sufficiently referenced and that the adaptive trait hypothesis for FP is somewhat speculative. The text has been modified accordingly (see below).

      Major comments - None.

      Minor Comments: Should be addressed.

      Line 36 - However, data that supports terminal fusion are no longer restricted to microsat data. Studies utilizing RADseq and whole-genome sequencing in snakes and crocodiles have now provided further evidence supporting terminal fusion.

      See: Booth et al. 2023. Discovery of facultative parthenogenesis in a new world crocodile. Biology Letters. 19, 20230129.

      Card et al. 2021. Genome-wide data implicate terminal fusion automixis in king cobra facultative parthenogenesis. Scientific Reports. 11, 1-9.

      Allen et al. 2018. Molecular evidence for the first records of facultative parthenogenesis in elapid snakes. R. Soc. Open Sci. 5, 171901.

      We now state in the abstract of our manuscript that automixis in other systems is supported by both microsatellite and NGS data. The references have been included in the main text.

      Ln 42 - Evidence suggesting that isolation from males was not a pre-requisite for FP has previously been reported in snakes.

      See: Booth et al. 2011. Evidence for viable, non-clonal but fatherless Boa constrictors. Biology Letters. 7, 253-256.

      Booth et al. 2012. Facultative parthenogenesis discovered in wild vertebrates. Biology Letters. 8, 983-985.

      Booth et al. 2014. New insights on facultative parthenogenesis in pythons. Biol J Linn Soc. 112, 461-468.

      Despite the prior evidence to the contrary cited by the reviewer, it is still a commonly held belief among scientists and science journalists that isolation from males promotes or triggers FP. We have placed our findings in the context of other studies, including those mentioned above, that came to the same conclusion that isolation from mating partners is not a requirement for FP. We thank the reviewer for the additional citations, which are now included in the discussion section.

      Ln 48 - Is this really an argument? While an immediate transition to homozygosity will purge some deleterious alleles, given the genome-wide nature of this, there will also conversely have been strong selection for mildly deleterious alleles.

      Even though many FP animals have congenital defects, our data, combined with those of others, show that seemingly healthy animals arise as well. Even if these healthy animals harbor slightly deleterious alleles, the most detrimental alleles would nevertheless have been purged, which matters especially for subsequent generations. We have modified the abstract to be clearer: “Conversely, for animals that develop normally, FP exerts strong purifying selection as all lethal recessive alleles are purged in one generation.”

      Ln 56 - I would recommend the inclusion of both Allen et al. 2018. R. Soc. Open Sci, and Card et al. 2021. Sci Reports, here, as they are members of the elapids, not represented in the other examples.

      These two citations have been added.

      Ln 60 - Recent studies have highlighted the significance of sperm storage in reptiles. For example, Levine et al. 2021. Exceptional long-term sperm storage by a female vertebrate. PLoS ONE. 16(6), e0252049, describe the storage of sperm by a female rattlesnake for ~70 months, with two instances of its utilization to produce healthy offspring during that period. Clearly, molecular tools are providing both support for long-term sperm storage and an understanding of its utilization.

      Recent work has indeed provided new evidence for instances of long-term sperm storage; the two mechanisms are no longer competing hypotheses, as it is clear that both exist in nature. We have modified the text accordingly to include “Nevertheless, clear examples of long-term sperm storage have also been documented in the recent literature (29), underscoring the need for molecular methods such as MS analysis or sequencing data to elucidate the underlying mechanisms.”

      Ln 68 - American Crocodile would also be suitable to include here.

      This has now been included in the list of examples of endangered species.

      Ln 71 - The problem with this hypothesis is that parthenogens produced through FP tend to have very low viability. For example, Adams et al. 2023 (Endangered Species Research) follow a cohort of sharks produced through FP and all survive. Similarly low levels of survival are reported across other systems for which FP was reported. More likely, FP is simply a neutral trait. The mother is not negatively impacted by producing parthenogens and can go on to produce sexual offspring. Few instances report successful reproduction by a parthenogen. See pers. comm. in Card et al. 2021, and Straube et al. 2016.

      We thank the reviewer for the comment and agree that more data on the successful reproduction of parthenotes are needed to claim that FP is an adaptive trait. We have modified the text to include that studies on “the successful reproduction by FP offspring” are needed to support this hypothesis and have included the Straube et al. 2016 citation. We decided to omit the Card et al. 2021 citation, as the reports of second-generation FP in that study were through personal communication and the results themselves have not yet been published.

      Ln 79 - I doubt that there is a desperate need for this for conservation. However, I think there is a need to simply further our understanding of basic biological function, given that it is not uncommon, and is phylogenetically widespread in species lacking genomic imprinting.

      We agree that understanding FP as a basic biological function is important in light of the realization that it occurs more commonly than previously thought. We have added this aspect to the text: “A better understanding of the triggers and molecular mechanisms underlying FP and the fitness of the resulting offspring are therefore needed in a variety of contexts. These include: to understand a fundamental biological mechanism and its significance in vertebrate evolution, to aid in conservation efforts including captive breeding programs, and to possibly harness FP in an agricultural context (28).”

      Ln 85 - It would be worth citing Card et al. 2021 here, given that they used genome-wide ddRAD markers to show support for terminal fusion.

      The citation has been added.

      Ln 91 - Better citations here are Card et al. 2021, Allen et al. 2018, and Booth et al. 2023, which all utilize either RADseq or WGS.

      These citations have been added.

      Ln 95 - The conclusion of genome duplication here was supported only by a small number of microsatellite loci. As such, given that terminal fusion has been supported through genome-wide markers in other species of snakes and crocodiles, the conclusion of genome duplication is likely incorrect.

      In light of the other examples that show terminal fusion in snakes, we have removed this sentence.

      Ln 96 - I would strongly disagree with this statement. Allen et al. 2018, Card et al. 2021, and Booth et al. 2023 all provide evidence of heterozygous loci and thus support terminal fusion. While no species-specific chromosome-level reference genome is available for any of these species, the fact that levels of heterozygosity are below 33% supports terminal fusion. Rates over 33% support central fusion, but have not been reported in any vertebrate to date. As such, I would recommend the removal of this statement.

      We agree that the studies listed by the reviewer all support terminal fusion in snakes and crocodiles and therefore, we have removed the statement.

      Ln 121 - Recent work in Drosophila mercatorum and D. melanogaster suggests that three genes play a role in the activation of FP in unfertilized eggs, in this case through the fusion of meiotic products. That said, it is plausible to assume that FP in these lizards has an underlying genomic mechanism that is not related to isolation from males. See Sperling et al. 2023. Current Biology. 33, P3545-P3560.E13.

      Clearly isolation from males is not a key trigger in FP in whiptail lizards and other vertebrate species. With recent work from Sperling et al. 2023 and the fact that selection has led to increases in parthenogenesis in birds, an underlying genetic mechanism may well be at play. We have cited and addressed this in the discussion and propose identifying the genetic basis for FP in whiptail lizards in future studies.

      “Recent work identifying key cell cycle genes inducing FP in two species of Drosophila (71) and selection resulting in higher incidences of parthenogenesis in birds (24, 33) suggest a genetic basis for the initiation of FP. [...] Additional whole-genome sequencing data for species with documented FP will aid in understanding the genetic basis, propensity, and evolutionary significance of FP.”

      Ln 126 - While these data strongly support FP of the two unusual A. marmoratus-appearing offspring, can long-term sperm storage be ruled out, either through captive history or allelic exclusion of other males in the group?

      We have added the following sentence to the text: “Given that all of these offspring are female, inherited only maternal alleles, and animal 122 had no history of being housed with a conspecific male during its lifetime, both interspecific hybridization and long-term sperm storage are all but ruled out and FP is strongly supported.”

      Ln 171-191 - Given that the topic of this manuscript is the genomic mechanism underlying FP in this species, are these data necessary? They are not discussed later, and as such I would recommend that they be moved to the supplemental material. Otherwise, they simply clutter the manuscript and detract from the key question. Indeed, they are important to show that the constructed genome is of high quality, but the online supplementary material is the place for that here.

      We chose to keep this section in the main text for the following reasons: There is still a lack of published reference quality genomes for many reptile species and therefore we want to highlight that this A. marmoratus reference adds not only to the understanding of FP, but also expands the small list of reptile genomes and makes the first Aspidoscelis genome available to the community. The high quality and contiguity of the genome (as indicated by the high N50 value and BUSCO score) is important to emphasize in the main text because the absence of any heterozygous regions in FP animals supports a mechanism of post-meiotic genome duplication. We would not want to bury these key points in the supplement.

      Ln 296 - Comparable estimates were made for parthenogenetic production in wild populations of two North American pitviper species. See Booth et al. 2012. Biology Letters.

      In Booth et al. 2012, 2 out of 59 litters of the two pitvipers (3.39%) were identified to contain FP offspring and these results are very similar to our reported rate of FP in whiptail lizards. We have now included this similarity in our discussion. “Interestingly, these rates are similar to what has been reported for wild populations of two North American pitviper species (10)”.

      Ln 312 - Again, can this really be suggested? Above, the authors state that most FP animals that hatched had congenital defects, and a large number failed to hatch. This does not sound like strong support for generating individuals that counter the effects of population bottlenecks and inbreeding depression. The authors need to take this study further and monitor the long-term viability of the FP individuals that survive.

      We agree with the reviewer that the adaptive advantages of FP reproduction depend on the fitness and reproductive potential of FP offspring and that the present data are insufficient to clearly support this notion. We have modified the text to state that long-term studies are needed to support or refute this hypothesis: “However, support for this hypothesis is predicated on the fitness and reproduction of FP offspring and therefore more long-term studies on seemingly healthy individuals of FP origin are needed.”

      Ln 348 - To be able to provide support for this, you need to track animals long term to understand their reproductive competence, and that of their offspring.

      We have added the text: “To assess whether the co-occurrence of sexual and FP reproduction in vertebrates can indeed be considered a reproductive strategy rather than biological noise will require further studies to assess the reproductive competence and fecundity of offspring produced by either mode of reproduction.”

      Ln 358 - But, the caveat is that the parthenogens must themselves reproduce. This must be stated.

      The statement that parthenogens must be able to reproduce to support a hypothesis of FP as an adaptive trait has been added: “One must now consider the possibility that FP is an adaptive trait and that low rates of successful FP could contribute significantly to genome purification. Such a role for FP hinges on further studies demonstrating the ability of parthenogens to reproduce themselves either through further FP or sexually.”

      Ln 359 - Note that FP can also fix mildly deleterious alleles. Only if it is strongly deleterious will it be lost.

      We now make it clearer that selection only applies to strongly deleterious alleles.

      Ln 361 - See above comments.

      We have modified the text to include that “FP offspring will have low genetic load and only pass on neutral and mildly-deleterious alleles to the next generation.”

      Reviewer #1 (Significance (Required)):

      1. Significance:

      While parthenogenesis has been reported as far back as the early 1900s, reports have only become common over the last decade, such that facultative parthenogenesis is no longer considered a rarity but is now recognized as relatively common and phylogenetically widespread in species that lack genomic imprinting, particularly reptiles, birds, and sharks. Reasons for this include both an increased understanding that the trait can occur, and hence its recognition as an alternative to long-term sperm storage, and the ease of using molecular approaches.

      A fundamental question in recent years has been understanding the mechanisms driving FP. Recent papers utilizing whole-genome sequencing and ddRADseq have provided support for terminal fusion automixis in snakes and sharks. Here, this study provides evidence of gametic duplication in whiptails, a mechanism with a different outcome with regard to the level of retained heterozygosity. As such, this study joins the recent work of Card et al. 2021 (Scientific Reports) and Booth et al. 2023 (Biology Letters) in providing substantive advances in the field.

      The audience for this will be broad. Parthenogenesis is a fascinating topic that attracts significant media attention; see the Altmetric scores of recent papers on the topic, particularly Booth et al. 2023 (Altmetric score ~3100). As such, the study will be of interest to a broad readership and also of great significance to a specialized group working on parthenogenesis. All round, an excellent paper that has promise to advance the field.

      We thank reviewer 1 for this positive assessment and for putting our work into context.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The researchers bring together microsatellite and whole-genome sequencing data from long-term laboratory cultures of lizards to discover occasional production of parthenogenetic offspring by several species of otherwise sexually reproducing whiptail lizards ("facultative parthenogenesis," "FP") and to show that these FP-produced lizards have patterns of genomic homozygosity that are incompatible with currently held assumptions about mechanisms of FP. Instead, the FP lizards seem to have been produced by a mechanism that results in almost complete homozygosity, likely a consequence of post-meiotic duplication of genomes from haploid unfertilized oocytes. They also show that FP offspring were produced by females housed with males and alongside sexually produced offspring, counter to prevailing assumptions that FP offspring are only produced in situations where mates are not available. Many of the FP-produced offspring did not survive to hatching or had major abnormalities, consistent with a situation where this high homozygosity exposes harmful alleles. Finally, the authors used reduced-representation sequencing (RAD-seq) to survey heterozygosity in 321 wild-collected whiptail lizards from 15 species, showing evidence for strikingly low heterozygosity in at least one individual and perhaps up to five, consistent with the potential for FP in nature. These data are of broad interest in demonstrating several exciting new possibilities. Most importantly, the data hint at a different mechanism of FP than previously assumed, and one that causes immediate near-complete homozygosity. This scenario would likely lead to immediate purging of harmful recessive alleles. If the selective load of this purging wasn't insurmountably high, a lineage with a history of purging could produce FP offspring of relatively high fitness. Other exciting possibilities suggested by the data include the existence of FP even in a setting where mating occurs and in natural populations, versus just captivity.

      Major Comments:

      I found it difficult to impossible to sort out exactly what the researchers did and with what lizards. For example, in line 107, they refer to a "systematic MS analysis" for all individuals of gonochoristic species in their laboratory, but where are these data? Indeed, at this early spot in the paper, the introduction from here on out suddenly reads like a discussion. What would be better here would be to summarize what was known and wasn't known about the system and questions involved, why gaps in knowledge were important, and what the researchers actually did for this paper. In my opinion, the paper would be a much easier read if the researchers left the results and interpretation for later in the paper.

      As a consequence of the reviewers’ comments, the text of the manuscript has undergone major revision, and we trust that reviewer 2 will find this new version far more accessible. The MS data collection of more than 1000 individuals is the subject of another ongoing study and was only mentioned peripherally here to put the identification of FP into context. As most of the MS data relates to gonochoristic reproduction and interspecific hybridization, we are only presenting the data that are directly relevant to this manuscript as part of this study. To our knowledge, there is no common repository to upload raw MS data, but we have provided the data for the FP animals and controls discussed in this paper in the Github repository (see section “Data availability”).

      Even with this suggested fix, however, the data are still too inaccessible and analyses too opaque. For example, in line 202, a critical definition is laid out regarding heterozygous sites as those having "equal support" for two alleles. What do the researchers mean by "equal support"? My presumption is that this is something about equal or close to equal numbers of reads, but this definition needs to be spelled out and justified because it underpins much of the downstream analyses. A similar problem occurs in line 208-209, where the authors make a statement about limiting further analysis to positions in the genome where the coverage is "equal" to the mean sequencing depth.

      We have changed the text to “we defined heterozygous sites as those having two alleles supported by an equal number of reads. This stringent requirement was chosen to limit the search to apparent heterozygous sites with strong support, decreasing the chance of false positives.” We further restrict the analysis to sites where the coverage is equal to the average sequencing depth, to exclude regions where over-assembly and collapse of repetitive elements would artificially increase the coverage.
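      To make the two filters described in this response concrete, a minimal Python sketch follows. It assumes per-site allele read counts are already available (for example, tallied from aligned reads) and interprets "equal to the average sequencing depth" as the per-site depth matching the genome-wide mean rounded to the nearest integer. The Site class, all function names, and the toy numbers are hypothetical; this is an illustration of the stated criteria, not the authors' analysis code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Site:
    """One genomic position with read counts per observed allele (hypothetical structure)."""
    chrom: str
    pos: int
    allele_counts: Dict[str, int]

    @property
    def depth(self) -> int:
        # Total number of reads covering this position.
        return sum(self.allele_counts.values())


def is_strict_het(site: Site) -> bool:
    """True only if exactly two alleles are seen and both have the same read count."""
    counts = [c for c in site.allele_counts.values() if c > 0]
    return len(counts) == 2 and counts[0] == counts[1]


def depth_matched(sites: List[Site], mean_depth: float) -> List[Site]:
    """Keep sites whose depth equals the (rounded) genome-wide mean depth.

    Sites with excess coverage, e.g. collapsed repeats or over-assembled
    paralogs, are dropped because they can produce spurious heterozygous calls.
    """
    target = round(mean_depth)
    return [s for s in sites if s.depth == target]


def strict_het_fraction(sites: List[Site], mean_depth: float) -> float:
    """Fraction of depth-matched sites that pass the strict heterozygosity test."""
    eligible = depth_matched(sites, mean_depth)
    if not eligible:
        return 0.0
    return sum(is_strict_het(s) for s in eligible) / len(eligible)


# Toy example with a mean depth of 20x (invented numbers, not data from the study).
example_sites = [
    Site("chr1", 100, {"A": 10, "G": 10}),  # depth 20, balanced: counted as heterozygous
    Site("chr1", 200, {"A": 20}),           # depth 20, homozygous
    Site("chr1", 300, {"A": 12, "T": 8}),   # depth 20, unbalanced: not counted
    Site("chr1", 400, {"C": 30, "T": 30}),  # depth 60: excluded as a likely collapsed repeat
]

if __name__ == "__main__":
    het = [s for s in depth_matched(example_sites, 20.0) if is_strict_het(s)]
    print([s.pos for s in het])                                # [100]
    print(round(strict_het_fraction(example_sites, 20.0), 3))  # 0.333
```

      An exact-equality test on depth is very stringent; a tolerance window around the mean would be a natural relaxation, but that would be a different criterion from the one stated in the text.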

      Another data/analysis issue emerges with the components of the manuscript that deal with mixoploidy. As far as I can tell, these data come from one sexually produced lizard, one FP A. marmoratus, and one FP A. arizonae. While the reports of bimodality of nuclear size are certainly interesting, the data and discussion are no more than an anecdotal case study in the absence of careful replication across multiple FP lizards and comparison to sexually produced lizards. Without these data, the conclusion that “Animals produced by facultative parthenogenesis are characterized by mixoploidy” (Figure 4 caption; also see lines 324-331) is far too strong.

      We have added animal IDs to the legends of Figures 4 and S10 to clarify that these erythrocyte stainings come from two FP A. marmoratus and one FP A. arizonae. In addition, imaging from two sexually produced control animals (one A. marmoratus and one A. arizonae) has now been included in Figure S10 (as S10B and S10D). We have also included an extra panel of flow cytometry data (new Figure 4C) as a complementary method for ploidy determination. Both imaging and flow cytometry indicate similar proportions of haploid cells. With the additional data and clarification, we hope that the reviewer agrees that the observations of mixoploidy are well beyond “anecdotal”. Nevertheless, we have changed the title of Figure 4 to “Detection of mixoploidy associated with facultative parthenogenesis.” We hope that our observations will inspire future studies to determine whether mixoploidy is a widespread phenomenon in FP outside of whiptails, as indicated by earlier work in birds.

      I had a similar reaction to the discussion of developmental abnormalities and embryonic lethality of embryos of FP origin presented in lines 263-281 (also lines 307-309). What is the baseline level of such abnormalities and the frequency of lethality in sexually produced eggs/embryos/hatchlings, and especially those produced via inbreeding? These comparisons are needed to interpret the significance of the patterns observed in the FP eggs/embryos/hatchlings. Analogously, the comparison of the ovaries and germinal vesicles from one FP individual relative to one sexual individual does not tell us anything nearly so definitive as the text in lines 279-281 (also see Fig. S12 title, which is too broad of a conclusion for N = 1). This overly ambitious conclusion also underpins the discussion regarding the potentially adaptive nature of FP with respect to genome purification (lines 341-363; also see lines 47-50). If FP does not actually increase the rate of purging in FP lizards relative to inbred sexual counterparts (it sounds like inbreeding is common from line 339), it seems less likely that we can view FP as adaptive, at least from this perspective.

      We have now included a comparison between defects seen in sexually produced animals vs FP animals: “six out of 16 FP animals (37.5%) hatched with no discernable developmental defects (Fig. S11A-B). This is in stark contrast to sexually produced animals, where over 98% of hatchlings showed no abnormalities. Additionally, most of the defects noted in sexually produced animals were less severe than in FP animals including bulges in tails or truncated digits.”

      We agree that our statement on the lack of differences between sexually produced and FP animals was too general. We have modified the title of Fig. S12 from “No differences between ovaries and germinal vesicles of Aspidoscelis marmoratus produced by facultative parthenogenesis or fertilization” to “Ovaries of Aspidoscelis marmoratus FP animal 8450 and germinal vesicles of FP sister 8449 revealed no differences in structure and anatomy compared to fertile sexually reproducing animals.” Because FP produces complete homozygosity in a single generation, it would indeed result in a higher rate of purging than inbreeding. While one hypothesis is that FP is adaptive (in large enough populations), our intention was to highlight the alternative that FP could be detrimental in smaller populations (which would likely already experience high inbreeding rates). We would not expect inbreeding to be common in whiptails relative to other lizards, given that they tend to have large population sizes and actively range across generalist habitats.

      A final data concern is with the use of liver tissue for whole-genome sequencing and reference genome assembly (lines 389-390) and then using these data and the reference genome to make conclusions about ploidy/coverage. Liver tissue is very commonly endopolyploid, meaning that coverage could be artificially high for animals for which liver (vs. tail) tissue was used for DNA extraction. In particular, it would be helpful if the researchers consider whether endopolyploidy could have affected their ability to make accurate estimation of coverage and thus, heterozygosity, when libraries generated from diploid (tail) tissues are aligned to a reference genome generated from a polyploid tissue as was done here.

      This is an interesting point, and hepatic cells in various organisms have indeed been documented to be polyploid. The proportion of polyploid cells varies, however, and as far as we are aware, all published studies on polyploid hepatocytes are in mammals (DOI: 10.1016/j.tcb.2013.06.002). Reference genomes have been generated from a variety of tissue sources, and liver is commonly used. As most assemblies are of haploid genomes, polyploidy (unlike aneuploidy) does not affect assembly quality. The reference genome was also from an animal of FP origin and therefore has genome-wide homozygosity, which aids a more contiguous assembly by eliminating the phasing problem. Of the 10 animals sequenced, genomic DNA was derived from liver for three animals and from tail tissue for the rest. The sequencing data generated from either liver or tail resulted in similar coverage levels (Figure S6) and similar levels of heterozygosity (Figure 2A).

      Minor Comments:

      Line 410: Please explain why the BLAST cutoff was changed from the default.

      The BLAST cutoff was changed from the default 1e-03 to 1e-06 to be more stringent and thereby increase confidence in the BUSCO results.

      Lines 441-443: Please explain why this dataset was seemingly larger than expected.

      Animal 122 was sequenced on one flow cell without any multiplexing with other samples and therefore yielded more reads than other animals sequenced. We subsampled the reads from this animal for analysis, so it is directly comparable with the other WGS data.

      Line 510: The link to the Github repository was broken, so I was unable to access the code and data denoted as available here.

      We apologize for the unavailability of the link at the time of review. Review Commons did not request a reviewer token. The repository will be made public upon journal acceptance. We would be happy to provide a reviewer token in the meantime upon request by Review Commons.

      Figure 1, and other figures featuring comparisons of MS data across parents and offspring: The authors need to engage here with the alleles that do not match either parent (e.g., allele 282 at MS7), explaining the likelihood that these alleles indeed represent a binning error (or, perhaps, stepwise mutation from a parental allele), and these alleles should be flagged. Instead, they bin these unique alleles with the most similar parental allele without any explanation or flagging. The authors do bring this point up in Figure S1, but this issue needs to be addressed in the main text (related point: the mix of red/green in the MS16 offspring appears more green than red. Is this meant to denote a probability different than 50:50? If not, the authors should adjust the shading so that this shape is half green, half red).

      We have added to the figure legend that single nucleotide differences are most likely binning errors and are therefore not considered “de novo” alleles. Instead, they are assigned to the most similar parental allele, consistent with Figure S1. The shading at MS16 has been removed so that it is consistent with Figure 3.

      Figure 3: Indicate that white background for alleles means that allelic inheritance is not determinable, or use the mix of colors applied in Fig. 1 to indicate as such. Unique offspring alleles should be flagged rather than just automatically assigned to the most similar parental allele. Finally, it would be helpful if the alleles were presented within loci from the shorter to the longer alleles.

      We have included in the figure legend that non-shaded alleles are those for which multiple potential parents share the same allele and the inheritance therefore remains ambiguous for this locus. Single nucleotide differences are also now addressed, and sizes are ordered from smallest to largest.

      Figure S7. Indicate visually which panels indicate FP animals.

      We have now indicated which animals are FP and included this in Figure S6 as well.

      Fig. S13. The 5 animals that had especially low heterozygosity should be flagged. The title of this figure should be toned down in light of the tentative nature of the conclusions regarding FP in nature: low heterozygosity could instead reflect, for example, a long history of inbreeding. My reaction to the data is also that the % heterozygosity distribution for many of the species looks continuous rather than the bimodality one might expect under FP vs. sexual reproduction.

      Since FP has not been confirmed in these animals, unlike the examples from our captive colony, there could indeed be other reasons for low heterozygosity. We have changed the title of the figure from “Facultative parthenogenesis in whiptail lizards collected in nature” to the more neutral “Heterozygosity estimates of whiptail lizards collected in nature.” Because there are relatively few animals, one would not necessarily expect a bimodal distribution to be apparent in the current data. We did, however, show that the animal with the lowest calculated level of heterozygosity (deppii LDOR30) was a statistical outlier compared with other individuals of the same species. Since these animals were sampled across different locations and habitats, their effective population sizes would be expected to differ as well, which is reflected in the range of heterozygosity estimates seen here. This has been made clear in the text.

      Reviewer #2 (Significance (Required)):

      General assessment: strengths and limitations. The paper's strengths include the combination of data from lab and natural populations, the characterization of an unexpected means of achieving FP, with dramatic genetic consequences, and the data suggesting that this type of FP is fairly common and occurs even in the context of mating.

      Audience: The biological questions of relevance to these discoveries are of broad interest, and the paper is likely to garner some attention from the life sciences community as whole and the popular press.

      Advance: These data fill an important knowledge gap regarding the mechanisms potentially driving FP in vertebrates, how often FP is likely to occur, and its genetic consequences. The discoveries are potentially conceptual/fundamental, though the extent to which they are ground breaking is not clear in the absence of functional characterization of how FP occurs as well as the need for more rigorous comparisons and replication that I outlined above.

      We thank reviewer 2 for summarizing the strengths of this manuscript, pointing out the broad interest and stating that this work fills an important knowledge gap.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The occurrence of facultative parthenogenesis has been described in a number of vertebrate lineages, but the underlying cytological mechanism(s) have remained largely speculative due to sparsity of data. Here, Ho & Tormey et al. provide a detailed analysis of facultative parthenogenesis in gonochoristic species of the lizard genus Aspidoscelis. They show that parthenogenesis leads to a complete loss of heterozygosity (LOH) within a single generation. They attribute the LOH to diploidization through duplication of the oocyte's haploid genome after completion of meiosis. This mechanism is consistent with their finding of mixoploidy in erythrocytes of asexually produced offspring. Based on LOH, the authors additionally show that facultative parthenogenesis in Aspidoscelis is not condition dependent (no developmental switch): it can occur in the presence of males, alongside sexual reproduction in the same clutch, and both in captivity and in the wild. Finally, the authors show that facultative parthenogenesis is associated with developmental aberrations, likely caused by expression of homozygous recessive deleterious mutations.

      Major comments: In my opinion, this study presents a very comprehensive, careful documentation of mechanistic aspects and consequences of facultative parthenogenesis in a vertebrate. The genomic and microsatellite results leave little to no doubt that facultative parthenogenesis has led to complete LOH in Aspidoscelis. I am particularly impressed by the meticulous analysis of genomic coverage to exclude e.g. false positive heterozygosity due to merged paralogs in the assembly. I also follow the authors' conclusion that a post-meiotic "gamete duplication"-like mechanism is likely causative for the LOH (and the mixoploidy of erythrocytes; but I am no expert on that). I was wondering if terminal fusion automixis together with a complete absence of recombination would be worth mentioning as a (probably very unlikely) alternative in the discussion. It would be exciting to corroborate the conclusion of diploidization by genome duplication in the future, e.g. via early embryonic DNA stainings to show the duplication "in action" (if that is practically possible)...? As for this manuscript, I suggest emphasizing the indirect nature of the evidence for the mechanism of parthenogenesis a little bit more.

      We thank the reviewer for highlighting the effort that went into the genomic analysis that led to our conclusions. We argue that terminal fusion without recombination is not an obvious alternative explanation, because a large body of work has established that in many organisms at least one crossover per homologous chromosome pair is required to progress through meiosis I (e.g. see https://doi.org/10.3389/fcell.2021.681123); the absence of recombination would therefore likely not produce the polar bodies necessary for automixis.

      We have added to the text: “In whiptail lizards, we have not been able to examine post-meiotic oocytes as locating the post-meiotic nucleus within a large yolked egg is inherently difficult. The difficulty is compounded by the unpredictability of which eggs will undergo FP development and the need to sacrifice animals to remove eggs.”

      While the genome duplication mechanism we propose is indeed indirect because we are unable to visualize developing FP embryos, the most parsimonious explanation from the whole-genome sequencing analysis is genome duplication because of the lack of heterozygous regions associated with automixis. In the text, we have made sure to state genome-wide homozygosity as the basis for our conclusion.

      I agree that facultative parthenogenesis in the presence of males hints at a baseline rate of parthenogenesis without requiring a developmental switch. However, this makes it difficult to rule out that sperm played a role in activation of embryonal development (gynogenesis; however I am only aware of gynogenesis in fishes and amphibians)... maybe, the authors want to take this up in the discussion. Were the five parthenogenetic individuals for whole genome sequencing actually produced in the presence of males, too?

      FP has been reported to occur in isolated females of other reptile and bird species, suggesting that sperm activation is at least not a general requirement for FP in amniotes (Watts et al. 2006; Olsen and Marsden 1954). In all cases in this study, the mothers were housed with conspecific or heterospecific males. While we cannot completely rule out a non-genetic contribution of sperm in these cases, it seems an unlikely explanation in light of the sperm-independent reproduction by obligate parthenogenesis in other species of whiptail lizards (unlike the sperm dependence of all unisexual reproduction in amphibians and fish). We decided not to include speculation on sperm dependence in this manuscript, as we have no evidence in favor of it, nor is there any evidence for it in the literature on other amniotes. In fact, most examples of FP were reported from isolated females, most likely because offspring were not expected in those cases, which prompted further analysis of their origin.

      I agree with the interpretation of the LOH in the RADseq data as a likely case of facultative parthenogenesis in the wild. However, when looking at figure S13 I noticed some bimodal looking distributions (e.g. in A. guttatus). It may be interesting for future studies to look into what factors influence heterozygosity in natural populations of Aspidoscelis (e.g. inbreeding vs parthenogenesis). Could there be different mechanisms of facultative parthenogenesis in different Aspidoscelis species explaining different LOH intensities?

      The continuous nature of the data may reflect natural variation between individuals and collection at various locations with possibly different effective population sizes and levels of hybridization. Low levels of heterozygosity could be indicative of inbreeding or FP in some cases. This is important to note in future studies and we have added this to the manuscript (“Further fieldwork and analysis will be required to assess the level of FP in natural populations of gonochoristic Aspidoscelis species (and other factors that could influence the observed heterozygosity such as population size, levels of hybridization, and inbreeding) …”). While there are different mechanisms of FP in other vertebrate groups, the most parsimonious hypothesis is that within a genus, the mechanism would be the same.

      The manuscript is well written, the introduction nicely explains the significance of the study, the methods are fully appropriate and the results (and supplementary results) are displayed comprehensibly and in great detail. The discussion might benefit from going a bit more generally into the occurrence and mechanism of obligate asexuality in Aspidoscelis. One might e.g. speculate on whether the ability for facultative parthenogenesis in gonochoristic species has facilitated the transitions to obligate parthenogenesis in the hybrid lineages and what peculiarities might predispose Aspidoscelis to parthenogenesis (e.g. are centrioles contributed by sperm required?). In addition, I think the occurrence of LOH due to gamete duplication (facultative and obligate) in invertebrates (e.g. due to Wolbachia) is worth mentioning in the discussion: e.g. there is a similar case in facultatively asexual Bacillus rossius stick insects, where the early dividing cells are haploid. Some of them diploidize via duplication later and form the embryo.

      Thank you for complimenting each section of the manuscript and describing it as well written. Our lab has a long-standing interest in obligate parthenogenesis. While it is interesting that both obligate and facultative parthenogenesis occur alongside each other in this genus, the mechanisms appear to be fundamentally different, and we would like to focus the discussion on FP in a variety of systems and its potential implications for conservation and evolution. Parthenogenesis in general is a fascinating topic for a broad audience, and by not discussing another form of parthenogenesis (obligate, in this case), we keep the focus on FP and the manuscript more accessible to non-specialists. We have included the stick insect in the discussion as another example of diploid restoration through genome duplication.

      Minor comments:

      39-41: I am a bit puzzled by the usage of the term "post-meiotic" to contrast the diploidization through duplication with automixis. Wouldn't one consider polar body fusion after completion of meiosis II also post-meiotic? Maybe I am just not aware of how the term is usually used in this context here...

      We use the term “post-meiotic” because the restoration of an entirely homozygous diploid cell can only occur after the completion of both meiotic divisions. It is our understanding that polar body fusion and meiotic restitution after meiosis I or meiosis II are generally considered meiotic mechanisms in the specialized literature, even though polar body fusion would also occur after the meiotic divisions.

      65: isn't that gynogenesis (sperm-dependent parthenogenesis) in the Amazon molly?

      While sperm is required for parthenogenesis in the Amazon molly, it is an all-female species that reproduces exclusively through gynogenesis. It is therefore considered an example of obligate parthenogenesis rather than FP.

      78: the term "economically viable" may be a bit puzzling for a biologist's audience. "Economically sustainable" could be an alternative.

      This has been changed.

      129: the Arizona male was referred to as ID 4272 above. Here it is ID 4238?

      This has been corrected. The correct ID is 4272.

      218: please define over-assembly (see line 207)

      We define “over-assembly” as the collapse of paralogous loci into a single representative sequence. This definition is now given in the text.

      263-281: please, indicate a hatching rate/ rate of malformations of sexually produced offspring for comparison.

      A comparison has been added: “This is in stark contrast to sexually produced animals, where over 98% of hatchlings had no abnormalities noted.”

      333: in the haploid cells recessive deleterious mutations would be exposed in the hemizygous state but in the diploid cells in the homozygous state.

      The text has been modified to reflect the difference between haploid and diploid cells.

      470: please, provide more detail for the RADseq analyses (variant calling, calculation of heterozygosity etc.)

      We have elaborated on the analysis in the methods.

      Figure 1B: please, mention in the legend that the shown mechanisms are not exhaustive, e.g. first polar body fusion could occur right after meiosis 1 or polar body formation could be skipped completely.

      This has been added.

      Figure 1C: it may be interesting for non-specialists to name the distinctive morphological characters setting apart the three species in the figure legend and highlight them e.g. with arrows in the figure.

      We have now included in the figure legend characteristic color patterns for each species: “(C) Photographs of Aspidoscelis arizonae with characteristic blue ventral coloration (top), A. gularis with light spots in dark fields that separate light stripes on dorsum (middle), and A. marmoratus with light and dark reticulated pattern on dorsum (bottom).” Since the descriptions are specific and apparent, we did not add arrows to the pictures.

      Reviewer #3 (Significance (Required)):

      Significance: The study by Ho & Tormey et al. substantially enhances the understanding of (facultative) asexuality in vertebrates. In particular, while most reports of facultative parthenogenesis in vertebrates have been attributed to a form of automixis, the authors conclusively show an instance of diploidization through genome duplication, a mechanism functionally similar to "gamete duplication". The study is novel, very comprehensive and of interest for a general audience within the field of evolutionary biology.

      We thank reviewer 3 for pointing out that our study substantially enhances the understanding of asexuality in vertebrates, is very comprehensive and is of interest for a general audience within the field of evolutionary biology.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We appreciate the thoughtful comments of the reviewers. We have revised the manuscript according to these comments as detailed below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Efficient proteostasis in cells demands efficient clearing of damaged or misfolded proteins, and an important pathway involved in such clearance is the ubiquitin-proteasome pathway. In this system, proteins are tagged with ubiquitin to target them for degradation by the 26S proteasome complex. The conventional 26S proteasome complex consists of a core particle (CP or 20S proteasome) and one or two regulatory particles (RP, or 19S proteasome) to form the singly or doubly-capped proteasome, respectively. Proteasome assembly is a well-orchestrated process that requires proper stoichiometry of proteasome subunits and dedicated proteasome assembly chaperones. This is maintained by fine-tuning their transcriptional and translational regulation.

      This manuscript elucidates an important aspect of how the different proteasome components are transcriptionally regulated upon denervation in mouse muscles for timely and efficient assembly of the 26S proteasome. The authors present data that point towards a model in which a two-phase transcriptional program (early: day 3-7 and late: day 10-14) activates genes encoding proteasome subunits and assembly chaperones to increase proteasome content. This involves the coordinated functions of two transcription factors, PAX4 and alpha-PAL(Nrf1), which were important for both the early and late phases of the transcriptional program. Their roles were not redundant, as loss of one transcription factor was sufficient to prevent induction of various proteasome genes in muscle after denervation.

      In summary, the authors report a novel bi-phasic mechanism elevating proteasome production in vivo, which involves the coordinated functions of two transcription factors, PAX4 and alpha-PAL(Nrf1).

      Major points: 1) It is not clear why PAX4 and alpha-PAL(Nrf1) are both fully required for the transcriptional induction of some proteasome genes upon denervation (with good overlap), while only PAX4 is important for increased proteasome assembly. The authors speculate that this could be due to a stoichiometry problem but an alternative scenario where translation is increased upon alpha-PAL(Nrf1) inhibition would also be possible. This would explain why, for example, the induction of PSMC1 gene expression upon denervation is abolished upon alpha-PAL(Nrf1) inhibition (Fig. 5C) while the protein level is still increased (Fig. 6H). Is that also true for PSMD5 and Rpn9? Could it also be that the loss of function of alpha-PAL(Nrf1) is too detrimental for the muscle so that they induce an alternative stress response pathway increasing proteasome subunit translation?

      We thank the reviewer for this comment. To better clarify this important point, we conducted further experiments to examine the differential effects of PAX4 vs. α-PALNRF1 on proteasome assembly chaperones (Fig. S4B). Our new data show that PAX4 promotes the induction of the assembly chaperone PSMD5 (S5b) at 3 days after denervation (Fig. S4B). This induction is critical for the increase in PSMD5 protein levels, because PAX4 knockout results in decreased PSMD5 protein levels at both 3 and 10 days after denervation (Fig. 4K). α-PALNRF1, however, does not affect the mRNA levels of this chaperone (Fig. S4A). This new result strengthens our conclusion that induced expression of assembly chaperones by PAX4 is key to raising proteasome levels after denervation.

      We cannot rule out an indirect effect of α-PALNRF1 knock-down on protein synthesis, and therefore this potential alternative mechanism is now discussed in the text. It appears unlikely, however, that α-PALNRF1 knock-down is too detrimental to muscle, as we found no phenotypic evidence of stress or abnormalities.

      2) Pax4 controls Rpt1-2 transcription and these two Rpt proteins form a pair. As Rpt4 is also regulated by Pax4, is Rpt5 also controlled by Pax4?

      We believe the reviewer meant to request the data for Rpt4, because the data for Rpt5 was already included in original Fig. 4G-H. Therefore, we repeated the RT-PCR analysis of PAX4 KO mouse muscles for Rpt4 and now show that its induction requires PAX4 at 10 d after denervation, just when proteasome content is increased (Fig. 4G). At 3 d after denervation, Rpt4 induction is probably regulated by other transcription factors, because its mRNA levels at this early phase were similar in muscles from WT and PAX4 KO mice (Fig. 4H). These data strengthen our conclusion that the coordinated functions of multiple transcription factors control proteasome gene expression in vivo. In future studies, we will investigate the specific mode of cooperation and the mechanisms by which various transcription factors and co-factors collaborate to enhance the expression of proteasome genes in the early and delayed stages of gene expression within a living organism.

      What about the assembly chaperone for these two pairs: PSMD5 and p27? It would be very interesting to know if there is a transcriptional coregulation based on proteasome assembly intermediates.

      The referee raises an important point, which we also discuss in the text. We now present data showing that PAX4 promotes the induction of the assembly chaperone PSMD5 at 3 d after denervation (Fig. S4B), correlating nicely with the observed changes in protein levels of this chaperone (Fig. 4K). The expression of PSMD9 (p27), however, requires neither PAX4 nor α-PALNRF1 (Fig. S4). Consequently, we conclude that PAX4 promotes proteasome biogenesis by promoting PSMD5 induction, and that in the absence of α-PALNRF1, proteasome subunits can still efficiently assemble into proteasomes (even though their expression is reduced), owing to the induced expression and increased action of the assembly chaperone PSMD5. Our data highlight the intricacy of controlling proteasome levels through transcriptional regulation of proteasome genes and assembly chaperones during muscle atrophy. We now further document and discuss the regulation of proteasome biogenesis by these two transcription factors in the text and Discussion (p.28).

      3) Fig. 4J: PSMD5 and PSMD13 are not tested in Fig. 4A, G and H. This needs to be done if the authors want to draw the parallel mRNA-protein levels, as in their conclusion. Moreover, the protein levels seem to be much more induced than the mRNA levels, could that be due to increased translation? This could be discussed.

      We accepted this thoughtful suggestion and now present the mRNA levels for PSMD5 and PSMD13 in Figs. 4A, G and H and Fig. S4. The new data do not change our conclusion that protein abundance largely correlates with transcript levels (Figs. 2 and 4K).

      The reviewer raises an important question that we hope to resolve in the future. As we point out in the revised Discussion section, “the substantial rise in protein levels compared to mRNA levels after denervation suggests potential increased protein translation due to PAX4 loss. Whether PAX4 regulates protein synthesis and thus can affect protein levels beyond gene expression are intriguing questions for future research”.

      4) The conclusion is not correct in this sentence: "Moreover, analysis of innervated and 10 d denervated muscle homogenates from WT, alpha-PAL(Nrf1) KD or PAX4/alpha-PAL(Nrf1) KD mice by native gels and immunoblotting or LLVY-cleavage indicated that loss of both transcription factors is necessary to effectively block accumulation of active assembled proteasomes on denervation (Fig. 6H)". This is not correct, as the loss of PAX4 is sufficient to block accumulation of active assembled proteasomes on denervation (Fig. 4K). So, it could just be that alpha-PAL(Nrf1) KD has no effect on the induction of proteasome assembly after denervation and that all the effect of the double mutant is due to PAX4 loss. This needs to be corrected.

      We thank the reviewer for this thoughtful comment. The text has been revised accordingly.

      Minor points:

      1) I would rephrase the sentence "baseline at 14 d after denervation and showed a sustained low mRNA levels until 28 d (Fig. 2A-F).", as the mRNA levels are still significantly higher than the basal levels for most proteasome genes. Same for the sentence: "RNA sequencing (RNA-Seq) analysis of TA muscles at 14 d after denervation indicated that expression of most proteasome genes is low at 14 d (Fig. S1)". Low compared to what? Not being induced does not mean that expression is low. This needs to be rephrased.

      We revised the text accordingly and thank the reviewer for these suggestions.

      2) Microscopy images need more explanation: define the green and red channel and what they are used for in the legend.

      The legends have been updated as requested.

      3) Columns have shifted in Table 2.

      The tables have now been submitted as separate files.

      4) Fig. S3: RT-PCR on NRF-1(NFE2L1) needs to be performed to see the extent of inhibition by shRNA.

      We thank the reviewer for this important comment. The data, which were added as new Fig. S3A, show efficient knockdown of NRF-1NFE2L1 with shNFE2L1.

      5) In the sentence: "PAX4 maintaining subunit stoichiometry for increased proteasome assembly.", could it be due to the much higher levels of PSMB8, 9 and 10 immunoproteasome subunits upon alpha-PAL(Nrf1) KD (Fig. 6F)?

      We addressed this aspect in Major Point #1, regarding the difference between PAX4 and α-PALNRF1; please see our response. As for the Reviewer’s comment concerning Fig. 6F, we think that the increased expression of PSMB 8, 9, and 10 in α-PALNRF1-KD compared to the double KD or PAX4 KO further suggests a distinct cooperative interaction between these transcription factors in promoting proteasome expression, assembly, and function, which we plan to investigate thoroughly in separate future studies. However, the increased expression of PSMB 8, 9, and 10 can affect the composition of the CP (by replacing their normal counterparts), but not RP assembly. CP and RP are known to assemble separately, each with its own dedicated chaperones; RP and CP then associate to complete the assembly of the proteasome holoenzyme (RP-CP complex). Thus, it is unlikely that increased CP assembly alone would increase overall RP-CP assembly.

      **Referees cross-commenting**

      All other comments are relevant.

      Reviewer #1 (Significance (Required)):

      Overall, the work is impactful and timely, reporting the participation of a novel transcription factor, alpha-PAL(Nrf1), along with PAX4, in regulating the transcription of proteasome genes and the subsequent assembly of conventional proteasomes in mouse muscle upon denervation. One limitation is that alpha-PAL(Nrf1) knockdown inhibits only proteasome gene expression but not proteasome assembly, for reasons that remain unknown. Most of the conclusions drawn in the manuscript are supported by the experimental data. Better understanding of how proteasome homeostasis is regulated under stressful conditions is an important fundamental aspect of proteasome biology. I would support publication of this manuscript provided the more specific concerns listed are addressed.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The main limitation of this study is that it is based on a single model of muscle atrophy: that induced by cutting the sciatic nerve. A second model, such as fasting-induced atrophy or cancer cachexia, would nicely complement the findings, to see whether the two phases of proteasome modulation are recapitulated.

      The referee raises an interesting point, but as we explained throughout the manuscript, we did not use denervation in this study as a model for atrophy but rather as an in vivo model system to investigate mechanisms of protein degradation and proteasome homeostasis in a whole organism in vivo. The reason we selected denervation as an in vivo model for accelerated proteolysis is due to the gradual nature of muscle loss, which allows us to dissect the various phases of proteasome homeostasis effectively. Fasting, as an alternative model, is too rapid for addressing the specific questions that we asked in this study. In addition, in the rapid atrophy induced by fasting the primary physiological mechanism to increase protein degradation in vivo is believed to be through post-synthetic modification of proteasomes, rather than the production of new proteasomes (VerPlank et al., 2019). In future separate studies, we will thoroughly investigate whether the mechanisms discovered here are applicable to other types of atrophy (e.g. diabetes, aging, cancer). The obtained results will be published and fully discussed separately, in part because covering all types of atrophy within a single paper is impractical and goes beyond the scope of the current manuscript.

      Another major concern is that the authors do not measure, over the time course of denervation atrophy, the mRNA and protein expression of the two transcription factors that they found crucial for proteasome induction and assembly.

      We agree with the reviewer that a time course would strengthen our conclusion that the two transcription factors are important for proteasome gene induction and assembly. We have added data showing that PAX4 (Fig. 4I) and α-PALNRF-1 (Fig. 6E) both accumulate in the nucleus at 7 d after denervation, just when proteasome content is maximal (Fig. 3A) and protein breakdown is accelerated (Cohen 2009; Volodin 2017; Aweida 2021). The mRNA levels of PAX4 were presented in original Fig. 4F and indicate that PAX4 is induced as early as 3 d after denervation. We have added new RT-PCR data for α-PALNRF-1 showing that α-PALNRF-1 is induced at 7 d and 10 d after denervation (Fig. 6D).

      Major and minor concerns are as follows:

      Typos now and then are present all over the text, as holoemzyme shall be replaced with holoenzyme on page 9, on page 12 proteasome is misspelled on mid page, as well as cellls. By cotrast shall be corrected on page 19. References on page 22 shall be formatted.

      We have corrected the typographical errors.

      • reference 29 on page 7 seems out of context together with the sentences it is coupled with.

      The reference is appropriately located within the text in terms of context, and precisely aligns with the sentence to which it is associated. Reference 29 (Boos 2019) describes a cellular state in which all proteasome genes rise simultaneously.

      • muscle electroporation of plasmid shall be replaced by AAV9 injection that causes less inflammation and more expressing fibers

      We do not understand and see no basis for the referee’s assertion that the “muscle electroporation of plasmid shall be replaced by AAV9 injection”. On the contrary, the electroporation methodology is widely used by many labs because of its many advantages. This in vivo gene transfection approach is extremely useful to study transient gene (or shRNA) effects in adult muscles, while avoiding the developmental effects of genes (or shRNA) that are often seen in transgenic or knockout animals (e.g., the inducible knockout of α-PALNRF-1 caused lethality, see Fig. 6B-C).

      In addition, the electroporation technique offers great advantages in speed and cost. We have been using it routinely in our lab for in vivo studies, and articles using it from many laboratories worldwide have appeared in all major journals, e.g. see our papers in Nature Communications, J Cell Biol, PNAS, EMBO rep, and papers from the late Alfred Goldberg (Harvard), Marco Sandri (Padova, Italy), Jeff Brault (Indiana Univ.) and others. In all studies included in this manuscript that involve electroporation, contrary to the reviewer’s impression, there was no damage or inflammation to the muscles, and we routinely examined histological sections. Finally, for our studies, we always use muscles that are at least ~70% transfected, which has proven adequate for observing gene effects in mouse muscle. In each experiment, transfected muscles are always compared and analyzed in parallel with control muscles (transfected with scrambled shLacz control). In fact, the validity of the in vivo electroporation technique is further confirmed herein by our investigations of transgenic inducible knock-down mice, which show similar effects on proteasome gene expression.

      • the shGankyrin data shall be complemented with overexpression of the same chaperone to see the effects of proteasome expression and assembly.

      We understand the reviewer’s concern but do not believe that such an experiment is necessary, since there is already extensive evidence in the literature showing that the chaperone Gankyrin is essential for proteasome assembly (Kaneko et al. Cell 137, 914–925, May 29, 2009 (DOI 10.1016/j.cell.2009.05.008)). Accordingly, various Gankyrin mutants have often been used as inactive controls for proteasome assembly in vitro and in vivo (Kaneko et al. Cell 137, 914–925, May 29, 2009 (DOI 10.1016/j.cell.2009.05.008)). Moreover, Gankyrin is known to ensure not only the proper subunit composition but also the proper conformation of the proteasome holoenzyme (Lu et al., Mol Cell. 2017 Jul 20;67(2):322-333.e6).

      • another important transcription factor driving MuRF1 expression is Twist and it is totally ignored in the discussion, please add it.

      We regret this oversight. We did not mean to slight any authors, although our major new discoveries and focus is on proteasome genes and not MuRF1. However, to satisfy the reviewer, we now discuss in the text Twist and other transcription factors (including SMAD2/3, glucocorticoid receptors and NFkB) capable of inducing the major atrophy-related genes (among them MuRF1).

      • WB in Fig 2 shall be complemented by one in the Supp with more replicates per timepoint

      We accepted this thoughtful suggestion and now present blots from additional normal and atrophying denervated mouse muscle samples as new Fig. S1B. This approach, however, does not change any of our conclusions.

      • please justify why only PSMD10 (gankyrin) has been silenced and not any of the others (POMP, PSMD5, PSMD9)

      We silenced PSMD10 (Gankyrin) as a representative RP assembly chaperone, since it is better characterized than the other RP assembly chaperones (PSMD5 and PSMD9). We kept POMP (a CP assembly chaperone) intact. Since the formation of one proteasome holoenzyme (RP2-CP) requires two RPs and one CP, increasing proteasome assembly is expected to be more demanding for RP assembly than CP. This led us to predict that disrupting RP assembly should be sufficient to block the induced proteasome assembly. This prediction is supported by our data (Fig. 3), and this justification was also added to the revised text to enhance clarity.

      The originality is limited by the fact that Pax4 was already shown by the same authors to have a role in muscle atrophy and to drive the expression of p97. I would be curious to see whether in vitro treatments known to induce the proteasome, such as starvation, act through the biphasic mechanism shown in this paper, to understand how far it extends to other kinds of atrophy.

      We respectfully disagree that the originality of the present findings is limited: previously we validated a single proteasome subunit (Rpt1) as a target gene of PAX4 (Volodin 2017), whereas here we discover a novel global coordination of proteasome gene expression by multiple transcription factors.

      As we mention above, muscle denervation was used here as an in vivo model system of catabolic conditions. Unlike prior reports that were limited to cultured cells, our studies focus on the physiological setting in vivo to reveal mechanisms of proteasome homeostasis. In any case, regulation of proteasome gene expression by multiple transcription factors in other types of atrophy has not been investigated but is possible because common transcriptional adaptations activate protein breakdown in different types of muscle atrophy, including a coordinated induction of numerous components of the ubiquitin proteasome system (Jagoe 2002; Lecker 2004; Gomes 2001). In future independent research, we intend to investigate if the two-phase mechanism reported here can in fact be generalized to other atrophy (or stress) conditions.

      Reviewer #2 (Significance (Required)):

      The authors Gilda and co-workers made a great attempt to dissect the induction of proteasome activity during denervation muscle atrophy and discovered a two-phase process which involves two transcription factors Pax4 and NRF1. The manuscript is clearly written and the experiments fully delineated.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Using denervated mouse muscle as a model, Gilda et al. demonstrated that a two-step transcriptional program operates in the process of muscle atrophy after denervation and that proteasome expression-induced enhancement of protein degradation is important. Gilda et al. clarified that the transcription factors PAX4 and PAL/NRF-1 act on this proteasome expression induction and that the induction of these transcription factors and the expression induction of the proteasome gene cluster after denervation are necessary for muscle atrophy using an in vivo mouse model. The experiments were logically designed, and the results presented are considered clear and reliable. However, some of the descriptions in the text lack accuracy and courtesy, and some experiments require additional data to support and strengthen the author's claims. In particular, it is unclear whether PAX4, FOXO3, and NRF-1 work together or whether they have distinct functions. Although the authors claim that there are two stages of proteasome expression induction after denervation, this remains unclear. The authors should clarify the differences in target sequence or target genes and the substitutability of each transcription factor.

      Major comments: 1: In Figure 3A, the results of the immunoblot of SDS-PAGE against 20S proteasome subunits should also be shown to confirm the increase in proteasome activity and amount.

      We would like to clarify this aspect. We show the increased levels of proteasome holoenzyme complex (RP2-CP) by immunoblotting of the native gel, rather than SDS-PAGE gel. This is because the blots of the native gel can assess the levels of the actual proteasome complex, not simply subunit levels in their denatured state as in SDS-PAGE; SDS-PAGE cannot distinguish between free subunits and ones that are incorporated into the proteasome.

      If proteasome activity was increased due to some other mechanisms, proteasome levels would remain relatively constant, while proteasome activity would have increased. However, this is not the case here since our data demonstrates that both RP2-CP activities and levels peak at day 7. Furthermore, the in-gel peptidase assay (Fig. 3A panel b) directly tests the 20S CP activity within the proteasome holoenzyme (RP2-CP complex) using the fluorogenic model substrate, LLVY-AMC. The 20S CP is activated for substrate degradation, only upon its association with RP (RP2-CP complex), since RP opens the substrate entry gate of the 20S. Free 20S itself is inactive, as its gate for substrate entry is closed; for this reason, free 20S can be detected, only after its substrate entry gate is artificially opened by SDS (see free 20S in panel b, but not in panel a).

      2: In Figure 3, the reviewer assumed the conflict between the results of peptidase activity and SDS-PAGE in 14d. Therefore, quantification and statistical analysis should be performed on the results of proteasome peptidase activity and immunoblots to clarify the relationships between proteasome activity and amounts. Immunoblotting against ubiquitin is also needed to confirm the requirement and efficiency of proteasome induction.

      As the reviewer pointed out, it might seem discrepant that peptidase activity at 14 d denervation is lower than its peak at 7d (Fig. 3A, panel a), but SDS-PAGE signal for proteasome subunits seems still high (Fig. 3A, panel d, Rpn2). SDS-PAGE detects total cellular content of proteasome subunits (free subunits as well as ones assembled within proteasomes). However, at any given moment, these subunits are not only in the proteasome holoenzyme complex, but also in different assembly intermediates. When proteasome subunits are transcriptionally induced as in this study, proteasome assembly process is also increased. However, proteasome assembly is a multi-step process, and the fold-induction for each specific subunit is different (Fig. 2A-B). This means that the rate of a certain assembly step would be differently affected for a given subunit, depending on their fold-induction. For this reason, some subunits seem to exist at a high level at 14d (e.g. Fig. 3A, panel d, Rpn2), but they are not yet incorporated into the proteasome complex, because they might be still undergoing assembly process.

      As for the ubiquitin blot, it can be a good indicator of proteasome activity when proteasome activity is lower than normal. In such situations, ubiquitinated proteins accumulate (i.e. their signals increase as compared to control) due to their deficient degradation. However, our present study pertains to the opposite situation, in which proteasome activity in degrading ubiquitinated proteins is increased. In normal cells, ubiquitinated proteins are hardly detectable due to their rapid degradation. Thus, when proteasome activity is greater than normal, ubiquitinated protein levels will fall further below normal, and data become unreliable when the signals are below the detection threshold. For this reason, we provided functional readouts involving the number of muscle fibers (for example, Fig. 3D).

      3: In Figure 3C, the sample labels of shGankyrin and shLacZ are repeated. Would it be mislabeled? In addition, NATIVE PAGE immunoblot analysis against Gankyrin and proteasome subunits are needed to prove the knockdown efficiency and to reveal the assembly defect of proteasome by Gankyrin knockdown.

      To present our findings more clearly, we show one of each sample in the revised figure, rather than the duplicates as in the previous figure (Fig. 3C). We also included the immunoblot data to show that Gankyrin knockdown disrupts proteasome assembly, as seen by the reduced proteasome complex activity and level (Fig. 3C, panels a, b, c, lane 3, see RP2-CP). In Gankyrin knockdown samples, proteasome holoenzyme complex exhibited smeary appearance (Fig. 3C, panel c, see bracketed region in lane 3), as opposed to a discrete band in the controls (lanes 1, 2). This smeary appearance reflects more heterogeneous proteasome populations, due to defects in their composition and/or conformation. This is in line with Gankyrin’s known function in ensuring not only the proper subunit composition, but also proper conformation of the proteasome holoenzyme (Lu et al., Mol Cell. 2017 Jul 20;67(2):322-333.e6).

      4: In Figures 4A, 4G, 4H, 4J, and 4K, the results of shPAX4 against innervated muscle should be shown to estimate the contribution of PAX4 in steady-state conditions. To clarify the innervated muscle-specific function of PAX4, histological analysis and quantification of proteasome gene expression in multiple organs in PAX4 KO mice are needed.

      The reviewer raises an interesting point, but as we explained above, we concentrate here on the major new discovery that multiple transcription factors increase proteasome content in a catabolic condition in vivo, correlating directly with the accelerated protein loss. Regulation of the basal levels of proteasome in normal conditions in various types of cells and tissues is certainly an important issue meriting in depth study and will be the subject for future studies, but it is beyond the scope of this lengthy paper. This point is now discussed in the revised text.

      The tissue distribution of PAX4 and the detailed description of the phenotype of KO mice are also needed to understand and evaluate the role of PAX4 in muscle.

      We added the requested data about PAX4 distribution as Fig. 4I. These data show that PAX4 accumulates in the nucleus as early as 3 d after denervation. Furthermore, we are happy to add further information about the knock-out mouse model. The requested information and a detailed description of how PAX4 KO mice were generated were added to the text. The PAX4 KO mice showed no abnormalities and did not appear in any way different from their wild-type littermates.

      5: In Figure 4C, immunoblot analysis against PAX4 is essential to confirm the PAX4 protein knockout.

      We agree and representative blots were added to Fig. 4C.

      6: In Figure 5, peptidase activity and immunoblotting in NATIVE PAGE are needed to reveal the contribution of FOXO3 and NRF-1 in denervated muscle as shown in Figure 4.

      The requested data for FOXO3 using FOXO3 dominant negative (as in Fig. 5A-B) were added as new Fig. 5C-D, showing no effect on proteasome content by FOXO3 inhibition. These new data are consistent with our findings that the expression of only two proteasome subunit genes was affected by FOXO3 inhibition at 10 d after denervation (Fig. 5B). The data for α-PALNRF-1 and the effects of its knockdown on proteasome content and activity were shown as original Fig. 6H (now Fig. 6J).

      The expression of FOXO3 and NRF-1 should also be shown by RT-PCR and immunoblotting as shown in Figure 4.

      We thank the reviewer for this thoughtful suggestion, and as requested, we now show representative blots of transfected muscles to support the graphical data (Figs. 5C-F). These data confirm the efficient expression of HA-FOXO3ΔC or FLAG-α-PALNRF-1 dominant negative inhibitors in transfected muscles. It is important to note that these inhibitors are mutant forms designed to interfere with the normal function of the wild-type endogenous FOXO3 or α-PALNRF-1 proteins, without affecting their transcript levels. Given this mechanism, we believe that Western blotting is a more appropriate technique for assessing their impact, as it provides direct insights into protein expression. In the revised main text and methods, we have now clarified this point.

      Similar to previous comments, the expression of the dominant negative form of Foxo3 and NRF-1 should be performed in innervated muscles to reveal the significance and specificity of Foxo3 and NRF-1 function in denervated muscles.

      As mentioned above, regulation of the normal basal levels of proteasomes is certainly an important issue meriting in-depth study and will be the subject of future studies, but it is beyond the scope of this lengthy paper, which focuses on the mechanisms increasing proteasome content in catabolic conditions in vivo. With respect to FOXOs, there is a large literature on their regulation and roles in normal muscle (please see papers by the late Alfred L. Goldberg, Marco Sandri and others). Under normal conditions FOXO3 is largely inactive owing to phosphorylation by insulin-PI3K-AKT signaling (Stitt 2004; Latres 2005; Zhao 2007).

      7: In Figure 6D, the lists of genes should be provided, especially the 27 and 69 genes that are shared between NRF-1 KD and PAX4 KO.

      The requested data is now presented as new Table 4.

      8: In Figure 6F, the list of genes that change expression in PAX4 and NRF-1 KD mice is needed.

      We agree, and the requested data have now been added to Table 5.

      9: In Figure 6H, immunoblotting against ubiquitin is needed to evaluate the contribution of proteasome induction to protein degradation.

      We addressed this aspect in our response to Major Point #2; please see that response.

      10: This study lacks the detailed mechanisms by which PAX4, Foxo3, and NRF-1 regulate the expression of proteasome genes. The contribution of these transcription factors is revealed by experiments, but the specific sequence that these transcription factors bind and how transcription factors are induced in denervated muscles is not clarified. As shown in the figures, the ChIP assay provides convincing results, but the detailed sequence or map of the promoter region of proteasome genes must be shown in the figures to clarify the target sequences of NFE2L1 and PAX4, FOXO3, and NRF-1. In addition, the luciferase assay would support the results of the ChIP assay.

      Again, the reviewer raises an important question that we plan to resolve in the future. As mentioned, our findings strongly suggest a novel coordinated mechanism involving multiple transcription factors that control proteasome content in catabolic states in vivo. The enclosed revised manuscript primarily focuses on elucidating the contributions of individual transcription factors (α-PAL/NRF-1, PAX4, Nrf1/NFE2L1 and FOXO3) to the induction of proteasome genes, revealing a significant overlap in the genes regulated by multiple transcription factors. The specific mode of cooperation among these and other transcription factors and cofactors is certainly an important question for future studies, but it is beyond the scope of this lengthy paper. We have now clarified this point in the revised text (page 27). In addition, we agree that how these transcription factors are induced in denervated muscles merits some consideration, and we have added a paragraph to the Discussion (page 26) on possible mechanisms. For example, the transcription factor STAT3 may be involved in PAX4 induction: PAX4 was identified as a STAT3 target gene in previous microarray and ChIP data from cultured NIH3T3 cells (Snyder et al., 2008), and STAT3 becomes activated after denervation (Madaro et al., 2018).

      We are delighted that the reviewer found the results obtained through the ChIP assay convincing. Given the extensive scope of our investigation and the rigorous analyses of dozens of genes, it is not feasible to generate luciferase reporter plasmids for all of them. However, in response to the reviewer's request, we have predicted the binding sites of the 4 transcription factors within the minimal promoter regions (from 300 bp upstream to 1000 bp downstream of the TSS) of the 64 proteasome sequences. The predicted binding sites are now listed in Table 2A-D. These new data further support our key finding that multiple transcription factors control proteasome gene expression in a catabolic physiological state in vivo.
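
      For readers reproducing this kind of analysis, the promoter window above can be illustrated with a minimal Python sketch. It assumes BED-style 0-based, half-open coordinates and a single annotated TSS per gene; the `Gene` record and `promoter_window` function are hypothetical names, and the motif-scanning step that would follow is not shown because the prediction tool is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int     # 0-based TSS position (assumed BED-style coordinates)
    strand: str  # "+" or "-"


def promoter_window(gene: Gene, upstream: int = 300, downstream: int = 1000):
    """Return (chrom, start, end) for a half-open window around the TSS.

    Upstream and downstream follow the direction of transcription, so the
    window is mirrored for minus-strand genes. The TSS base is included.
    """
    if gene.strand == "+":
        start, end = gene.tss - upstream, gene.tss + downstream + 1
    elif gene.strand == "-":
        start, end = gene.tss - downstream, gene.tss + upstream + 1
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    # Clamp at the start of the chromosome
    return gene.chrom, max(0, start), end


if __name__ == "__main__":
    print(promoter_window(Gene("geneA", "chr1", 10_000, "+")))  # ('chr1', 9700, 11001)
    print(promoter_window(Gene("geneB", "chr1", 10_000, "-")))  # ('chr1', 9000, 10301)
```

      Making the window strand-aware matters because "upstream" refers to the direction of transcription, not to genomic coordinates.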

      11: The results of the loss of transcription factors are well done, but the authors should also try to estimate the effect of overexpression of transcription factors in muscle. If the overexpressed transcription factors cause proteasome induction and muscle fiber mass reduction, these results strongly support the importance of transcription factor-mediated proteasome enhancement.

      We understand the reviewer’s comment but do not believe that such an experiment is necessary to support our key findings about the induction of proteasome genes by multiple transcription factors in vivo. In fact, we have deliberately refrained from overexpression studies in this context because of the apparent coordination, and some potential interdependence, between the functions of the PAX4 and α-PAL/NRF-1 transcription factors in inducing proteasome genes. Overexpressing a single gene could disrupt this delicate coordination and yield misleading results.

      In addition, gene overexpression in mouse muscle has several limitations: it can be inefficient and does not represent physiological conditions. Therefore, to validate gene function in a physiological setting in vivo, we generated transgenic mice in which the gene of interest was specifically knocked out or knocked down. Although time-consuming, the use of transgenic mice lacking the gene of interest is a widely accepted approach and the most suitable method for specifically demonstrating the involvement of a particular gene in a physiological process, because it enables a targeted, controlled investigation of the gene's role and its contribution to the observed effects.

      Minor comments:

      12: The authors should describe the inducible KO mice more carefully and correctly. In the Results section on P12, the description of "whole body Cre+ mice" confuses the readers in understanding the mechanism of inducible Cre-mediated KO.

      We agree and have added the information requested about the KO mice to the main text and a detailed description in the methods section.

      13: In Figures 6B and 6C, the number of mice and the meaning of the asterisk should be described correctly. Is it statistically significant?

      We agree. The number of mice and the symbol denoting statistical significance were accidentally omitted during figure processing. The correct symbol has been added to Fig. 6B-C, and the number of mice and the meaning of the asterisks have been added to the corresponding legend. N=10 mice per condition. **, P

      14: There is no description of Figure 6E in the manuscript. The authors should include it.

      In the original version of this paper, we referred to Fig. 6E in the text on pages 21 and 25, and the illustration is fully described in the corresponding legend.

      Reviewer #3 (Significance (Required)):

      This paper clarified a novel mechanism of proteasome induction by transcription factors in denervated muscles other than Nrf1 (NFE2L1), which has been shown to contribute to the induction of proteasome gene expression in cultured cells. This is an important paper for expanding the understanding of the field. It is also important because it has demonstrated the potential for new therapeutic targets in diseases such as type 2 diabetes and cancer.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We would like to thank all three reviewers for their careful and comprehensive reviews of our manuscript. We have taken on board all the comments and have made appropriate changes to improve the manuscript. The more substantive changes are to the structure of the Introduction and to the clarity of Figure 2 in response to reviewers’ comments (we have added extra panels to Fig. 2A, F and G). Other minor changes are signposted individually in the point-by-point response below.

      We performed a number of pieces of additional analysis to address reviewer comments. To be as transparent as possible we make these and all other data analyses available in the form of .html files exported by Rmarkdown, hosted at https://joebowness.github.io/YY1-XCI-analysis/.

      2. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript uses differentiation of highly informative inter-specific hybrid mouse ESCs to follow features of genes that inactivate slowly. Resistance to silencing is reflected in a reduced change in chromatin accessibility, and the authors identify YY1 and CTCF as enriched amongst these 'slow' genes. This finding is provocative as these factors have been reported to be enriched at both human and mouse escape genes. The authors go on to demonstrate that YY1 is slowly evicted from the X, and that removal of YY1 increases silencing.

      Minor Comments: Overall, the manuscript's conclusions are well supported; however, the brevity of the presentation in some places made it difficult to follow, and in other places seemed a missed opportunity to more fully examine or present their data.

      1. Introduction is only 2 paragraphs and half of the last is their new findings. First part of results/discussion is then forced to be very introductory. In addition, some discussion of escapees, even if predominantly human, seems warranted in the introduction. There are multiple studies that have tried to identify features enriched at genes that escape inactivation that could be mentioned.

      We have now written the introduction as 3 paragraphs instead of 2. In doing this, we have moved the sentence introducing chromatin accessibility from the results section to the introduction. Additionally, we now discuss the studies that focus on escapees (in mouse XCI) in the second introduction paragraph.

      Variation in silencing rates. 'Comparable rankings' cites multiple studies (oddly previous sentence cites only two) - how concurrent are they? Developing this further (perhaps a supplementary table) would inform whether the genes assessed are ones that routinely behave similarly across different studies/lines; and also serve as a resource for future studies.

      To avoid double-citing, we have made this one sentence and have cited, at the end of the sentence, 7 studies that describe gene-by-gene variability in rates of silencing. Most of these studies compare their categories of fast- and slow-silencing genes with previous classifications, and they all conclude that there is substantial concurrence. Some examples:

      • Marks et al, 2015, Table S3
      • Loda et al, 2017, Figure 5
      • Barros de Andrade E Sousa et al, 2019, Figure 2
      • Pacini et al, 2021, Figure 6e,i

      We believe this is sufficient evidence for our claim that these studies report “comparable categories” (“ranking” changed to “categories” as not all studies strictly rank). A comprehensive gene-by-gene comparison table would likely serve only to highlight differences due to the various silencing assays, model systems and classification approaches used in the studies. If required, however, we would be willing to include a supplemental table that collates where gene silencing categories are discussed in each publication and links to any supplemental files that provide full lists of X-linked genes.

      It would be helpful to give insight into the informativity of the cross: what proportion of ATAC-seq peaks were informative with allelic information (and, similarly, what proportion of expressed genes had allelic information)?

      Of the 2042 consensus ATAC-seq peaks we defined on ChrX by aggregating MACS2 peaks over all time course samples, n = 821 passed our initial criteria for allelic analysis in the iXist-ChrX-Dom model line (i.e. they are proximal to the Xist locus in ChrX 0-103Mb, overlap SNPs, and contain sufficient allelic reads). A small number of peaks were additionally filtered out during fitting of the exponential decay model, leaving a final ATAC-seq peak set of n = 790 elements (38.6%) on which we focus in this study. We have added this information to the text (first Results paragraph).
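
      As a rough illustration of how a halftime can be derived from an exponential decay model, the Python sketch below fits x(t) = x0 * exp(-k t) by linear least squares on log-transformed values and returns t_half = ln 2 / k. This is a simplified, hypothetical stand-in: it assumes pure decay without a plateau term and strictly positive values (for example, the Xi fraction of allelic reads), and it is not necessarily the model or fitting procedure used in the study.

```python
import math


def fit_halftime(times, values):
    """Fit values ~ x0 * exp(-k * t) via least squares on log(values).

    Returns (k, t_half). Assumes all values > 0 and a net decay (k > 0).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    ys = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    k = -sxy / sxx  # the slope of log(values) against time is -k
    if k <= 0:
        raise ValueError("no decay detected; halftime undefined")
    return k, math.log(2) / k


if __name__ == "__main__":
    # Synthetic, noise-free data with k = 0.5 per day
    days = [0, 1, 2, 3, 4]
    xi_fraction = [0.5 * math.exp(-0.5 * d) for d in days]
    k, t_half = fit_halftime(days, xi_fraction)
    print(f"k = {k:.3f}, t_half = {t_half:.3f} days")  # k = 0.500, t_half = 1.386 days
```

      Peaks whose data cannot be fitted (for example, with no detectable decay) raise an error here, which loosely mirrors how a few peaks can drop out at the fitting stage.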

      Our collections of ChrX genes amenable to allelic analysis were not redefined for this study. We used lists of genes defined in our previous ChrRNA-seq study (10.1016/j.celrep.2022.110830). In general, allelic analysis of gene expression is not as limited by the frequency of SNPs, because the sequence length of transcripts (including introns, which are a significant fraction of the reads in ChrRNA-seq data) is much greater than for ATAC-seq peaks. Only a few very lowly expressed genes are not amenable to allelic ChrRNA-seq analysis.

      P5: "can be influenced by Xist RNA via a variety of mechanisms" is a sweeping statement that could use expansion, or at least a reference. The authors could also clarify that "distal elements assigned by linear genomic proximity" is their definition of nearest gene.

      The statement that “both [chromatin accessibility and gene expression] can be influenced by Xist RNA via a variety of mechanisms” is intentionally broad to support a negative argument that we do not wish to mechanistically over-interpret the observation that Xi chromatin accessibility loss occurs slower than gene silencing. Nonetheless, we have added two references to studies which report mechanisms for how Xist may influence chromatin accessibility; via recruiting PRC1 (Pintacuda et al 2017) or antagonising BRG1 (Jegu et al 2019). That multiple molecular pathways simultaneously contribute towards the effect of Xist RNA on gene silencing is well established in the field (see reviews such as Brockdorff et al 2020, Boeren et al 2021, Loda et al 2022).

      We have clarified in the text that our definition of “distal” is all REs which do not overlap with promoter regions (TSS+/-500bp). We have also made it clearer that our definition of “nearest” gene refers to linear genomic proximity in both the Results and Methods sections.
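
      The promoter/distal split and nearest-gene assignment described above can be sketched in Python as follows. The sketch assumes half-open coordinates on a single chromosome and measures linear distance from the RE midpoint to each TSS; the midpoint convention and the function names are assumptions for illustration, not taken from the Methods.

```python
PROMOTER_FLANK = 500  # promoter = TSS +/- 500 bp, as defined in the text


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def classify_re(re_start, re_end, tss_positions):
    """'promoter' if the RE overlaps any TSS +/- 500 bp window, otherwise 'distal'."""
    for tss in tss_positions:
        if overlaps(re_start, re_end, tss - PROMOTER_FLANK, tss + PROMOTER_FLANK + 1):
            return "promoter"
    return "distal"


def nearest_gene(re_start, re_end, genes):
    """genes: list of (name, tss) on the same chromosome as the RE.

    Returns the gene whose TSS is closest in linear distance to the RE midpoint.
    """
    midpoint = (re_start + re_end) // 2
    return min(genes, key=lambda g: abs(g[1] - midpoint))[0]


if __name__ == "__main__":
    genes = [("geneA", 1_600), ("geneB", 6_000)]
    tss = [t for _, t in genes]
    print(classify_re(1_000, 1_200, tss))     # promoter
    print(classify_re(5_000, 5_200, tss))     # distal
    print(nearest_gene(5_000, 5_200, genes))  # geneB
```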

      Figure S1 - there are 6-8 other regions that fail to become monoallelic - what are they?

      The regions which stand out most by the colour scheme of the heatmap in Figure S1 are those where accessibility increases on Xi, most notably the loci of Firre, Dxz4 and Xist, which are known to have unique features related to the 3D superstructure of the inactive X chromosome. A few other regions which do not become monoallelic harbour classic “escapee” genes. We have now labelled the locations of escapees Ddx3x, Slc25a5 and Eif2s3x in FigS1.

      The other regions noticeable in the heatmap have no obvious features which explain why they fail to become monoallelic. We have highlighted a region containing intragenic peaks within Bcor (a gene which is silenced in iXist-ChrX mESCs), but many other regions are not in the vicinity of genes. Some of the persistently Xi-accessible peaks within these regions contain strong YY1 or CTCF sites, although many others do not.

      It is also possible that some Xi-accessible peaks are artefacts of mismatches between the Castaneus or Domesticus/129Sv strain SNP databases and the ground-truth iXist-ChrX genome sequence. The number of such cases is small, and if a misannotated SNP is the only SNP present in a peak, the peak is discarded by our allelic filtering criteria because it will appear monoallelic in uninduced mESCs.

      Is there any correlation between silencing speed and expression (as previously reported)? If yes, then is there also a correlation with YY1 presence - and is this correlation greater than or less than seen on autosomes?

      The data we present here pertaining to gene silencing kinetics is reused from our previous study. In that work we did indeed observe a significant association between silencing rate and initial gene expression levels (10.1016/j.celrep.2022.110830, Supplemental Information Figure S5F), which has also been reported by multiple groups previously.

      To correlate YY1 binding with gene expression levels, we calculated transcripts per million (TPM) for all genes from our genome-wide mRNA-seq data of uninduced iXist-ChrX-Dom cells (GSE185869). It is indeed true that, on average, X-linked genes classified as “direct” YY1-targets in our analysis have higher levels of initial expression (median TPM 70.8, n=64) compared to non-target genes (median TPM 30.7, n=346). Autosomal YY1 targets are also relatively higher expressed (median TPM 29.6, n=1882) than non-YY1 genes (median TPM 8.0, n=9983). Within the list of YY1-targets, there is no additional correlation between quantitative levels of YY1 ChIP enrichment (calculated in this study using BAMscale (Pongor et al, 2020)) and gene expression (R=-0.05, Spearman correlation).
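
      For readers less familiar with TPM, the standard calculation normalises read counts by feature length and then rescales so that the values sum to one million per sample. The Python sketch below assumes one length per gene (in bp) and raw counts; it is illustrative and does not reproduce the exact pipeline used for GSE185869. The target labels in the usage example are hypothetical.

```python
from statistics import median


def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and feature lengths (bp)."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must be paired")
    # Reads per kilobase for each feature
    rpk = [c / (length / 1_000) for c, length in zip(counts, lengths_bp)]
    per_million = sum(rpk) / 1_000_000
    return [r / per_million for r in rpk]


if __name__ == "__main__":
    values = tpm([200, 300, 50, 100], [1_000, 3_000, 500, 2_000])
    is_target = [True, False, True, False]  # hypothetical YY1-target labels
    targets = [v for v, t in zip(values, is_target) if t]
    others = [v for v, t in zip(values, is_target) if not t]
    print(median(targets), median(others))
```

      Because TPM values sum to the same total in every sample, medians such as those quoted above can be compared across gene groups within a sample.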

      We therefore appreciate that the association between YY1 binding and gene expression level may confound the correlation we report in this study between YY1-target genes and slow silencing. This does not invalidate a potential functional role for YY1 in impeding silencing, as it could affect both variables via common or distinct mechanisms. Nevertheless, in an attempt to account for initial expression level as a covariate, we compared the silencing halftimes of YY1-targets versus non-targets within genes grouped by similar expression levels (low, medium and high-expressed genes). YY1-targets have slower halftimes in each comparison, and this difference is highly significant (p=1.9e-05, Wilcoxon test) for the “medium-expressed” gene group. This implies that YY1 contributes towards slower gene silencing kinetics independently of initial gene expression levels. We have added this panel to Fig2 with an associated sentence in the Results section.
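
      The stratified comparison described above can be sketched in Python as a rank-sum (Mann-Whitney U, also called Wilcoxon rank-sum) test run separately within each expression stratum. To stay dependency-free, the sketch uses a normal approximation without tie correction, so its p-values will differ slightly from exact or tie-corrected implementations such as `scipy.stats.mannwhitneyu`; the TPM cut-offs and the gene-record fields are hypothetical.

```python
import math
from statistics import median


def mann_whitney_u(x, y):
    """U statistic of x vs y (ties count 0.5) and a two-sided normal-approx p-value."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


def expression_group(tpm, low_cut, high_cut):
    if tpm < low_cut:
        return "low"
    return "medium" if tpm < high_cut else "high"


def compare_within_groups(genes, low_cut, high_cut):
    """genes: dicts with 'tpm', 'halftime' and 'yy1_target' keys.

    Returns {group: (median target halftime, median non-target halftime, p)}
    for groups that contain both target and non-target genes.
    """
    groups = {}
    for g in genes:
        groups.setdefault(expression_group(g["tpm"], low_cut, high_cut), []).append(g)
    results = {}
    for name, members in groups.items():
        targets = [g["halftime"] for g in members if g["yy1_target"]]
        others = [g["halftime"] for g in members if not g["yy1_target"]]
        if targets and others:
            _, p = mann_whitney_u(targets, others)
            results[name] = (median(targets), median(others), p)
    return results


if __name__ == "__main__":
    toy = [{"tpm": 40, "halftime": h, "yy1_target": True} for h in (5, 6, 7)]
    toy += [{"tpm": 40, "halftime": h, "yy1_target": False} for h in (1, 2, 3)]
    print(compare_within_groups(toy, low_cut=10, high_cut=100))
```

      Testing within strata, rather than on all genes at once, is what prevents a difference in expression level between YY1 targets and non-targets from driving the result on its own.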

      These new analyses are also appended to the documentation of the R scripts used to generate the main figures in this study (Figure2_YY1association.Rmd), which will all be published to Github.

      It is also important to note that this analysis approach is complicated by the methodology we use to classify YY1 target genes. In this study, we define YY1 targets based on the presence of ChIP-seq peaks overlapping the gene promoters, which is reasonable and widely accepted practice when defining targets of transcription factors. However, as briefly discussed in the Methods, in YY1 ChIP-seq data samples with very high signal:noise (eg Fig3), minor peaks of YY1 enrichment can be detected at almost every active promoter. As enrichment at these peaks is typically much less than at peaks with occurrences of the YY1 consensus DNA motif, we hypothesise that these small peaks result from secondary YY1 cofactors enriched at promoters (eg P300, BAF, Mediator) rather than direct sites of binding to DNA/chromatin. Therefore, for annotating genes as “direct” YY1 targets, we chose to use the YY1 peak set defined from lower signal:noise ChIP-seq data in iXist-ChrX produced with the endogenous YY1 Ab. Nevertheless, this behaviour is likely to confound any analysis correlating YY1 ChIP binding with gene expression.

      Figure 2: Have the authors considered using quartiles rather than an arbitrary division into depleted and persistent?

      We primarily chose this binary classification of REs as either Xi-“persistent” or Xi-“depleted” to maximise the numbers of sequences that could be used in each group as input for the HOMER motif enrichment software.

      It is also not trivial to separate REs into quartiles because our “Xi-persistent” classification includes peaks defined as “biallelically accessible in NPCs”, as well as peaks with slow accessibility halftimes. This is explained in both the Results and Methods, but we have now edited Fig2A to make it clearer. Instead of quartiles, we have performed an analysis that keeps “biallelically accessible REs” as a separate category and subdivides the remaining peaks into three groups by halftime (slow, intermediate and fast accessibility loss). The same trends are evident with this four-category approach as with the two-category approach.
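
      The four-category scheme described above could be implemented along the following lines in Python. How the slow, intermediate and fast boundaries were chosen is not stated here, so this sketch assumes tertiles of the halftimes of non-biallelic REs (computed with `statistics.quantiles`); the function names and halftime values are hypothetical.

```python
from statistics import quantiles


def tertile_cuts(halftimes):
    """Two cut points splitting halftimes into three roughly equal-sized groups."""
    fast_cut, slow_cut = quantiles(halftimes, n=3)
    return fast_cut, slow_cut


def categorize_re(biallelic_in_npc, halftime, fast_cut, slow_cut):
    """Keep NPC-biallelic REs separate; bin the rest by accessibility halftime."""
    if biallelic_in_npc:
        return "biallelic"
    if halftime < fast_cut:
        return "fast"
    if halftime < slow_cut:
        return "intermediate"
    return "slow"


if __name__ == "__main__":
    # Hypothetical REs: (biallelic in NPCs?, accessibility halftime in days)
    res = [(False, h) for h in (0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0)]
    res.append((True, float("nan")))  # no meaningful halftime for biallelic REs
    cuts = tertile_cuts([h for bi, h in res if not bi])
    print([categorize_re(bi, h, *cuts) for bi, h in res])
```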

      Importantly, our follow-up analyses which confirm the association between YY1 binding and slow Xi accessibility loss (Fig2E) and slow silencing (Fig 2F-H) are independent from categorisations of REs which rely on arbitrary thresholds.

      1. Could simplify secondary labels to solely YY1 and CTCF. D & F do not print in black and white. Overall the mESC versus NPC can be confusing, perhaps mESC (no diff) would be helpful?

      We have simplified the secondary labels in Fig2B and modified the colour scheme of Fig2D and Fig2F as suggested. “mESC” is now changed to “mESC no diff” in Fig2H, FigS2B, Fig3C and Fig3E to reduce the potential for confusion.

      The numbers appear to suggest YY1 is generally enriched on X, but not at promoters?? Is this true?

      The explanation for this is that clear peaks of YY1 ChIP are found at young LINE1 elements in iXist-ChrX mESCs (specifically over L1Md_T subfamilies). These elements are highly enriched (>2-fold) on the mouse X chromosome compared to autosomes (Waterston 2002), and the majority are not promoter-associated. We chose not to include a discussion of YY1 enrichment at repetitive LINE1 elements in this study primarily because of a) issues related to multiple-mapping reads, such as difficulties distinguishing ChrX vs autosomal reads, and b) the absence of strain-specific SNPs within annotated ChrX L1Md_Ts means that none of these elements are amenable to allelic analysis so we cannot compare Xi versus Xa. However, these LINE1 peaks are a significant fraction (262/521) of the numbers of YY1 ChIP-seq peaks in Fig2C.

      For Figure 2f, it might be helpful to show autosomal genes - are Fast depleted or Slow enriched for YY1 relative to autosomes?

      We have calculated these numbers as part of the analysis of gene expression on ChrX and autosomes above. Overall, the fraction of genes defined as YY1-targets is the same on ChrX as on autosomes (~0.16). Accordingly, fast-silencing genes are depleted for YY1 compared to autosomes, whereas slow-silencing genes are enriched for YY1 compared to autosomes. Fig2F is now redesigned to include the total numbers of YY1-target genes on ChrX and autosomes.
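
      As a quick arithmetic check, the ~0.16 fraction can be reproduced from the gene counts quoted earlier in this response for the expression analysis (64 of 410 X-linked genes and 1882 of 11865 autosomal genes classed as YY1 targets). The Python snippet assumes those counts correspond to the gene sets used for Fig2F, which may not be exactly the case; the `enrichment` helper is a hypothetical illustration of how a silencing category could be compared against the autosomal background.

```python
def target_fraction(n_targets, n_non_targets):
    """Fraction of genes classed as YY1 targets."""
    return n_targets / (n_targets + n_non_targets)


def enrichment(n_targets_in_group, n_genes_in_group, background_fraction):
    """Observed YY1-target fraction in a gene group relative to a background fraction."""
    return (n_targets_in_group / n_genes_in_group) / background_fraction


chrx = target_fraction(64, 346)          # X-linked genes (counts from the text)
autosomes = target_fraction(1882, 9983)  # autosomal genes (counts from the text)
print(f"ChrX: {chrx:.3f}, autosomes: {autosomes:.3f}")  # ChrX: 0.156, autosomes: 0.159
```

      An enrichment above 1 for slow-silencing genes, and below 1 for fast-silencing genes, corresponds to the pattern described above.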

      More generally, is YY1 binding on the X lost more slowly than YY1 binding on autosomes, or is the slow loss a general feature of YY1? While I agree YY1 could have direct up- or down-regulatory roles, Figure S3 could also be reflecting a secondary impact.

      We agree that many of the differentially regulated genes after 52 hours of YY1 degradation could be secondary effects and have added a sentence on this to the relevant paragraph in the text.

      Figure 3, 4 and supplementary - the chromosome cartoon introduces the LOH in iXist, but this needs to be described in text. Describing the reciprocal as a biological replicate seems challenging given this LOH.

      It is true that the reciprocal lines iXist-ChrX-Dom and iXist-ChrX-Cast are not true biological replicates, and we try to avoid referring to them as such. Writing this in the legend of Fig3 was an error which we have corrected. We have now also mentioned the recombination event in the iXist-ChrX-Dom cell line at the point where data from this line is first discussed (paragraph 1 of Results section).

      For the latter parts of this work (Figs 3 and 4), we made the conscious decision to proceed with two YY1-FKBP12F36V cell lines from different reciprocal iXist-ChrX backgrounds (aF1 in iXist-ChrX-Dom, cC3 in iXist-ChrX-Cast), rather than “biological replicate” clones from either iXist-ChrX-Dom or iXist-ChrX-Cast. Our reasoning was to control for potential confounding effects of strain background on our experiments related to the role of YY1. Although there were some minor differences between the clones, aF1 and cC3 showed essentially equivalent phenotypes in all analyses we performed.

      Could a panel of TFs be used rather than OCT4 which has its own unique properties to emphasize that YY1 is unique?

      This would indeed be worthwhile, and we did consider performing ChIP-seq for TFs other than OCT4 to collect more points of comparison for the slow rate of loss of YY1 binding to Xi. However, it is admittedly hard to identify appropriate candidate TFs in mESCs that a) have similar numbers of discrete binding peaks in promoters and distal elements on ChrX and b) can be reliably profiled by ChIP-seq at sufficiently high signal:noise to allow quantitative allelic analysis.

      We have changed the text to acknowledge that our comparison only to OCT4 limits the scope of the statements we can make about unique properties of YY1 binding.

      Figure 4 - by examining 'late' genes, a change in allelic ratio is observed, but what about escape genes (e.g. Kdm5c, Kdm6a)? Do they now become silent? It would be helpful to have all this data as a supplementary table so people could query their 'favourite' gene.

      YY1 degradation experiments performed for Figure 4 were performed on mESCs without cellular differentiation (YY1-ablated cells do not survive in our mESC to NPC differentiation protocol). In undifferentiated mESCs, silencing of the inactive X does not reach completion, and in fact all X-linked genes are residually expressed at a higher level than in equivalent timepoints of Xist induction with NPC differentiation (see Figure 4D, Bowness et al 2022). We write in the text “slow-silencing genes are residually expressed from Xi” because genes of this category account for the majority of expression under these conditions, and indeed almost all slow genes would all be classed as “escape genes” in this setting by a conventional definition of >10% residual expression from Xi (see also Figure 4D, Bowness et al 2022). Our analysis in Fig4D (of this study) includes all genes, and we share processed .txt files of allelic ratio and allelic fold changes in GEO, so querying the behaviour of a favourite gene would be easy (GSE240680).
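
      To make the >10% criterion concrete, the Python sketch below computes an allelic ratio, Xi/(Xi + Xa), from allele-specific read counts and expresses residual Xi expression relative to the uninduced state. Normalising to the uninduced ratio is an assumption made for illustration; studies define residual expression in different ways (for example as an Xi/Xa ratio), and the read counts shown are made up.

```python
def allelic_ratio(xi_reads, xa_reads):
    """Fraction of allele-assigned reads that come from the (future) Xi."""
    total = xi_reads + xa_reads
    if total == 0:
        raise ValueError("no allele-assigned reads")
    return xi_reads / total


def residual_xi_expression(xi_t, xa_t, xi_0, xa_0):
    """Xi allelic ratio at time t relative to the uninduced (time 0) ratio."""
    return allelic_ratio(xi_t, xa_t) / allelic_ratio(xi_0, xa_0)


def is_escapee(xi_t, xa_t, xi_0, xa_0, threshold=0.10):
    """Conventional escape call: residual Xi expression above 10%."""
    return residual_xi_expression(xi_t, xa_t, xi_0, xa_0) > threshold


if __name__ == "__main__":
    # Hypothetical counts: (Xi reads, Xa reads) after and before Xist induction
    print(is_escapee(30, 170, 100, 100))  # residual 0.15 / 0.5 = 0.3  -> True
    print(is_escapee(2, 198, 100, 100))   # residual 0.01 / 0.5 = 0.02 -> False
```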

      Incidentally, when we do perform NPC differentiation of iXist-ChrX cells, very few genes show any expression from Xi at late stages (Ddx3x, Slc25a5, Eif2s3x and Kdm5c clearly escape, but even Kdm6a is entirely silenced). Unfortunately, with such a small number of “super” escapees it is hard to draw general conclusions, so in this study we can only make inferences about escape via the transitive argument that many “slow-silencing” genes are facultative escapees in other settings without induced Xist overexpression. We now discuss this consideration in the introduction and the final paragraph of the main text.

      It seems surprising that loss of YY1 has no demonstrative impact on the Xa. Figure S3B suggests that over 1000 genes are significantly impacted - primarily down regulated. How many of those are X-linked? Perhaps they could be colored differently?

      For the broad-brush differential expression testing in FigS3B, we use all the ChrRNA-seq samples (6 x untreated, 6 x dTAG) as “pseudo-replicates”, assuming that any confounding effects of induced Xist-mediated silencing affect the untreated and dTAG sample groups equivalently. We did specifically investigate the behaviour of X-linked genes in this volcano plot; however, only a very small number were differentially expressed (n=22 X-linked genes significantly downregulated compared with n=4 upregulated). This can be seen in our analysis records uploaded to Github.
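
      The per-chromosome tally described above amounts to filtering a differential expression results table, as in this minimal Python sketch. The record fields (`chrom`, `log2fc`, `padj`) and the 0.05 adjusted p-value cut-off are assumptions for illustration, since the thresholds used for FigS3B are not restated here.

```python
def count_de(results, chrom="chrX", padj_cut=0.05):
    """Count significantly up- and downregulated genes on one chromosome.

    results: iterable of dicts with 'chrom', 'log2fc' and 'padj' keys;
    genes with a missing adjusted p-value (None) are skipped.
    """
    up = down = 0
    for r in results:
        if r["chrom"] != chrom or r["padj"] is None or r["padj"] >= padj_cut:
            continue
        if r["log2fc"] > 0:
            up += 1
        elif r["log2fc"] < 0:
            down += 1
    return up, down


if __name__ == "__main__":
    toy = [
        {"chrom": "chrX", "log2fc": -1.2, "padj": 0.001},
        {"chrom": "chrX", "log2fc": 0.8, "padj": 0.03},
        {"chrom": "chrX", "log2fc": -0.4, "padj": 0.2},
        {"chrom": "chr7", "log2fc": -2.0, "padj": 1e-6},
    ]
    print(count_de(toy))  # (1, 1)
```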

      Additionally, there is actually a minor effect of YY1 loss on expression of YY1-target genes on Xa. This can be seen in Fig4F, where the median lines of YY1-target boxes lie below the horizontal line of 0-fold change.

      Since XIST+/undifferentiated cells retain YY1, is YY1 binding sensitive to DNAme? Indeed, are X chromosome bound sites in islands that become methylated? Figure S4 shows YY1-targetted X genes in SMCHD1 knockout; can CTCF targets also be shown? While identified in Figure 2, CTCF was not examined the way YY1 was, although it has also been identified in somatic studies of genes that escape X inactivation.

      Binding of YY1 is indeed sensitive to DNA methylation; specifically, it is reported to be blocked by CpG methylation (Kim et al, 2003; Makhlouf et al, 2014; Fang et al, 2019). Thus, crosstalk with the DNA methylation pathways, which deposit de novo CpG island methylation as a late event of XCI (Lock 1987, Gendrel 2012), appealed to us as a potential mechanism of YY1 “eviction”. However, a preliminary analysis we performed to investigate this revealed limited overlap between YY1 binding sites and de novo methylated CpG islands in the iXist-ChrX model cell line.

      FigS4 presents ATAC-seq data from two iXist-ChrX SmcHD1 KO clonal cell lines, comparing the accessibility loss kinetics between YY1-binding and non-YY1 REs in these cells.

      Although FigS4 in this paper does not show genes, we have previously published ChrRNA-seq data from these SmcHD1 KO lines over a similar Xist induction + NPC differentiation time course (Figure 6, Bowness et al, 2022). A reanalysis of this ChrRNA-seq data by YY1-target vs non-target genes shows a similar trend to the accessibility data, although this is expected from the strong overlap of both “YY1-target” and “SmcHD1-dependent” genes with slow-silencing genes in our model.

      With respect to CTCF, we have performed a similar analysis of this data separating ATAC-seq peaks by CTCF-binding rather than YY1-binding. This shows a similar trend to YY1, but is overall less pronounced, and is now included in our analysis records. We have reported previously that loss of CTCF from many binding sites on Xi requires SmcHD1 (Gdula et al, 2019).

      When the authors use cf. do they simply mean see also, or as wikipedia suggests: "the cited source supports a different claim (proposition) than the one just made, that it is worthwhile to compare the two claims and assess the difference". Perhaps it would be worth spelling out to clarify for the audience.

      We used “cf.” in the text to mean “compare with”, when referring to a plot/observation/piece of data outside of the figure being immediately discussed (either in another study or different section of the paper). We were not aware of the recommendation to only use the cf abbreviation when the two items are intended to be contrasted. We do not believe this to be a universal grammatical convention, but nevertheless have changed incidences of cf. to “see also”.

      Reviewer #1 (Significance (Required)):

      General assessment: An important question in human biology is how much the sex chromosomes contribute to sex differences in disease frequency. Genes that escape X inactivation in humans seem to have considerable impact on gene expression genome-wide. While there are not as many genes in mouse that escape inactivation, the use of the mESC differentiation approach allows detailed assessment of the timing of silencing during inactivation. The authors utilize an inter-specific cross, and it would be interesting to know the limitations of such a system (in terms of the proportion of DHS/genes that are informative).

      Advance: As the authors note, there are multiple studies of similar systems that have revealed differences in the speeds of silencing of genes. However, this is the first study to my knowledge that has then tried to assess timing with gene-specific factors. There are multiple studies in humans comparing escape and subject genes for TFs, but lacking the developmental timing that this study incorporates.

      Audience: While generally applicable to a basic research audience interested in gene regulation, the applicability to human genes that escape inactivation may interest cancer researchers or clinical audiences interested in sex differences.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors studied the molecular basis of a variation in the rate of individual gene silencing on the X undergoing inactivation. They took advantage of ATAC-seq to observe the kinetics of chromatin accessibility along the inactive X upon induction of Xist expression in mESCs. They demonstrated a clear correspondence between the decrease in chromatin accessibility and the silencing of nearby genes. Furthermore, they found that persistently accessible regulatory elements and slow-silencing were associated with binding of YY1. YY1 tended to associate longer with genes that required more time to be silenced than those that became silenced fast on the inactive X during XCI. The acute loss of YY1 facilitated silencing of slow genes in a shorter period. They suggest that whether or not the transcription factors stay associated longer is another factor that impacts the variation in the rate of gene silencing on the Xi.

      Reviewer #2 (Significance (Required)):

      It has been suggested that the rate of gene silencing during XCI varies depending on the distance of individual genes from the Xist locus or the entry site of Xist RNA on the X, as well as on their initial expression levels before silencing. This study provides another perspective on this issue: the persistent association of transcription factors during XCI affects the rate of gene silencing. Although the issue addressed here might attract attention only from a limited field of specialists, their finding advances our understanding of how the efficiency of silencing is controlled during XCI. The experimental data essentially support their conclusion, and the manuscript was easy to follow. However, I still have some comments, which I would like the authors to consider before further consideration.

      Major concerns

      1. Based on the results shown in Figure 3E and F, the authors concluded that YY1 was more resistant than other TFs to eviction from the X upon Xist induction. I am still not convinced by this. YY1 binds DNA via its zinc finger domain, while Oct4 binds DNA via its homeodomain. The difference in binding module between them might affect their dissociation or their response to Xist RNA-mediated chromatin changes. In addition, given that YY1 has also been reported to bind RNA, including Xist, Oct4 might not be an appropriate TF for comparison.

      We acknowledge and agree that our single comparison between YY1 and OCT4 is insufficient to support a general conclusion that YY1 is unique with respect to its binding properties on Xi. This was also alluded to by Reviewer #1 (see 10.), where in response we write about the difficulties of selecting other appropriate/feasible candidate TFs for ChIP-seq in order to widen the comparison beyond OCT4. In consideration of this concern, we have rephrased our conclusions regarding this point in the text, both where it is first presented (Fig3F) and in the first discussion paragraph.

      Furthermore, the difference in allelic ratio change between YY1 and OCT4 is admittedly not dramatic, and this metric can be influenced somewhat by the properties of the sets of peaks used (which is also why we have not tried to add statistical significance to this comparison in Fig3F). In order to make the comparison with OCT4 (a classic pluripotency factor), we were also limited to using mESC culture without differentiation conditions. It is possible that more pronounced differences between YY1 and other TFs would be observed under conditions where XCI is able to proceed further.

      Even so, we contend that our observation that YY1 binding is lost from the Xi relatively slowly likely stands without the need for a comparison with OCT4 or other transcription factors. The decrease in allelic ratio for YY1 ChIP occurs more slowly than the overall loss of chromatin accessibility from REs, which is arguably a more general proxy for TF binding, and much more slowly than the kinetics of gene silencing (Fig3D and FigS2C). In addition, no other TF motifs (except CTCF, which has its own unique properties) were found significantly enriched within persistently accessible REs, which would be expected if a different factor shared the late-retained Xi binding of YY1.

      Thus, overall we have tried to write the paper without overstating in isolation the importance of our claim that YY1 binding on Xi is relatively resistant to Xist-mediated inactivation, instead emphasising that it should be considered alongside the other pieces of data in the study.

      2. I don't think that the kinetics of YY1 eviction upon Xist induction in SmcHD1 KO cells during NSC differentiation fit the phenotype of Smchd1 mutant cells. Although the authors' previous study (Bowness et al., 2022) showed that Smchd1-KO cells fail to establish complete silencing of SmcHD1-dependent genes, silencing still reached rather appreciable levels according to Figure 6 of Bowness et al. (2022). This is, in fact, consistent with the idea that XCI initially takes place in the mutant embryos, at least to an extent that does not compromise early postimplantation development. On the other hand, a significant portion of YY1 appears to remain associated with the target genes on both the active and inactive X (Figure S4), which I think suggests that the presence of YY1 is compatible with silencing of SmcHD1-dependent genes. This contradicts the proposed role of YY1 in sustaining the expression of X-linked genes in this context.

      At any given timepoint of XCI, our data sets of gene silencing (ChrRNA-seq) consistently show a more pronounced allelic skew than those of chromatin accessibility (ATAC-seq). This behaviour is discussed in relation to Figure 1 in the text (see Results paragraph 2). We do not wish to overinterpret this quantitative difference because the assays are technically different and accessibility is not linearly correlated with gene expression. With this in consideration, we interpret the ATAC-seq data presented in Figure S4 to be fully consistent with the iXist-ChrX SmcHD1 KO ChrRNA-seq data in Figure 6 of our previous publication, ie a small increase in residual Xi gene expression from SmcHD1 KO NPCs is accompanied by a more appreciable increase in residual Xi chromatin accessibility. In line with this, it would not be contradictory for substantially increased Xi YY1 binding to sustain a quantitatively small (but nonetheless meaningful) increase in residual gene expression from Xi.

      Additionally, the context in which we include this SmcHD1 KO ATAC-seq data in the current paper is to hypothesise a potential role for SmcHD1 in contributing towards the eventual removal of YY1 binding from Xi. This hypothesis is essentially based on two observations: 1.) there is substantially more residual YY1 binding to Xi in mESC no diff conditions (Figure 3) and 2.) one difference between no diff and diff conditions is the absence of SmcHD1 recruitment in the former (Figure 5 in our previous study). The new SmcHD1 KO ATAC-seq data adds a third observation which supports the hypothesis: YY1-bound REs are appreciably more accessible from Xi in SmcHD1 KO. However, none of these observations is direct evidence of a link between SmcHD1 and YY1, and more experiments would be required to substantiate this potential mechanism. If confirmed, it would be logically reasonable to suggest a role for YY1 in contributing towards the residual expression of X-linked genes in the context of SmcHD1 KO, but we do not yet claim this, and a potential link with SmcHD1 KO is not the main focus of the paper.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Bowness and colleagues describe the interesting finding that the transcription factor YY1 is associated with slow-silencing genes in induced X-Chromosome Inactivation (XCI). The authors have conducted a comprehensive characterization of X-linked gene silencing and the loss of chromatin accessibility of regulatory elements in induced XCI in ESCs and during NPC differentiation. X-linked gene silencing was classified into four categories, ranging from fast-silenced genes to genes that escape silencing. Motif enrichment analysis of regulatory elements associated with slowly silenced genes identified YY1 as the most significantly enriched transcription factor. The separation of YY1-target and non-target genes confirmed that most genes bound by YY1 indeed exhibit slower silencing kinetics. A comparison of the binding kinetics of YY1 with another transcription factor, OCT4, during XCI revealed that YY1 is evicted from the inactive X more slowly than OCT4, suggesting that slower eviction is a unique property of YY1. Conditional depletion of YY1 by targeted protein degradation during induced XCI in mESCs demonstrated that the loss of YY1 at target genes enhances silencing. This supports the hypothesis that YY1 serves as a crucial barrier for slow-silenced genes during XCI. Finally, the authors propose a hypothesis regarding the mechanism of YY1 eviction, suggesting a potential connection to the role of SmcHD1 during XCI.

      The authors provide an in-depth analysis of the role of YY1 in gene silencing kinetics during induced XCI, and we believe this manuscript should be published if our comments are addressed.

      Major comment:

      Based on the allelic ratio in Figure 3C, only a minor loss of YY1 binding occurs on the Xi in induced XCI in mESCs, while silencing is established properly as shown in Figure 4C (left panel, red boxplots). This suggests that YY1 eviction is not necessarily required for these genes to be effectively silenced. Could the authors explain this discrepancy in the data in relation to their conclusions? It seems this is true for XCI happening during differentiation towards NPCs, but not if cells are stuck in the pluripotency stage?

      Whilst indeed substantial, we do not consider the silencing seen for 6-day mESCs in Fig4C to be “established properly”. We refer to our previous publication (Figure 4 Bowness et al., 2022), which shows that silencing at equivalent timepoints under differentiation conditions (d5-d7) is significantly more pronounced (near-“complete”). Indeed, the level of silencing reached by YY1-FKBP mESCs (Xist induced but no dTAG treatment) aligns with the plateau of silencing in undifferentiated mESCs we describe in our previous study (median allelic ratio of approximately 0.1).

      We conclude that YY1 contributes somewhat to sustaining this residual expression in mESCs, because a) substantial YY1 binding remains on Xi at these timepoints in mESCs and b) silencing increases with degradation of YY1 (the latter is more direct evidence). Notably, silencing does not progress to completion (allelic ratio of 0) in the absence of YY1, so we do not claim that YY1 is the only factor sustaining residual Xi gene expression in mESCs.

      We interpret this comment to be a fundamentally similar concern to that raised by Reviewer #2 (2.), but in the context of undifferentiated mESCs rather than SmcHD1 KO. As stated above, we do not think it inherently contradictory for substantially increased Xi YY1 binding to sustain a quantitatively small (but nonetheless meaningful) increase in residual gene expression from Xi.

      Minor comments:

      1. In the abstract lines 7-8, the authors state that the experiments were performed in mouse embryonic stem cell lines, but much of the data shown was acquired during NPC differentiation. Please adjust the abstract.

      We have adjusted this sentence in the abstract to include that many of the experiments in the paper involved differentiation of iXist-ChrX mESCs.

      The last sentence of the abstract states that YY1 acts as a barrier to silencing but, as stated in my major comment, that does seem to be the case during ESC differentiation towards NPCs, but not in ESCs themselves. Please tone down this sentence. Moreover, we do not fully understand where the phrase 'is removed only at late stages' comes from. Is this because of the Smchd1 link? We find this link quite weak with the data presented. We would tone down that last abstract sentence.

      We have toned down the final sentence of the abstract accordingly. We agree that "removed only at late stages" is unsubstantiated since YY1 binding on Xi decreases over the entire time course (albeit slowly). However, we maintain that a connection between YY1 and late stages of the XCI process is reasonable to infer from the various pieces of evidence we provide in the study (e.g. YY1 is persistently enriched in accessible REs, it is associated with slow-silencing genes, and it remains bound to Xi in undifferentiated mESCs).

      Several comparisons to human XCI have been made in the article. We do agree that there are similarities between mouse and human XCI. However, there are insufficient data to substantiate that these genes are regulated in a similar manner in humans. We believe the comparisons should be removed altogether or attenuated.

      We agree that there is nothing in our data that directly pertains to human XCI. Comparisons to human are only made twice in the paper: Initially in the introduction to make a broad statement that many mechanisms of Xist function are conserved between species, and finally as speculation in the last discussion paragraph. We think it is relevant to acknowledge the parallels between our study, which links YY1 binding with resistance to Xist-silencing in a mouse ESC model, and literature describing a similar association between YY1 and XCI escape in humans.

      At the bottom of page 4, the authors say that for any given gene, the allelic ratio of accessibility at its promoter decreased more slowly than it silenced, and then cite Fig 1B. They probably mean S1C, since 1B only shows 4 genes?

      The phrase “any given” was used colloquially (ie imprecisely), so we have replaced it with “individual”.

      Figure 1B shows the average allelic ratio of multiple clones for genes representing different silencing speeds. Each data point is the average of multiple clones for these representative genes. Could the authors show the individual data points or the standard deviation?

      Fig1B predominantly shows the averages of only two replicate time-courses of Xist induction with NPC differentiation using the same parental clonal cell line, iXist-ChrX-Dom, but performed on different dates and passages. We regenerated the panel without merging the replicate data points, but this has little effect on the plot (see the Rmarkdown html file of Figure 1 on Github).

      Figure 1B. Loss of promoter accessibility lags behind loss of chromatin-associated RNA expression for these 4 genes. What about distal REs? Do the allelic ratios for the distal REs more closely follow chromatin-associated RNA expression? Could the authors show this in a supplemental figure?

      Based on FigS1C, we comment on the general trend that accessibility decrease from Xi occurs more slowly than gene silencing (measured by ChrRNA-seq). We then find in FigS1D that distal elements lose accessibility slightly faster than promoters. Although overall the allelic ratio decrease of distal (non-CTCF) RE accessibility is slightly closer to the trajectory of gene silencing, it remains substantially slower (see again the Rmarkdown .html file of Figure 1 on Github).

      An equivalent plot to Fig1B showing distal REs would rely on our simplistic assignment of distal elements to their nearest genes. We believe this is a reasonable generalisation for investigating chromosome-wide trends but unlikely to be sufficiently accurate at the level of specific genes.

      Figure 1B: gene silencing trajectory is depicted left while the legend says right. Same for promoter accessibility.

      The legend is now corrected.

      Figure S1A shows only part of the X chromosome. The area downstream of Xist is missing. Is this because the iXist-ChrXDom cell line is missing allelic resolution as shown in figure S2A? Could the authors explain in the figure legend that part of the X-Chromosome is missing?

      We have now included a reference to the recombination event in the iXist-ChrXDom cell line both when we present data from this background in the first paragraph of the Results section, and in the legend of FigS1A.

      Figure 2C shows that 94 TSSs bear a YY1 peak, yet Fig 2F shows 62 are targets of YY1. Is this because the rest are not properly silenced or are escapees?

      Fig2C shows the numbers of ChrX YY1 ATAC-seq peaks which overlap with “promoters” (ie regions +/- 500bp of a TSS). By contrast, Fig2F shows ChrX genes classified as direct YY1-targets for allelic silencing analysis. The discrepancy between these numbers is due to a number of reasons:

      1. It is possible for multiple YY1 peaks to overlap the same promoter (eg one peak overlaps 500bp upstream, a separate peak overlaps 500bp downstream).
      2. The count in Fig2C is not restricted to one TSS per gene in cases where there are multiple transcript isoforms in the gene annotation, thus multiple YY1 peaks can overlap different promoters for the same gene.
      3. A few genes do not pass our filters for allelic silencing analysis (eg they are too lowly expressed). Some YY1 peaks may overlap these genes.

      We hope the revised version of Fig2F, which includes numbers of direct YY1 target genes on autosomes and ChrX, makes the distinction between these two numbers clearer.
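      To make the difference between the two counts concrete, here is a minimal sketch built entirely on invented toy data: the gene names (geneA to geneC), coordinates, helper functions and expression filter are hypothetical and are not the study's data or pipeline. Only the promoter definition (TSS +/- 500 bp) comes from the text. It counts peaks overlapping any promoter window and, separately, the distinct genes kept as direct targets, so that each of the three reasons listed above appears once.

```python
# Hypothetical sketch with invented toy data: why the number of peaks that
# overlap promoters can exceed the number of genes kept as direct targets.
from dataclasses import dataclass
from typing import Callable, Iterable, Set

PROMOTER_HALF_WIDTH = 500  # promoter window = TSS +/- 500 bp, as in the text


@dataclass(frozen=True)
class Transcript:
    gene: str
    tss: int  # toy TSS coordinate; one entry per annotated isoform


@dataclass(frozen=True)
class Peak:
    start: int  # half-open interval [start, end)
    end: int


def overlaps_promoter(peak: Peak, tss: int, half_width: int = PROMOTER_HALF_WIDTH) -> bool:
    """True if the peak overlaps the half-open window [tss - hw, tss + hw)."""
    return peak.start < tss + half_width and peak.end > tss - half_width


def count_promoter_peaks(peaks: Iterable[Peak], transcripts: Iterable[Transcript]) -> int:
    """Number of peaks overlapping at least one TSS window (any isoform)."""
    tss_list = [t.tss for t in transcripts]
    return sum(any(overlaps_promoter(p, tss) for tss in tss_list) for p in peaks)


def direct_target_genes(
    peaks: Iterable[Peak],
    transcripts: Iterable[Transcript],
    passes_filter: Callable[[str], bool],
) -> Set[str]:
    """Distinct genes with a promoter peak that also pass a (hypothetical) filter."""
    peak_list = list(peaks)
    return {
        t.gene
        for t in transcripts
        if passes_filter(t.gene) and any(overlaps_promoter(p, t.tss) for p in peak_list)
    }


TRANSCRIPTS = [
    Transcript("geneA", 10_000),  # isoform 1
    Transcript("geneA", 12_000),  # isoform 2 (reason 2: several TSSs per gene)
    Transcript("geneB", 50_000),
    Transcript("geneC", 90_000),  # assumed too lowly expressed (reason 3)
]
PEAKS = [
    Peak(9_600, 9_800),    # upstream of geneA isoform 1
    Peak(10_300, 10_450),  # downstream of the same TSS (reason 1: two peaks, one promoter)
    Peak(12_100, 12_300),  # geneA isoform 2
    Peak(50_100, 50_300),  # geneB
    Peak(90_000, 90_200),  # geneC, later removed by the filter
    Peak(70_000, 70_200),  # distal peak, overlaps no promoter
]

if __name__ == "__main__":
    n_peaks = count_promoter_peaks(PEAKS, TRANSCRIPTS)
    genes = direct_target_genes(PEAKS, TRANSCRIPTS, lambda g: g != "geneC")
    print(n_peaks)        # 5
    print(sorted(genes))  # ['geneA', 'geneB']
```

      In this toy example, five promoter-overlapping peaks collapse to two target genes. The sketch does not attempt to reproduce the actual annotation or filters behind the numbers in Fig2C and Fig2F.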

      Moreover, YY1 has ~4-fold more peaks on the X chromosome on distal elements compared to promoters. Yet figure 2F exclusively shows the proportion of YY1 binding sites on TSSs. Would distal REs show similar proportions for the silencing categories? Could the authors show the differences in a Supplemental figure?

      As discussed in the response to Reviewer #1 (point 8.), a large fraction of distal YY1 peaks on ChrX are at LINE1 elements, which are not amenable to allelic analysis. Excluding these peaks results in a smaller number of distal elements bound by YY1. The application of our filters for allelic analysis reduces the number of distal YY1-bound REs even more, and our assignment of distal REs to their nearest gene is imprecise. For these reasons, we do not think a comparison of genes classified by whether they are putative targets of distal YY1-bound enhancers is informative.

      The authors switch between different model systems in the figures, which makes it quite confusing which type of XCI is being discussed. We would like to see clearly stated above all panels which cell culture condition is being studied (mESCs or NPCs).

      We have tried to improve this potential source of confusion by modifying “mESC” to “mESC no diff” in the relevant figure panels (see response to Reviewer #1 comment 7B), and adding “in mESCs without differentiation” to the title of Figure 4.

      In Figure 3E and 3F the authors look at the binding retention of OCT4 during XCI in ESCs. However, it is not clear why the authors choose OCT4. Could the authors explain why specifically OCT4 was chosen for these analyses?

      In our responses to the other reviewers, we discuss the limitations of only having one other TF to compare to YY1. The choice of OCT4 was primarily dictated by our experience and confidence in being able to generate high quality ChIP-seq data of this factor.

      As it was essentially arbitrary for the purposes of this paper, we have added a comment to this effect in the text (“with that of a different arbitrary TF, OCT4”).

      What is the expression level of YY1 in NPCs compared to mESCs? In Supplemental Figure S2A, it seems that YY1 protein levels decrease over time during NPC differentiation. Is part of the increased eviction a result of lower protein levels of YY1? Probably not, since you calculate ratios between Xi and Xa. Can you please comment on this?

      We were similarly intrigued by this apparent decrease in YY1 protein levels in NPCs (there is no decrease on the RNA level) and initially considered if it could contribute to the relative.

      In FigS2A specifically, the d18 NPC band is probably just a poor-quality sample extraction. Our ChIP-seq data generated from the same sample is similarly poor compared to the others (FigS2B). In other YY1-FKBP12F36V clones we derived and characterised by Western (not described further in this study, but likely to be published as raw source data for the cropped blots we show in FigS2A), the apparent difference in YY1 protein levels in NPCs is less pronounced. Although a minor decrease in YY1 protein in NPCs seems to be robust, we do not think it relevant in the context of our analysis of YY1 and XCI, as we almost always use Xa as an internal comparison for any observations made about Xi.

      On page 7 the authors state that degrading YY1 does not affect Xist spreading and/or localisation. However, it has previously been shown by other groups that YY1 is required for Xist localisation during XCI. Could the authors elaborate further on why their cells behave differently from those in the Jeon 2011 paper?

      We are working with a mouse ESC model of inducible Xist from its endogenous locus on ChrX and using the dTAG system to degrade YY1 protein. By contrast, Jeon 2011 worked with an Xist transgene integrated at random in the genome of mouse embryonic fibroblasts (MEFs) and siRNA knockdown of YY1. The difference in our observations could be linked to any of these 4 differences (ie cellular context, Xist genomic location, Xist introns, knockdown strategy), but we cannot identify a specific explanation.

      In Figure 4G and Figure S3D, elevated levels of Xist are observed in the dTAG conditions. As the authors point out, this could then result in the accelerated silencing of the X seen upon YY1 loss. Are these elevated Xist levels, which could result in the enhanced silencing in Figure 4, relevant to the kinetics of silencing? Moreover, YY1 could act as a transcriptional regulator of those genes on the X, and by removing YY1 one would expect decreased transcription, which would be read as accelerated silencing. The authors could check whether the genes that show accelerated silencing are regulated by YY1 in ESCs (+ dTAG, - Dox).

      We agree that these points are important to consider when interpreting the results of the YY1-FKBP12F36V ChrRNA-seq we present in Figure 4. However, we believe they are covered in the text during our discussion of the data.

      In relation to the final suggestion, the silencing of almost all X-linked genes is increased upon YY1 removal so separating a specific set of genes which show accelerated silencing would be difficult. Nevertheless, in Fig4F we report that the increases in Xi silencing are strongest for direct YY1 target genes. In fact, these genes also show a minor decrease in expression in the + dTAG - Dox condition (see response to Reviewer #1 point 12.). However, by-and-large the differences in Xa log2FCs between YY1-target and non-target genes are less statistically significant. Non-significant p-values are not shown on Fig4F, but can be found in our Rmarkdown analysis records.

      Can the authors explain why they decided to put the Smchd1 part after the conclusion? Would it not be better placed before the conclusion? The probable link between YY1 and SmcHD1 is definitely something important to investigate.

      Supplemental FigS4 relating to SmcHD1 is more speculative and we lack direct mechanistic evidence linking YY1 and SmcHD1. It would require more experiments to substantiate this as a mechanism. We think these experiments could potentially be very interesting, but are beyond the scope of this study.

      In the paper the authors cite Bowness et al., 2022. In it, Figure 5F studies silencing times with respect to silencing dependency on SmcHD1. What is the overlap between SmcHD1 target genes and YY1 target genes? This would provide more data about the correlation between YY1 and SmcHD1.

      There is an association between YY1 target genes and our previous categories of genes based on SmcHD1 dependence (13/56 SmcHD1_dependent genes are YY1 targets compared to only 8/101 SmcHD1_not_dependent genes). However, this enrichment of YY1 targets among SmcHD1-dependent genes is not striking enough to warrant inclusion in the (very short) discussion of SmcHD1 in this paper. This association is also expected given that both YY1-target genes and SmcHD1-dependent genes are associated with the set of slow-silencing genes.
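      For a sense of scale, the two counts quoted above can be converted into proportions. This is plain arithmetic on the numbers given in the reply; the variable names are illustrative, and no statistical test is implied or performed.

```python
# Illustrative arithmetic on the counts quoted above; no statistical test implied.
from fractions import Fraction

yy1_in_dependent = Fraction(13, 56)      # YY1 targets among SmcHD1_dependent genes
yy1_in_not_dependent = Fraction(8, 101)  # YY1 targets among SmcHD1_not_dependent genes
fold = yy1_in_dependent / yy1_in_not_dependent

print(f"{float(yy1_in_dependent):.1%}")      # 23.2%
print(f"{float(yy1_in_not_dependent):.1%}")  # 7.9%
print(f"{float(fold):.1f}-fold")             # 2.9-fold
```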

      Of note, our categories of SmcHD1 dependency were in fact defined in a previous study (Gdula et al., 2019) from a different cellular model (SmcHD1 KO MEFs).

      The authors hypothesise that SmcHD1 might play a role in the eviction of YY1 in NPC differentiation. The current data shows impaired silencing of slow silencing genes and YY1-dependent genes in the SmcHD1 knock-out. However, it doesn't show SmcHD1 is required for YY1 eviction. Could the authors provide direct evidence for their hypothesis by performing NPC differentiation in wild type and SmcHD1 knock-out cells and investigate YY1 binding using ChIP-seq?

      The data we show in FigS4 is ATAC-seq data. It shows that YY1 target REs are particularly more accessible from the Xi in SmcHD1 KO, which is not direct evidence but does align with a potential role for SmcHD1 in mediating removal of YY1 binding from Xi (see our response to Reviewer #2’s comment 2.). We agree that YY1 ChIP-seq over the same time course would be an interesting experiment, but arguably this would also only be indirect evidence (ie increased Xi YY1 enrichment may be due to a confounding consequence of SmcHD1 KO). We therefore believe the full suite of experiments needed to rigorously test the hypothesis are beyond the scope of this paper.

      In figure S4A and S4B no significance is indicated among the different conditions across the different differentiation days. Could the authors add this?

      At all timepoints, differences of Xi accessibility between YY1-binding vs non-YY1 REs are significant. P values are now added to FigS4 and the statistical test is described in the legend.

      Finally, we would like the authors to elaborate in the conclusion about the order of events. As they correctly state at the top of page 5 (and we agree), delayed loss of promoter accessibility compared to gene silencing does not automatically mean that it is downstream of gene silencing. Can you elaborate on this? Also, in light of Fig S2C where loss of YY1 binding seems to happen after gene silencing.

      We mention in the text and in the above response to Reviewer #2 (point 2.) that we do not wish to overinterpret this quantitative difference because the assays are technically different and accessibility is not linearly correlated with gene expression.

      It is possible to speculate plausible biological explanations for this discrepancy in kinetics between accessibility loss, TF binding and gene silencing. For example, a change in the landscape of histone modifications at a promoter may have little effect on its accessibility to TFs but directly hinder RNA Polymerase II in initiation and/or elongation of transcription of the gene. However, we prefer to keep this speculation out of the main text of the paper.

      Reviewer #3 (Significance (Required)):

      This manuscript highlights a novel role for YY1 in XCI. The manuscript provides an analysis of the correlation and causation of YY1 in gene silencing during XCI. There is a clear correlation between YY1 and delayed silencing of genes on the Xi. To our knowledge, this is the first time such an analysis has been performed for YY1. It advances our conceptual and mechanistic understanding of gene silencing kinetics and of the factors involved. We believe it is an important contribution to the XCI field and will be of great value to the XCI community.

      Strength:

      This study presents a comprehensive and in-depth characterization of X-linked gene silencing during XCI.

      Two different types of inducible XCI are studied and compared (ESCs vs differentiation towards NPCs), which we are grateful for.

      Systematic and stepwise analysis of the data is very strong.

      Many data points have been collected which provide stronger conclusions.

      Weakness:

      Some sentences in the abstract should be toned down.

      YY1 eviction on the inactive X doesn't seem crucial to establish X-linked gene silencing in mESCs.

      The mechanistic approach at the end of the manuscript with relation to SmcHD1 could be studied further.

      This paper will be suited for a specialised audience in XCI and transcription factor control of gene expression, i.e. basic research.

      Field of expertise: XCI, epigenetics, Xist, gene silencing, X chromosome biology.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      General Statement

      We very much appreciate the reviewers' thorough comments and are sincerely grateful for their kind remarks on the novelty and interest of our manuscript. We are confident that we have addressed all the points they raised, with new data as well as revised figures and text.

      Point-by-point description of revisions

      All the revisions have already been carried out and included in the transferred manuscript.

      Reviewer #1

      Major comments:

      > The number of replicates/animals for the experiments described in Figures 1 and 2 should be reported either in the figure legends or in the methods (statistical analysis).

      We have added the required numbers to the corresponding revised figures, as requested.

      > A relevant part of the discussion repeats what the authors have already said in the results. I would recommend reorganizing this section, emphasizing the importance of these results in the context of human brain tumors.

      Following our own style, we have written a very short (46 lines in length!) Discussion. We dedicate a few lines to highlighting two points: (1) the suggestion, derived from our allograft experiments, that the initial stages of tumour development and long-term tumour growth may be molecularly distinct events, and (2) the unique effect of the combined loss of TrxT and dhd on mbt tumour transcriptomics, unique because none of the suppressors of mbt reported before are as effective in erasing both the MBTS and SDS mbt signatures. Neither of these points is raised in the Results. In the remaining few lines we put our results in the context of human Cancer/Testis (CT) genes and elaborate on the fact that the TrxT and dhd pair qualify as head-to-head CT-X genes, like those reported in human oncology. This is as far as we are willing to go at this stage in emphasizing the importance of our results in the context of human tumours.

      Reviewer #2

      > 1. Figures should include information regarding the sex of the larvae, particularly as a sex-linked effect has previously been reported in the phenotypes analysed (e.g. in Figure 2 and Figure S1, where an indication of the sex of the animals should be provided in the figure and not just in the figure legend).

      We fully agree. Sex must always be taken into account as a biological variable. All the experiments reported in the manuscript were carried out with sexed samples, and were annotated accordingly in the original text. In compliance with the reviewer's request, we have also added this information to the revised figure.

      > 2. Data regarding fertility. Can this be shown in a table format? Are dhdKO females fully sterile? What are the fertility levels of Df(1)J5?

      Please note that we are not discovering anything here but merely corroborating what has been published before: the lack of TrxT does not affect fertility in either sex; the lack of Dhd results in female sterility (Torres-Campana et al., 2022; Tirmarche et al., 2016; Svensson et al., 2003; Pellicena-Palle et al., 1997). Adding a table would not be justified. Moreover, it would be a rather simple table: all single-pair mating tests (n=10 for each genotype) with TrxT KO and Dhd KO males, and with TrxT KO females, were as fertile as those with control flies, while all single-pair mating tests (n=10) with Dhd KO females were sterile.

      > 3. Are dhd and TrxT the only genes affected by Df(1)J5? Is there transcriptional data from Df(1)J5 animals to suggest that nearby genes are not affected by the deficiency? Of particular interest would be to assess whether snf is affected, as it is a known regulator of gene expression and splicing.

      Yes, dhd and TrxT are the only genes affected by Df(1)J5. This is the case according to FlyBase (citing Svensson et al., 2003, and Salz et al., 1994) and is confirmed by our own RNA-seq data. No other transcripts, including snf, are affected by Df(1)J5.

      > 4. In Figure 1C, statistical test plus indication of significance is not presented. The requested statistical test and significance data have been added as required to the revised figure and figure legend.

      > 5. Related to Figure 1D. Additional neural markers could be assessed in dhdKO and TrxTKO flies. Whilst the gross morphology of the brain does not seem to be affected, there is a possibility that cell specification is affected. Specific markers for the NE, MED and CB could be used to assess this in more detail, particularly as the DE-cad images shown for dhdKO and TrxTKO flies seem to differ slightly from the control. We believe that there may be a small misunderstanding here. We have made this point clear in the revised version by referring to substantial published data showing that expression of these two genes is restricted to the germline and that, female fertility aside, TrxT- and dhd-deficient flies have perfectly normal development and life span. If anything, Figure 1D is redundant. However, we would rather keep it as a control showing that our CRISPR KO mutants behave as expected.

      > 6. Related to Figure 2A, images from TrxTKO; l(3)mbtts1, dhdKO and l(3)mbtts1 should be added at the very least in a supplementary figure. Additionally, data for NE/BL ratio should be provided for dhdKO, TrxTKO and Df(1)J5 in the absence of l(3)mbtts1 tumours. Related to Figure S1, quantification of NE/BL ratio for female lobes should be added to the figure. All the requested images and data have been included in the revised version in new Figures S2B, S2A, and S1A.

      > 7. Related to Figure 2B and Figure S1, three rows of images are presented for each genotype. It is unclear whether these correspond to brain lobes from different larvae or different confocal planes from the same animal. This should be clarified in the figure and/or figure legend. This point has been clarified in the revised figure legend, as requested. Each group of three rows corresponds to brain lobes from different larvae of the same genotype.

      > 7 cont. Related to this, in addition to the anti-DE-cadherin data, it would be informative to include immunofluorescence data using antibodies such as anti-Dachshund (lamina), anti-Elav (medulla cortex) and anti-Prospero (central brain and boundary between central brain and medulla cortex) (as assessed in e.g. Zhou and Luo, J Neurosci 2013) in the mbt tumour situation to accurately describe regions disrupted by the tumours. There is no denying that taking advantage of the many cell-type-specific markers that are readily available in Drosophila could be of interest. The same applies to cell cycle markers like PH3, FUCCI, and many others. However, we believe that, interesting as they may be, none of these markers will give us the clue to the molecular basis of TrxT and Dhd tumour function, which is, of course, the burning open question that we are trying to address now.

      > 8. Authors should clarify how the NE was defined when mbt tumours are generated, as it is severely affected. From the images provided, it is unclear which region corresponds to NE or how the NE/BL ratio was measured. It would be helpful to outline these regions in the images or, as mentioned above, use antibodies to define them. The figure has been modified to include the requested outlines defining the NE, as identified in the channel showing DE-Cadherin staining.

      > 9. Figure 2C does not have indication of statistical significance for the comparisons stated in the text. Potential explanations for the different roles of Dhd and TrxT in long-term tumour development should be explored in the discussion. The requested statistical significance data for these comparisons were given in the penultimate paragraph of that section. To make these data more prominent, we have also added them to revised Figure 2C.

      > 9 cont. Related to this, does the analysis of the RNA-seq data from TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1 animals reveal why they have similar effect on mbt tumour development but do not synergistically contribute to long-term growth? Unfortunately, our analysis of the RNA-seq data from TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1 animals does not give us any clue that could help us understand why they have a similar effect on mbt tumour development but not on long-term growth (allografts). To further explore this point, we have added new Figure S3, which includes a Venn diagram showing the overlap between the affected mMBTS genes in TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1, together with the lists of enriched GOs among overlapping and non-overlapping genes. The GO differences are tantalising indeed. However, they do not immediately suggest any direct explanation for the different roles of Dhd and TrxT in long-term tumour development.

      > 10. Authors should clarify if there is any overlap between the affected M-tSDS and F-tSDS in the TrxTKO; l(3)mbtts1 and dhdKO; l(3)mbtts1 conditions. Would the limited overlap suggest that TrxT and dhd act in parallel rather than synergistically? This might also explain the differential effects on long-term tumour development. Additionally, the stronger effect observed in Df(1)J5 animals may be due to TrxT and dhd functional redundancy. Currently, there is limited evidence to suggest that TrxT and dhd act synergistically to regulate mbt tumour growth based on the presented data. See below.

      > 11. Authors should include a Venn diagram depicting affected genes (M-tSDS and F-tSDS) in the TrxTKO; l(3)mbtts1, dhdKO; l(3)mbtts1 and Df(1)J5; l(3)mbtts1 genotypes as this could clarify the percentage of overlap of gene signatures in these different conditions. Related to this point, authors could provide results from GO analysis to investigate whether specific functional clusters are altered in the different conditions. We have taken the liberty of merging points 10 and 11, which are conceptually similar. The requested Venn diagrams showing the overlap between the affected M-tSDS and F-tSDS genes in the TrxTKO; l(3)mbtts1, dhdKO; l(3)mbtts1, and Df(1)J5; l(3)mbtts1 conditions, and the GO analyses, are now shown in new Figure S5. Unfortunately, these new data do not suggest any obvious explanation for the differential effects of these two genes, nor do they allow us to derive any further conclusions regarding the nature of the pathways through which TrxT and dhd cooperate to sustain mbt tumour growth. However, our analyses demonstrate that efficient suppression of mbt phenotypic traits (in larval brains) and of the mbt transcriptome requires the combined elimination of both germline thioredoxins, while the effect of removing either of them individually is only partial. These data demonstrate the synergistic nature of TrxT and dhd function in mbt tumour growth.

      > 12. In Figure 3E, authors should indicate more explicitly in the figure panel and/or figure legend which genes display significant differences in expression in the different samples. We apologise for not having made this point clear in the original version: all 21 genes shown in this table are significantly downregulated in DfJ5;ts1 vs ts1. Of these, nanos and Ocho are also significantly downregulated in TrxTKO;ts1 vs ts1, and Ocho, HP1D3csd, hlk, fj, Lcp9, CG43394, and CG14968 are significantly downregulated in dhdKO;ts1 vs ts1. These data have been included in the revised figure legend. Data on all other comparisons are included in Table S1.

      > 13. In Figure S2C-F it is not clear if the graphs represent data from all tissues or data from male and female tissues separately, as shown in Figure 4. Apologies for the confusion. All samples were from male tissues, as indicated in the original figure legend. To make this clearer, we have labelled all four panels in the revised figure.

      > 14. Are TrxT and dhd also deregulated in other tumour types? Or is this specific for mbt tumours? This information could be provided to enhance the scope of the manuscript. Thank you for raising this point. TrxT and dhd are not dysregulated in the other tumour types that were analysed in Janic et al., 2010 (i.e. pros, mira, brat, lgl and pins).

      > 15. Authors conclude that TrxT and dhd cooperate in controlling gene expression between wild-type and tumour samples and that they act synergistically in the regulation of sex-linked gene expression in male tumour tissue. However, the link between the two observations (if indeed there is a link) has not been well explained. Is the effect on gene expression in tumours simply a result of the regulation of sex-linked transcription? Our data show that TrxT and dhd synergistically contribute to the emergence of both the MBTS (i.e. male tumour versus wild type) and the SDS (i.e. male tumour versus female tumour). The only certainty at this time regarding the interconnection between both signatures is that they overlap, but only partially, which answers one of the questions raised by the reviewer: the effect on gene expression in tumours is not simply a result of the regulation of sex-linked transcription. Beyond that, the link (if indeed there is a link) between these two signatures has not been investigated. The lack of insight on this issue is not surprising, taking into account that, in contrast to classical tumour signatures (tumour versus healthy tissue), the concept of sex-linked tumour signatures is relatively new and only a handful of such signatures have been published. Moreover, the vast majority of classical tumour signatures have not been worked out in a sex-dependent manner.

      Reviewer #3

      > - In the first section of the results, as a first step to study the role of TrxT and dhd genes on mbt tumors the authors generate CRISPR knock outs of these genes and correctly validate them. However, afterwards, the experiment where the authors test the KO of these genes in a wild-type larva brain is not contextualized with the rest of the section. It might be best to first address the role of these genes in a tumor context and only then complement with the experiments in wild-type (in supplementary material). We do appreciate the reviewer's view, but respectfully disagree. In our opinion, the manuscript flows better by presenting the tools that we have generated in Figure 1. By corroborating published data showing that these two germline genes do not affect soma development (Torres-Campana et al., 2022, Tirmarche et al., 2016, Svensson et al., 2003, Pellicena-Palle et al., 1997), this first figure not only validates our CRISPR KO mutants, but also sets the stage to highlight their significant effect on a somatic tumour like mbt.

      > - Fig 2 B - To back up the quantifications in Fig 2A the authors could include images of l(3)mbt ts1 tumors with TrxT KO and dhd KO also. The requested images are shown in new Figure S2B.

      > Fig 2 B and C - Indeed, the results suggest that TrxT seems to be responsible for most tumor lethality upon l(3)mbt allografts, but not dhd. This is curious since l(3)mbt; dhd KO brain tumors have the same partial phenotype as l(3)mbt; TrxT KO (fig 1A). It would be interesting to further explore these phenotypes by staining l(3)mbt; TrxT KO and l(3)mbt; dhd KO brains with, for instance, PH3 to understand if the number of dividing cells of these tumors could be different. In addition, to back up this information, the authors could look at what happens to l(3)mbt tumors with TrxT KO and dhd KO at a later stage of development (or to larva or pupa lethality if that is the case) and compare it with l(3)mbt brains. We did explore the possibility of looking at later stages. Unfortunately, the onset of lethality, compounded by the major tissue remodelling from larval to adult brain, makes these stages unsuitable for reaching any meaningful conclusion. With regard to staining for PH3, we think that, like FUCCI and a long list of other useful labels that could be explored, it is potentially interesting but hardly likely to give us the clue to the molecular basis of TrxT and Dhd tumour function, which is, of course, the central question that we are addressing now.

      > - Fig 2 B - What happens to the medulla in a l(3)mbt brain tumor? Although the ratio of NE/BL is the same for wild-type and D(1)J5; l(3)mbt, it still seems that the medulla in D(1)J5; l(3)mbt brains is substantially bigger, although quantifications would be required. Do the authors know if the NE in D(1)J5; l(3)mbt brains is either proliferating less or differentiating more? There are no significant differences in medulla/BL nor in CB/BL ratios. The corresponding quantifications have been added to the revised version. As for the question on proliferation versus differentiation, the simple answer is that we do not know.

      > Figure S1 - Although the effects of TrxT KO and dhd KO in male mbt tumors seem to be enhanced in relation to female tumors, the authors should include some form of tumor quantification for female tumors like in Fig 2 A. We have carried out the requested quantifications and added the results in a new panel in revised Figure S1A.

      > Moreover, in the 2nd section of the results, relative to Fig S1, in "...Df(1)J5; l(3)mbtts1 female larvae although given the much less severe phenotype of female mbt tumours, the effect caused by Df(1)J5 is quantitatively minor.": to say "quantitatively minor", the authors should include not only quantifications, but also a form of comparison between female tumors and male tumors. The requested quantification was published in Molnar et al., 2019. However, we agree that it is worth repeating it with our new samples. The new data, which confirm the published results, are now shown as a new panel in revised Figure S1C.

      > - Fig 3D - The hierarchical clustering was done according to which parameters? A brief explanation could help a better interpretation of this results section. The requested information has been added to the Methods section. Hierarchical clustering was done using the function heatmap.2 in R to generate a plot in which samples (columns) are clustered (dendrogram); genes (rows) are scaled by row; distance = Euclidean; and hclust method = complete linkage. Expression levels are reported as Row Z-score.
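      For readers less familiar with these settings, the clustering parameters listed above can be illustrated with a minimal Python sketch. It is neither the gplots code nor the authors' pipeline. It assumes a plain genes × samples matrix and computes row Z-scores using the sample standard deviation, as R's sd() does. It then clusters the sample columns by complete-linkage agglomeration on Euclidean distances. The function names and the toy matrix are hypothetical. The response does not state whether the dendrogram is built from raw or row-scaled values, so the sketch simply clusters whatever matrix it is given.

```python
import math
from statistics import mean, stdev


def row_zscores(matrix):
    """Scale each gene (row) to mean 0 and sample SD 1, i.e. a 'Row Z-score'."""
    scaled = []
    for row in matrix:
        m, s = mean(row), stdev(row)
        # A constant row has SD 0; map it to zeros instead of dividing by zero.
        scaled.append([(v - m) / s if s > 0 else 0.0 for v in row])
    return scaled


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(vectors):
    """Agglomerative clustering with complete linkage.

    Returns a list of merges (cluster_id_a, cluster_id_b, height). Leaves are
    numbered 0..n-1 and each merge creates a new cluster id n, n+1, ...
    """
    clusters = {i: [i] for i in range(len(vectors))}
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Complete linkage: cluster distance = largest pairwise distance.
                d = max(euclidean(vectors[p], vectors[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Hypothetical toy data: 3 genes (rows) x 4 samples (columns); not from the manuscript.
expr = [
    [1.0, 1.2, 5.0, 5.1],
    [2.0, 2.1, 8.0, 7.9],
    [0.5, 0.4, 3.0, 3.4],
]
samples = [list(col) for col in zip(*expr)]  # cluster columns (samples)
merges = complete_linkage(samples)
colour_values = row_zscores(expr)  # values mapped to the Row Z-score colour key
```

      In practice, a library routine such as scipy.cluster.hierarchy.linkage with method="complete" and metric="euclidean" would normally replace the hand-written loop. The loop is spelled out here only to make the complete-linkage rule explicit.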

      > - Fig 3D - It could be beneficial for the authors to include an analysis of the downregulated genes shared between TrxT KO mbt tumors and dhd KO mbt tumors, as well as the genes that are not shared (besides MBTS genes). Could be something like a Venn diagram. Thanks for pointing this out. New Figure S3 shows the requested Venn diagram, as well as the list of enriched GOs for each group. There are no enriched GOs in the list of overlapping genes. TrxTKO; l(3)mbtts1-specific genes are enriched for GOs related to gamete generation, sexual reproduction, germ cell development and similar GOs. dhdKO; l(3)mbtts1-specific genes are enriched for GOs related to chitin, molting and cuticle development. Tantalising as they are, these observations do not immediately suggest any direct explanation for the different roles of Dhd and TrxT in long-term tumour development. We are happy to add this supplemental information, but we do not deem it worthy of further discussion at this point.

      > - Results section 3 - "Expression of nanos is also significantly down-regulated upon TrxT loss, but remains unaffected by loss of dhd" - to corroborate the idea that TrxT and dhd work as a pair, but contribute to different functions within the tumor, it would be interesting for the authors to do an allograft experiment of dhd KO; l(3)mbt male tissue with nanos knock down in the brain, if genetically possible. The suggested experiment has already been published. The gene in question (nanos) is a suppressor of mbt tumour growth: in a nanos knock-down background, l(3)mbt allografts do not grow (Janic et al., 2010).

      Minor comments:

      > - In the first section of the results, the authors claim that "Consistent with the reported phenotypes of Df(1)J5...", but then the study is not mentioned. The corresponding references (Salz et al., 1994; Svensson et al., 2003; Tirmarche et al., 2016) have been added.

      > - Fig 1 B - It is a bit confusing to follow where TrxT and dhd are in the Genome browser view. I am guessing we should follow the TrxT-dhd locus from A, but the authors could make it clearer. Figure 1 has been changed to make this point clearer.

      > - In the same section, in the next sentence, the homozygous and hemizygous is a bit confusing. "...homozygous TrxTKO females, dhdKO males, and TrxTKO males", should be corrected. We appreciate the suggestion, but would rather stick to classical terminology and refer to KO/KO females as homozygous and to KO/Y males as hemizygous.

      >- In the same section (Fig 1C): "RNA-seq data also shows that TrxT is significantly upregulated in l(3)mbtts1 males compared to females (FC=7.06; FDR=1.10E-44) while dhd is not (FC=1.89; FDR=2.00E-14)." - But dhd is nevertheless upregulated, although less, in l3mbt males, right? The authors might need to rephrase. We refer to comparing males versus females, not wild type versus tumours. The text has been rephrased in the revised version to make this point clear.

      > - Fig 2 A (quantifications), should be after the confocal images (Fig 2 B). We respectfully disagree on this minor point. We initially organised this figure in the order recommended by the reviewer, but we eventually found it easier to write the article using the order shown in the submitted figure. We would rather stick to this version.

      > - Fig 2 B and Fig S1 - Please include an outline of at least neuroepithelia and, if possible, Central brain or medulla so that these regions can be more clearly identified. Moreover, these results will be easier to interpret if you add a male symbol in this image and a female symbol in Figure S1, otherwise, it might seem like the same figure. Outlines and symbols have been added to the revised figure, as requested.

      > - In results, section 2, "Consequently, in spite of the strong sex dimorphism of mbt tumours, the phenotype of Df(1)J5; l(3)mbtts1 larval brains is not sexually dimorph" - to back this up, quantifications of Df(1)J5; l(3)mbtts1 female vs male tumor size, as well as statistical analysis are needed, like previously said. The requested new data are now shown in revised Figure S1C.

      > - In results section 2 - "For allografts derived from female larvae, we found that differences in lethality rate caused by TrxTKO; l(3)mbtts1, dhdKO; l(3)mbtts1, Df(1)J5; l(3)mbtts1, and l(3)mbtts1 tissues (7-23%) were not significant (Figure 2C)" - there is no statistical analysis to conclude that the lethality rate is not significant, from 7% to 23% still seems like a difference. Thanks for pointing this out. We did of course generate the requested statistical analysis, but failed to include it in the manuscript. A chi-square test gives p = 0.2346. These data have been added to the revised version.

      > - Last paragraph of section 2 of results - very long and confusing sentence. Please rephrase text. We have rephrased this sentence to make it shorter and clearer.

      > - On section 3 of results: "The vas, piwi and CG15930 transcripts are not significantly down-regulated following either TrxT or dhd depletion alone." - in Fig 3E, not only these transcripts seem to suffer a slight downregulation, but there is also no statistical analysis supporting this. There seems to be a misunderstanding here. The requested statistical data for each gene are shown in Table S1.

      > - First paragraph of section 3 results - the first sentence is written in a confusing way. Moreover, more context is needed in the sentence afterwards: "we first focused on transcripts that are up-regulated in male mbt tumour samples compared to male wild-type larval brains (mMBTS)." but using which data? The RNA seq data? Agreed; this paragraph has been amended in the revised version.

      > - Brief conclusion missing on the second paragraph of the last section of results. As far as the results presented in this paragraph are concerned, we can only mention the two potentially interesting observations, which were pointed out in the original version: (i) the suggestion that nanos upregulation could be critical for sustained mbt tumour growth upon allograft, and (ii) the fact that three genes (vas, piwi and CG15930), also known to be required for mbt tumour growth, are downregulated in Df(1)J5; l(3)mbtts1, but remain unaffected following either TrxT or dhd depletion alone. We are unable to derive any other conclusion from these observations.

      > - In the end of 3rd paragraph of last section of results: "...M-tSDS and F-tSDS genes is partially reduced in l(3)mbtts1 brains lacking either TrxT or dhd, but it is completely suppressed upon the lack of both." - "completely" might not be a correct word to use in this case, as there are still some small differences. As requested, we have replaced "completely" with "strongly".

      > - 4th paragraph of last section of results: Either mention the male results and then female (to be in order with the figure, as the female graphs come after the male graphs) or change the order in the figure. Also, this paragraph is not very clear, could benefit from a better explanation of the results and conclusions. Point taken. Figure 4 has been changed and female graphs come before male graphs. The paragraph is clearer now. The conclusion from this paragraph is included in the final paragraph of this section.

      > - Fig 4 C,D,E,F: to make it more clear, please write the name of the genotypes in question in the figure. At the reviewer's request, the genotypes in question are now written in each panel. Please note that we did not do so before because all four panels correspond to the same comparison, Df(1)J5; l(3)mbtts1 vs l(3)mbtts1, as mentioned in the original figure legend.

      Reply to the reviewers

      Reviewer #1

      The paper by Yammine et al addresses a major problem in the peculiarities of genotype-to-phenotype manifestation in collagen II and chondrodysplasia. It is a lucid and comprehensive study detailing what they see as the fundamental mechanism of the Gly1170Ser-mutated Col2a1 gene.

      At the heart of the matter is the debunking of the results from a mouse model generated by Liang et al (PLoS One 2014), in which the authors suggested that the phenotype seen only in homozygous mice (heterozygous mice appear normal) was related to an ER stress-UPR-apoptosis cascade resulting in the chondrodysplasia. The Yammine et al paper uses a different model, a robust human iPSC-based tissue with a CRISPRed variant, to show that, despite the ability of the variant chondrocytes to deposit a Gly1170Ser-substituted collagen II in both the hetero- and homozygous models, this is not accompanied by any substantive UPR. The authors of this current paper also argue that their model system most closely resembles the human context, where heterozygous individuals show pathology.

      We appreciate the Reviewer highlighting the significance of this manuscript addressing a major issue in the field.

      I have sieved all the data related to this topic and have gone back to examine the data and what struck me was the repeated use of the phrase "slow to fold" in the current paper and wondered whether the element of "TIME" is as important in the chondrogenesis of either models and it is this element that generate the difference between the two results? While iPSC-based tissue takes up to 44 days, a female mouse would have had two litters in this time and made many more growth plates. Could it be by "slowing" the chondrogenesis pathway, which is part of the procedure of differentiation of iPS cells into chondrocytes, the ER is not as "stressed" as in mouse development? I would like the authors to reflect and comment and put forward their view given UPR signaling pathways play a crucial role in chondrocytes in phases of high protein synthesis, e.g., during bone development by endochondral ossification (Journal of Bone Metabolism, vol. 24, no. 2, pp. 75-82, 2017).

      The Reviewer here emphasizes a likely benefit of the human model that we had not previously considered, as differentiation and growth in our model are indeed far more similar to humans than the rapid timeline in mice. We also note that the evidence that collagen is slow to fold comes from the gold standard assay in the field for collagen folding rate, a point discussed in greater detail in response to Reviewer 2’s query (see below).

      The literature evidence does indicate that transient UPR signaling is relevant for chondrogenesis. We selected UPR timepoints that do not interfere with the differentiation process but instead capture the tissue deposition phase, when chondrocytes are still actively depositing and maintaining the extracellular matrix, so that a physiological transient UPR during differentiation could be distinguished from a potentially chronic and pathologic one. We now clarify this point in the manuscript (see text below).

      “These timepoints were selected to reflect an early and a late stage of cartilage maturation, but with both timepoints harvested post-chondrogenesis so as not to interfere with the physiologic transient UPR activation that can be important in that process.”

      This is notwithstanding the good argument given by the authors in defending their robust results, namely that there is no evidence that the hydrophilic triple-helical domain of pro-collagen binds BiP, the main detector of accumulated misfolded proteins. What then do they make of the immunostaining and qPCR with ER stress related genes in the Liang et al paper? I know that the data is not theirs, but a comment on the indisputable data would give the reader a better understanding.

      It is critical to note that the evidence for ER stress that induces the UPR is, at best, exceptionally weak for heterozygotes in the Liang et al paper, and arguably also weak for homozygotes. Liang et al observed, via quantitative PCR, that the mRNA levels of just Chop (which is also a marker of the integrated stress response and not a good readout for UPR activity) and ATF6 (whose RNA-level upregulation is not a standard marker of the UPR) were significantly upregulated in the disease-relevant heterozygous mice. Given that other (more valid) UPR markers were not altered, this is not so different from our observation of a lack of UPR in the disease-relevant heterozygotes.

      A somewhat more comprehensive set of UPR markers, including Chop, Xbp1 (total and spliced), Grp78 (BiP), ATF4, and ATF6, was significantly upregulated only in homozygous mice compared to wild-type. PERK is one of many kinases upstream of ATF4 and Chop that can be activated by a variety of processes (the pathway is part of the integrated stress response, for example). Moreover, transcriptional upregulation of ATF4 (which is actually induced translationally, not transcriptionally) and ATF6 (which is actually induced proteolytically, not transcriptionally) is not normally used to read out UPR activation, so it is not clear to us that a robust UPR was induced even in homozygotes. In addition, the study of homozygous mice showed no substantial increase in Xbp1-S (S = spliced) relative to Xbp1-T (T = total). This ratio, rather than changes in Xbp1-T and Xbp1-S individually, is the most appropriate measure of UPR activation. The use of mostly non-standard genes to assess UPR induction, the weak upregulation of BiP (2.5-fold), and the unchanged ratio of Xbp1-S to Xbp1-T raise some questions regarding UPR induction even in the homozygotes. Regardless of these homozygote data, as noted above, the evidence for a UPR in heterozygotes is very weak, despite ER stress being the focus of the Liang et al paper.

      With respect to immunostaining, Liang et al observed that tissue from homozygous mice (but not heterozygotes) contained significantly more apoptotic cells. Apoptosis could be a result of chronic, unresolved UPR signaling, but it could also result from any number of other pathways and is certainly not direct evidence for UPR-inducing ER stress. Additionally, for the homozygote apoptosis assay, Liang et al do not note how many mice were analyzed for each genotype, a value they did report for their other assays. While examining multiple sections for each genotype is valuable (they state ≥10), the assessment of biological replicates (additional mice) seems critical to confidently reach a conclusion.

      Although I understand the choice of cell lines for overexpression, the transfection of HT-1080 cells with wild-type and Gly1170Ser COL2A1-encoding plasmids is not a match to the in vivo model (variation in efficiency, etc.). The appearance of BiP, even at a lower fold increase, does not negate ER stress, as the authors acknowledge. More important is: what other paracrine signals trigger the UPR signalling pathway independently of BiP, or may be lacking in an iPS system? Is there anything else, involving not only ATF6α (activating transcription factor 6 alpha), but also IRE1α (inositol-requiring enzyme 1 alpha) and PERK (protein kinase RNA-like endoplasmic reticulum kinase)?

      Our finding that the UPR is not activated is based on comprehensive RNA-sequencing performed in the physiologically more relevant iPSC-derived chondrocyte, as opposed to the tumor cell line HT-1080. Our interactomic finding that BiP interacts to the same extent with wild-type and Gly1170Ser procollagen-II (in HT-1080 cells) strongly supports our proposal that the reason the UPR is not activated is that BiP fails to recognize unfolded triple-helical domains.

      We note that, although HT-1080 cells are not a perfect match, they are the most accessible option for interactome-based studies. Because there is no MS-grade antibody for collagen-II IP, we need to IP a transfected, tagged collagen. We cannot do this in chondronoids, or in isolated chondrocytes that transfect poorly and rapidly dedifferentiate. Critically, Prockop and co-workers extensively validated HT-1080 cells as a platform for fibrillar collagen biochemical studies in Matrix 1993, 13, 399. Our own lab further characterized their capacity to properly handle fibrillar collagen variants in great molecular detail in ACS Chem Biol 2016, 11, 1408.

      Since our chondronoid system contains only chondrocyte cells, as is the case in cartilage, the cells can receive paracrine signals from other chondrocytes, but not other cell types. In joints within a whole animal, it is true that paracrine crosstalk occurs between different cell types of different tissues, including inflammatory cells for example. The chondronoid is very useful for elucidating the defects that occur at the chondrocyte-level, without confounding secondary effects. At the chondrocyte-level, Gly1170Ser-substituted procollagen-II does not activate the UPR.

      The Reviewer’s comment regarding the absence of paracrine signals in an iPSC-based system is well-taken, and we added discussion as follows:

      “These observations indicate that the chondrocytes were not raising such stress responses, at least when examined in the absence of paracrine signals from other cell types in the joint.”

      The authors have given us plenty of alternatives that are relevant, and they prepared us for yet another paper on articular cartilage using iPS tissue model which I am looking forward to.

      We are also excited about the upcoming potential of this model system!

      Significance

      I think this paper is publishable and it is important in understanding the mechanism by which mutation in collagen type II affects chondrogenesis and therefore bone formation. This paper will appeal to musculoskeletal scientists, especially those who are interested in bone and its pathology. It would be important for the authors to respond to the critique of "TIME" and the speed of protein synthesis, which creates duress in the ER pathway.

      We greatly appreciate the Reviewer’s comment again on the significance of this work, and their scholarly input which has substantially improved the paper. We hope they will agree that the manuscript is now ready for publication.

      Reviewer #2

      Evidence, reproducibility and clarity

      System: The investigators have used a human iPSC chondrocyte model system to investigate the biochemistry of the chondrodysplasia caused by the p.Gly1170Ser mutation in the type II collagen gene (COL2A1). They studied presumably homogeneous chondronoids formed by 3 cell lines they previously reported, in which the chondrocytes were either homozygous wild type for the gene, homozygous for the CRISPR/Cas-induced mutation, or heterozygous for the two alleles (their refs 42-45). In addition, they utilized cultured HT1080 human fibrosarcoma cells transfected with wild-type and mutant COL2A1 to study differences in the interactomes of the two proteins.

      Analytic Parameters: They investigated the extracellular matrix formed by the three cell lines using collagen and proteoglycan staining and TEM, and the transcriptional responses in chondronoids expressing the wild-type and mutant genes.

      Observations: Matrix formation was defective in the two mutation-bearing cell populations, reflecting defective fibril formation proportional to the abnormal gene dose. They found increased post-translational modifications (hydroxylation and O-glycosylation) on the mutant collagen extracted from the chondronoids, and EM evidence of collagen retention in the ER. They compared the transcriptional profiles of the three genotypes and failed to find a profound UPR response late in culture, with only a mild upregulation of UPR genes in the young cultures. They could not find evidence for activation of the ISR except in the homozygous mutant cells.

      Using transfected HT-1080 cells (previously shown by these investigators not to express endogenous procollagen-II but to be able to synthesize procollagen from transfected genes), they were able to compare the wild-type and mutant procollagen interactomes.

      Conclusions: They conclude that the p.Gly1170Ser mutation in COL2A1 causes abnormal folding, which results in trapping of the protein in the ER and some interaction with cellular elements of the proteostatic response. They concluded that the cellular proteostasis machinery can recognize slow-folding Gly1170Ser through increased interactions with certain ER network components, but not in the same fashion that has been described for liver cells producing mutated versions of high-volume secreted proteins.

      We appreciate this careful summary of our work.

      Major comments:

      Their first conclusion, stated in the abstract ("Biochemical characterization reveals that Gly1170Ser procollagen-II is notably slow to fold and secrete"), that the mutant polypeptide chain folds more slowly than the wild-type chain, is based on the premise that the longer the chains remain in the ER, the greater the degree of lysine hydroxylation and O-glycosylation. Although this may be true, they do not provide a reference, and I could not find a definitive description of the phenomenon. Their reference 48 only discusses the occurrence of intracellular post-translational modification of the lysines and continuing modification extracellularly, but does not relate these phenomena to the rate at which the peptides traverse the cell. I think the reader would benefit from seeing experiments in which the rates of folding and secretion of the wild-type and mutant chains are measured and the degrees of post-translational modification are compared. Cabral WA et al. showed differences in collagen folding and secretion rates in cyclophilin B wild-type, knockout and heterozygous osteoblasts and fibroblasts by western blots (Cabral WA et al. (2014) Abnormal Type I Collagen Post-translational Modification and Crosslinking in a Cyclophilin B KO Mouse Model of Recessive Osteogenesis Imperfecta. PLoS Genet 10(6): e1004465. doi:10.1371/journal.pgen.1004465). Performing such experiments in their chondronoids would confirm the authors' interpretation that the increased post-translational modification portrayed in their Figure 4 reflects slowed folding and secretion related to the mutation.

      We apologize for failing to provide essential background references and information to assess our assay for slow folding/secretion of procollagen. In fact, slow migration on SDS-PAGE is not only a widely used assay for comparing the rate of folding of procollagens, it has also remained the gold standard in the field for the past forty years. The studies cited below are some of the seminal papers in the field linking collagen’s rate of folding with its extent of posttranslational modifications and its electrophoretic mobility. We have now updated our citations accordingly.

      1. Bateman, J.F.; Mascara, T.; Chan, D.; Cole, W.G. “Abnormal type I collagen metabolism by cultured fibroblasts in lethal perinatal osteogenesis imperfecta” Biochem J 1984, 217, 103.
      2. Bonadio, J.; Holbrook, K.A.; Gelinas, R.E.; Jacob, J.; Byers, P.H. “Altered triple helical structure of type I procollagen in lethal perinatal osteogenesis imperfecta” J Biol Chem 1985, 260, 1734.
      3. Bateman, J.F.; Chan, D.; Mascara, T.; Rogers, J.G.; Cole, W.G. “Collagen defects in lethal perinatal osteogenesis imperfecta” Biochem J 1986, 240, 699.
      4. Godfrey, M.; Hollister, D.W. “Type II achondrogenesis-hypochondrogenesis: Identification of abnormal type II collagen” Am J Hum Genet 1988, 43, 904.

      The basis for this collagen-specific assay of folding rate is that the ER-localized procollagen proline and lysine hydroxylases require monomeric collagen strands as substrates and cannot accommodate a folded triple helix in their active sites. Thus, the accumulation of post-translational modifications on collagen depends on how long the procollagen triple-helical domain resides in the ER as an unfolded, monomeric region of the assembling triple-helical trimer. A fraction of the hydroxylated lysines are subsequently glycosylated, which slows migration on SDS-PAGE gels. We have now clarified our slow-folding conclusion with more precise references and discussion in the manuscript.

      Pulse-chase experiments like those suggested by the Reviewer would indeed be beneficial if they were possible in this system, but they simply are not. Although it might be possible to pulse-label with a radiolabeled amino acid over a short time period, the assay still relies on separating the cell fraction from the secreted fraction. This is possible in monolayer cultures, but we have no way to do it in a chondronoid composed of cells embedded in a complex cartilage matrix. One could propose extracting the chondrocytes and then performing the pulse-chase in monolayer culture, but unfortunately this is also not possible, as chondrocytes do not behave well outside the tissue setting and rapidly dedifferentiate. Fortunately, the procollagen overmodification assay is a widely used and well-accepted measure of slow folding, and thus addresses the issue.

      I think Figure 4 needs more explanation for the reader. While, as expected, the homozygous mutant band migrates much more slowly than the homozygous wild-type band, in the heterozygotes the band is intermediate rather than a discrete mixture of wild-type and mutant proteins reflecting different degrees of post-translational modification. Is this a function of mixed triple helices with heterogeneous degrees of post-translational modification? It deserves more comment, since the argument relating the degree of post-translational modification to the rate of folding depends on this observation. It would also be helpful to show the whole gel with collagen II markers.

      We modified Figure 4 to show the whole gel (in the SI, see Fig. S3) and molecular weight markers. It also shows the wild-type collagen-II band. Most of the procollagen produced by the heterozygote is heterotrimeric for the disease-causing substitution (>87% of trimers will contain at least one mutant chain and thus experience delayed folding), and, therefore, the diffuse banding pattern is to be expected. Further, we would speculate that in this challenged ER, even the folding of wild-type-only trimers is impaired. The Reviewer’s comment suggests there may be some basis for that speculation. We added a note to this effect.

      “The presence of a single broad, slow-migrating band, as opposed to distinct overmodified mutant and normally modified wild-type strands, is due to the fact that the vast majority of trimers formed in heterozygotes (>85%) contain at least one Gly1170Ser strand that delays triple-helix folding.”
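      The >87% and >85% figures follow from a short binomial calculation. As a minimal sketch, assuming (a simplifying assumption, not a measurement) that wild-type and Gly1170Ser proα1(II) chains are made in equal amounts and combine at random into the [α1(II)]3 homotrimer, the chance that a trimer is entirely wild type is (1/2)^3 = 1/8, so 7/8 (87.5%) of trimers carry at least one mutant chain, and 3/4 are mixed heterotrimers. The function name below is hypothetical.

```python
from fractions import Fraction
from math import comb


def mutant_chain_distribution(mutant_fraction: Fraction, chains: int = 3) -> dict[int, Fraction]:
    """Binomial probability that a trimer contains k mutant chains, assuming random assembly."""
    wt_fraction = 1 - mutant_fraction
    return {
        k: comb(chains, k) * mutant_fraction**k * wt_fraction**(chains - k)
        for k in range(chains + 1)
    }


# Heterozygote: half of the synthesized chains are assumed to carry Gly1170Ser
dist = mutant_chain_distribution(Fraction(1, 2))
at_least_one_mutant = 1 - dist[0]

print(dist)                        # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(float(at_least_one_mutant))  # 0.875
```

      Any preferential incorporation or degradation of one chain type would shift these numbers, so they are an idealized estimate.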

      Another approach to the question of intracellular accumulation due to a slow rate of folding of the mutant collagen would be to perform pulse-chase labeling of the three types of chondronoids with radiolabeled amino acids and sugars, and to process the media and lysates for analysis using antibodies specific for the two collagen chain types. Given the authors' extensive experience in studying collagen biosynthesis (e.g., Chan et al. J. Biochem. Biophys. Methods 36 (1997) 11-29), such a supporting study would firmly establish whether the rate of folding/secretion differs between the wild-type, homozygous and heterozygous chondronoids. Until the slow folding can be directly demonstrated in a quantitative fashion, rather than by monitoring the secondary phenomenon of post-translational modification, the hypothesis remains unproven.

      As discussed above in response to the Reviewer’s earlier suggestion of a pulse-chase experiment and question regarding the post-translational modification assay, the pulse-chase experiment is unfortunately infeasible in this system. Fortunately, the modification-based assay is already the gold standard in the collagen field.

      Another issue that does not appear to be addressed is the consequence of having misfolded collagen chains in the dilated ER. Liang et al., using mice transgenic for one or two copies of the mutant human gene, showed apoptosis in the homozygotes but not in the heterozygotes, a finding similar to that of Kimura et al. using transgenics carrying a different human COL2A1 mutation. Okada et al., using chondrocytes converted from human fibroblasts with a clinical (heterozygous) collagenopathy, although not the same mutation as in the present study, showed dilated ER and some level of apoptosis in the cultured cells. Hintze et al., examining chondrocytes expressing different mutants associated with different forms of spondyloepiphyseal dysplasia, suggested that the stability of the mutant protein might determine whether apoptosis occurred; i.e., the thermolabile p.R989C was associated with apoptosis, while cells expressing the more thermostable mutants p.275C, p.719C and p.G853E did not show any evidence of ongoing apoptosis. Is it possible that the smaller size of the homozygous chondronoids reflects fewer cells rather than less matrix (or both) as a result of apoptosis? The chondronoids could be examined with reagents for caspase 3 or by TUNEL staining. One could also measure the collagen/DNA ratio in wild-type, heterozygous and homozygous chondronoids. It might also have been useful for these experiments to be more quantitative, i.e., by cell sorting rather than by eye. Would ImageJ software have been helpful?

      We greatly appreciate this suggestion. We have now added results of TUNEL assays performed on sections of the chondronoids (see Fig. 8), including quantification of the results. Notably, we do not observe a significant difference in apoptosis between genotypes at the timepoint considered. This result is also supported by our transcriptional data, in which we do not observe upregulation of apoptosis-related pathways, via the UPR or otherwise.

      The conformation of the chains trapped in the ER is also unclear. There are many examples in which misfolded proteins naturally tend to aggregate; this is certainly true in the neurodegenerative diseases. While the ER inclusions appear homogeneous and amorphous at the magnification used here in the TEMs, a more discrete structure might be seen at higher magnification/resolution.

      From collagen-II immunohistochemistry confocal images, the intracellular collagen appears sometimes as aggregated puncta, and in other cases more diffuse and amorphous. Given this heterogeneity, we were not able to readily obtain clear additional structural characterization of the intracellular procollagen-II fraction.

      While the choice of time points for the transcriptional analysis, i.e., early and late, seems well thought out, the lack of a significant response may be due to the timing, and it might have been useful to examine earlier, later or intermediate time points in case the response was transient, particularly since other laboratories have reported UPR activation and abnormalities in the context of the silencing of Xbp1, the spliced form of which is a major driver of at least one arm of the UPR.

      While our RNA-sequencing results at the specific timepoints we chose cannot rule out a transient activation of the UPR, they do indicate that chronic, unresolved UPR signaling is not the underlying cause of pathology, which is the main point we are making.

      The notion that procollagen is largely hydrophilic, without the potential for exposure of hydrophobic regions that might engage BiP, and thus is not sensitive to BiP sensing, is interesting. Is it possible that the triple helix that the mutant polypeptides tend to form itself acts as a kind of self-chaperoning structure? Looking at the kinetics of assembly inside the cell (see suggestions above) might provide further insight into the process beyond that obtained by looking at the modified state of the lysines.

      We believe this notion is very strongly supported by the interactomic experiment showing that BiP fails to preferentially engage the poorly folding triple-helical variant. There are, however, many other chaperones and folding enzymes that assist collagen folding, including prolyl isomerases and Hsp47. Hence, it is not clear to us that substantial self-chaperoning occurs. Still, the self-chaperoning idea is intriguing, and we note that prior work does indicate that the triple-helical domains of individual procollagen polypeptides are strongly pre-organized for triple-helix formation (for a review, see Annu Rev Biochem 2009, 78, 929). That said, we hesitate to speculate here on the self-chaperoning idea without additional evidence.

      Minor comments:

      As I mentioned above, while the transcriptional and interactome experiments are computationally sophisticated, the cell biology and biochemistry would benefit from more and better quantitation.

      We have included quantitation of the extent of intracellular procollagen accumulation and the extent of apoptotic cells, which we hope helps to address this point.

      The paper is written in a style in which results and discussion are intermingled. Personally, I prefer that the introduction be short, the results clearly and briefly presented, and the discussion deal with the interpretation and conclusions. I thought that whole paragraphs could have been omitted, e.g., in the introduction: omit the paragraph "The fibrillar ... achondrogenesis type II"; omit the paragraph "Conventional and ... for example."; omit "Excitingly ... in vitro and in vivo (36)." Results: the first paragraph repeats the last paragraph of the introduction and is not necessary in one place or the other; condense.

      We appreciate this feedback and have accordingly edited the manuscript for clarity and brevity, which includes deleting or significantly shortening all the paragraphs indicated by the Reviewer. These improvements are indicated in the track-changes version of the manuscript we resubmitted.

      In Figure 2, by eye, MGP (Matrix Gla protein, which inhibits vascular calcification) seems highly overexpressed in the homozygous mutants. MGP is supposedly an inhibitor of calcification; does its overexpression here reflect something about the adequacy of the matrix?

      Overexpression of MGP could indeed reflect a defect in the matrix of the homozygous variants. It is also likely a reflection of the delayed hypertrophy and maturation observed in the homozygous variants, as matrix calcification is a step in the endochondral ossification process. We did not follow up on this particular observation, as it is observed exclusively in the less clinically relevant homozygous variant. We added a note to the manuscript to capture the Reviewer’s point about MGP, as below:

      “The upregulation in the homozygous system of Matrix Gla Protein (MGP) (Fig. 2A), which inhibits vascular calcification of the matrix in vivo, further supports the delay in hypertrophy, and could lead to differences in the biomechanical properties of the matrix.”

      Figure 5 is good but can it be confirmed by quantitative biochemistry?

      We have included quantitation of the extent of intracellular procollagen accumulation and the extent of apoptotic cells.

      Did you stain with antibodies to ER-resident chaperones other than calreticulin?

      Yes, we also stained the ER for PDI. However, the chondronoids require extensive optimization for immunostaining, and we obtained much better images using calreticulin as the ER marker, hence our choice of images to present in the manuscript.

      Do cells with large amounts of intracellular G1170S die?

      As indicated by the newly included TUNEL data, interestingly, even cells expressing exclusively the Gly1170Ser variant of procollagen-II do not seem to undergo apoptosis at a significantly higher rate than wild-type cells, at least at the timepoint considered. As mentioned above, we added these data as Fig. 8, and added discussion of these results and the methodology in the relevant sections of the manuscript.

      Does higher magnification EM reveal any structure of the material within the dilated ER?

      We have so far not been able to use EM to obtain higher-resolution insight into intracellular procollagen structures, but we will work on this idea in future studies.

      Are there any inflammatory cells in the chondronoids to respond to aberrant proteins?

      There should not be any such cells present in the chondronoids, and we indeed do not observe any inflammatory response. As noted in the response to Reviewer 1, we added discussion regarding the absence of paracrine signals in this type of model system, which we believe has major advantages for biochemical studies like those performed here.

      The paragraph "Bypassing the UPR ... often do not" is discussion, not results.

      Corrected, thanks.

      Significance

      The experimental system described here is clearly the wave of the present. Human iPSCs differentiated into different lineages are now being exploited to study a variety of disorders, to achieve a better understanding of pathogenesis at the molecular level, and to serve as appropriate models for drug development, particularly in the context of high-throughput screening. In addition, as in this case, relatively rare autosomal dominant disorders with phenotypes that resemble more common sporadic diseases may allow the development of treatments that are relevant for the sporadic disorder. While it is likely that the osteoarthritis that develops in carriers of COL2A1 mutations is a function of the host response to the aberrant mechanics resulting from the defective extracellular matrix caused by the mutation, having a pure system in which the primary defect can be corrected and the predisposing matrix deficit reversed could allow normal reparative processes to mitigate the functional joint disability. While transgenic mice are useful as a disease model, they represent not only the expression of the primary defect but also the host pathophysiologic response to that defect, i.e., in this case, how the mouse responds to the defective matrix state and whether those responses add further pathogenic factors to the disease course. Having a tool with which to assess the pure chondrocyte effect should allow a more granular analysis of the primary process.

      We appreciate the Reviewer’s careful and enthusiastic assessment of the significance of our work.

      Their findings reinforce the notion that the involvement of the UPR, as well as the other arms of the proteostatic response, in chondrocytes expressing a variety of mutant collagens is heterogeneous, perhaps depending on the mutation involved. While I do not believe that their current data prove or rigorously test their proposed hypothesis, i.e., that "perhaps due to the pathologic substitution occurring within a triple-helical domain that lacks hydrophobic character, this ER protein accumulation is not recognized by cellular stress responses, such as the unfolded protein response", it is worth considering.

      We provide that hypothesis as a reasonable explanation for the absence of a UPR, and it is strongly supported by our interactomic studies. Furthermore, neither we nor others have found evidence for BiP binding the triple helical domain of procollagen in any other studies. Still, that hypothesis is not the core point of the paper and we do appreciate the Reviewer’s perspective.

      Given that this is a relatively small field with a variety of observations concerning the role of proteostasis, and the UPR in particular, that seem to vary depending on the system (i.e., transgenic mice, transfected fibroblasts, the chondronoids), these observations, particularly with additional biochemistry to confirm their notions regarding folding rates, represent a useful technical addition to the field and should be interesting for people working on collagen biology, arthritis and protein folding.

      I am not a collagen biologist hence my knowledge of some of the nuances of collagen biology may not be extensive. My own areas of interest include the assembly of multi-peptide proteins (such as immunoglobulins) for secretion; the mechanisms that allow them to exit the cell and the aggregation of misfolded proteins as exemplified by the amyloidoses and other forms of clinically relevant protein aggregation. Hence, I am very familiar with tissue culture, transgenic animals as disease models, studies of protein aggregation, and as a former rheumatologist, osteoarthritis.

      We greatly appreciate the Reviewer providing such valuable and scholarly input from the perspective of a scientist with deep expertise in the secretory pathway and other diseases of protein misfolding, as well as from rheumatology. Specifically from the perspective of expertise in collagen biology/biochemistry, we hope that our detailed explanations of assays that are possible versus not possible with collagen in this system, the additional context for why our assessment of the modification of procollagen is correlated with folding/secretion rate, and the further analyses added to the paper, now make a convincing case that the improved manuscript is of high significance and is ready for publication.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility, and clarity):

      1. In this manuscript, Imoto et al. analyze the specific role of the Dynamin1 splice variant Dyn1xA in so-called ultrafast endocytosis, an important mechanism of synaptic vesicle recycling at synapses. In a previous publication (Imoto et al. Neuron 2022), some of the authors had shown that Dyn1xA, and not the other splice variant Dyn1xB, is essential for ultrafast endocytosis. Moreover, Dyn1xA forms clusters around the active zone for exocytosis and interacts with Syndapin 1 in a phosphorylation-dependent manner. However, it was unclear which molecular interactions underlie the specific role of Dyn1xA. Here, the authors provide convincing evidence from pull-down assays and chemical shift perturbation (CSP) that the Dyn1xA PRR interacts with Endophilin A1/2 through two binding sites. The first binding site, which lies in the region common to xA and xB, was previously characterized. The second site, previously uncharacterized, is specific to Dyn1xA and is regulated by phosphorylation (phosphobox 2). The localization of these splice variants and mutated forms at presynaptic sites correlates with the predictions made from the biochemical assays. Finally, the authors perform rescue experiments ('flash and freeze' and VGLUT1-pHluorin imaging experiments) to show that Dyn1xA-Endophilin A1/2 binding is important for ultrafast endocytosis. I find the results interesting, providing an important step in understanding the interplay between dynamin and the endocytic proteins interacting with it (endophilin, syndapin, amphiphysin) in the context of synaptic vesicle recycling. The manuscript is clearly written, and for the most part the data support the authors' conclusions (see specific comments below). However, some issues need to be clarified before this manuscript is fully suitable for publication.

      We thank the reviewer for noting the importance of our study. Indeed, our previous study has raised the question as to why only the Dyn1xA splice variant mediates ultrafast endocytosis, and our current manuscript now resolves this issue.

      Introduction: the Dyn1xB calcineurin-binding motif is written as the PxIxIT consensus, but the actual sequence is PRITISDP. Is this a typo?

      The sequence is correct. One thing we failed to mention is that the last amino acid in this motif can be either threonine or serine for calcineurin binding, as we demonstrated previously [Jing, et al., 2011 JBC; PMC3162388]. We have amended the text as follows.

      1. calcineurin-binding motif (PxIxI[T/S]) 19.
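      To illustrate why the sequence is consistent with the broadened consensus, the motif can be written as a regular expression. This is a minimal sketch using Python's standard `re` module; the patterns are our transcription of PxIxIT and PxIxI[T/S] (x = any residue), and the helper name is hypothetical.

```python
import re

# PxIxI[T/S]: P, any residue, I, any residue, I, then T or S
BROAD_MOTIF = re.compile(r"P.I.I[TS]")
# Strict PxIxIT consensus, for comparison
STRICT_MOTIF = re.compile(r"P.I.IT")


def find_motifs(pattern: re.Pattern, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]


dyn1xb_fragment = "PRITISDP"  # sequence quoted in the reviewer's comment

print(find_motifs(BROAD_MOTIF, dyn1xb_fragment))   # [(1, 'PRITIS')]
print(find_motifs(STRICT_MOTIF, dyn1xb_fragment))  # []
```

      The strict PxIxIT pattern misses PRITIS, whereas allowing T or S at the last position recovers it.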

      Figure 1: the difference between the constructs used in panels C and D is not clear. In D, is it a truncation without residues 796 and 845? If so, it should be labelled clearly in the Western blots. In Panel E, Dyn1xA 746-798 should be labeled Dyn1x 746-798 because it is common to both splice variants.

      We thank the reviewer for pointing this out. Both C and D used the full-length PRRs of Dyn1xA (residues 746-864) and Dyn1xB (residues 746-851). To make the labeling clear, we changed Dyn1xA PRR to “Dyn1xA PRR (746-864)” and Dyn1xB PRR to “Dyn1xB PRR 746-851” in Figure 1. In the main text, we made the following changes.

      1. 4: “To identify the potential isoform-selective binding partners, the full-length PRRs of Dyn1xA746-864 and xB746-851 (hereafter, Dyn1xA-PRR and Dyn1xB-PRR, respectively).”

      Figure 1: For Amphiphysin binding, the authors write that "No difference in binding to Amphiphysin 1 was observed among these peptides (Figure1D-F)." They should instead write that Dyn1x 746-798 does not bind the Amphiphysin1 SH3 domain, confirming the specificity of binding to the 833-838 motif.

      We edited the sentence as suggested.

      1. “Dyn1x 746-798 does not bind the Amphiphysin1 SH3 domain (Figure 1G), confirming the specificity of binding to the 833-838 motif, as reported in previous studies 29,30 (Figure 1D-F).”

      Figure S2: The panels are far too small to see the shifts and the labelling. Please provide bigger panels.

      As suggested, we have now provided bigger panels in Figure S2, and amended the text and Figure legend accordingly.

      We also removed Figure S2B, as it was not referred to in the text. (It was the reverse experiment: HSQCs of 15N-labelled SH3 titrated with unlabelled dynamin.)

      Figure 2 panel B. There is a typo in the connecting line between the sequence and the CSP peaks. It is 846 instead of 864 (after 839).

      Corrected.

      Figure 3 panel E. In the text, the authors write that "Western blotting of the bound proteins from the R838A pull-down experiment showed that R838A almost abolished both Endophilin and Amphiphysin binding in xA806-864 (Figure 3D), and reduced Endophilin binding to xA-PRR (Figure 3E)." I think they should write "only slightly reduced Endophilin binding..." it is more faithful to the result and consistent with the conclusion that Endophilin A1 has two binding sites on Dyn1xA PRR.

      We have now provided quantitative data for R838A and R846A (Fig. 3F and G). Endophilin binding is significantly reduced with R846A.

      It is unclear why the R846A mutant affects binding of Dyn1xA 806-864 but not of Dyn1xA-PRR.

      The reviewer asks why the R846A mutant affects binding of Dyn1xA 806-864, but not so much that of Dyn1xA-PRR. The explanation is simply that there are two endophilin binding sites in Dyn1xA-PRR. The first is not present in the xA806-864 peptide, while both are present in Dyn1xA-PRR (the full-length tail). In pull-down experiments, the binding tends to saturate: even when the second site is blocked by R846A, the first site is still able to bind, so the binding appears normal. The same applies to the R838A mutant.

      Moreover, it affects binding to endophilin as well as amphiphysin, and therefore it is not specific. It is thus not correct to write that "R846 is the only residue found to specifically regulate the Dyn1 interaction with Endophilin as a part of an SDE". In the Discussion (page 11), the authors refer to the R846A mutation as specifically affecting Endophilin binding. This should be toned down, as it also affects Amphiphysin binding. For this important point, the data on quantification of Endophilin binding should be presented.

      The reviewer’s concern is about our claims of specificity of Endophilin A binding in the Dyn1xA R846 mutation experiments. The reviewer is correct, and we have now defined specific parameters for those claims. Specifically, we have added new quantitative data from the Western blots in Fig 3E (full-length Dyn1xA-PRR) as Fig 3F-G. We used the full-length Dyn1xA-PRR rather than the xA806-864 peptide because the subsequent transfection experiments use full-length Dyn1xA. In the new Figures 3F and 3G, we quantified Endophilin A, Amphiphysin and Syndapin1 amounts from multiple Western blots such as Figure 3E (now n=14: 6 experiments, each with 2-4 replicates for Dyn1xA PRR). The R846A mutation in Dyn1xA-PRR significantly reduces binding to Endophilin A, but it does not significantly affect binding to Amphiphysin 1 and Syndapin1 (Fig 3G). Therefore, this particular Dyn1xA-PRR mutation specifically affects Endophilin A binding in the context of the full-length tail (Dyn1xA-PRR). To make these results clear, we modified the text as below.

      P7. “R838A and R846A caused smaller reductions in Endophilin binding compared to wild-type Dyn1xA-PRR (Figure 3E, 3F, R838A, median 68.5%; Figure 3G, R846A, median 59.3%). R838A reduced the Dyn1/Amphiphysin interaction (Figure 3E, 3F, median 14.2% binding compared to wild-type Dyn1xA-PRR). By contrast, R846A did not affect Amphiphysin and Syndapin binding to Dyn1xA-PRR (Figure 3E, 3G). Therefore, R846, being part of an SDE, is the only residue we found to specifically regulate the Dyn1 interaction with Endophilin in the context of the full-length tail (Dyn1xA-PRR)”.
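      Values such as "median 68.5%" express mutant binding relative to wild-type Dyn1xA-PRR. The exact densitometry normalization is not described here; as a minimal sketch, assuming each mutant band intensity is divided by the median wild-type intensity and multiplied by 100, the calculation could look like this. The function name and the intensity values are hypothetical placeholders, not data from the study.

```python
from statistics import median


def percent_of_wild_type(mutant: list[float], wild_type: list[float]) -> list[float]:
    """Express each mutant band intensity as a percentage of the wild-type median (assumed scheme)."""
    wt_median = median(wild_type)
    return [100.0 * value / wt_median for value in mutant]


# Hypothetical band intensities (arbitrary units); NOT the study's measurements
wild_type_bands = [1.00, 1.10, 0.90, 1.05, 0.95]
mutant_bands = [0.60, 0.55, 0.70, 0.65, 0.50]

normalized = percent_of_wild_type(mutant_bands, wild_type_bands)
print(round(median(normalized), 1))  # 60.0
```

      Summarizing with medians rather than means matches the nonparametric Kruskal-Wallis/Dunn testing named in the figure legends.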

      Additionally, the reviewer notes that “the authors refer to the R846A mutation as specifically affecting Endophilin binding. This should be toned down, as it also affects Amphiphysin binding.” In light of the above data and the new quantitative analysis (Fig 3F-G), we have clarified the conclusion. However, to make clear that this statement is only correct in the context of the full-length Dyn1xA-PRR, we amended the text as follows:

      P7. “By contrast, R846A did not affect Amphiphysin and Syndapin binding to Dyn1xA-PRR (Figure 3E, 3G). Therefore, R846, being part of an SDE, is the only residue we found to specifically regulate the Dyn1 interaction with Endophilin in the context of the full-length tail (Dyn1xA-PRR)”.

      New legends for Figure 3F and G have now been added as follows.

      “(F) The binding of Endophilin A, Amphiphysin 1 and Syndapin1 to Dyn1xA-PRR (wild type) or the R838A mutant, quantified from Western blots in (E). n=14 (6 experiments with 2-4 replicates in each). Median and 95% confidence intervals are shown. Kruskal-Wallis with Dunn’s multiple comparisons test (**p

      (G) The binding of Endophilin A, Amphiphysin 1 and Syndapin1 to Dyn1xA-PRR (wild type) or the R846A mutant, quantified from Western blots in (E). n=14 (6 experiments with 2-4 replicates in each). Median and 95% confidence intervals are shown. Kruskal-Wallis with Dunn’s multiple comparisons test was applied (*p
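      The legends report a Kruskal-Wallis test. To illustrate what that test computes, the H statistic, H = 12/(N(N+1)) · Σ R_i²/n_i − 3(N+1), where R_i is the rank sum of group i, can be sketched in plain Python as below. This sketch assigns average ranks to ties but omits the tie correction and Dunn’s post hoc comparisons; for real analyses a library routine such as `scipy.stats.kruskal` (which applies the tie correction) is the safer choice. Function names and the toy input are illustrative only.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (no tie correction) for two or more independent samples."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sq = 0.0
    start = 0
    for group in groups:
        # Ranks of this group occupy a contiguous slice of the pooled list.
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sq += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sq - 3 * (n_total + 1)


# Toy input: two fully separated groups.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6]))  # ≈ 3.857 (= 27/7)
```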

      Figure 3F-G (which are now 3H and 3I in the revised text): what do the star symbols represent in the graphs? I guess the abscissa represents retention time. Please write it clearly instead of a second ordinate for molecular mass, which does not make much sense if this reflects the estimate for the 3 conditions.

      The “stars” are crosses (x) and represent individual data points. The figure legends have been updated for clarity. The reviewer is correct that the x-axis is retention time (min). The second y-axis is needed to give the molar mass of the points on the curve marked with crosses (x). The legends for Figures 3H and 3I have been changed as follows.

      “(H) SEC-MALS profiles for Dyn1xA alone (green), Endophilin A SH3 alone (red) and the complex of the two (black). The x-axis shows retention time. The left axis shows the corresponding UV absorbance (280 nm) signal (solid lines), and the right axis shows the molar mass of each peak (crosses). The molecular weight of the complex was determined and tabulated in comparison with the predicted molecular weight. Crosses (x) represent individual data points.

      (I) SEC-MALS profiles for a high concentration (0.5 mg; dark blue) and a low concentration (0.167 mg; blue) of the Dyn1xA-PRR/Endophilin A SH3 complex. The x-axis shows retention time. The left axis shows the corresponding UV absorbance (280 nm) signal (solid lines), and the right axis shows the molar mass of each peak (crosses). The molecular weight of the complex was determined and is listed in the table. Crosses (x) represent individual data points.”

      Figure 4: The statement that "By contrast [to Dyn1xA], Endophilin A1 or A2 formed multiple clusters (1-5 clusters)" is not at all clear on the presented pictures. The authors should provide views of portions of axons with several varicosities, for the reader to appreciate the cases where there are more EndoA clusters than Dyn1 clusters.

      In the revised Figure S4, we added additional STED images of axon regions with more EndoA1/2 clusters than Dyn1xA clusters. The locations of Dyn1xA and EndoA1/2 clusters are annotated in each image based on local intensity maxima, which are determined using our custom MATLAB analysis scripts (Imoto, et al., Neuron 2022; for a description of the methods, please refer to Point #14 below). We also added Figure S3 to describe our analysis pipeline. In the Dyn1xA channel, the outer contour indicates 50% of the local maximum (the boundary of the Dyn1xA cluster), while the inner contour indicates 70% of the local maximum. In the EndoA1/2 channel, local maxima of the clusters are indicated as points. To reflect these changes, we modified the text as below.

      P 9. “By contrast, Endophilin A1 or A2 formed multiple clusters (1-5 clusters) (Figure S4)”

      The legends for Figure S4 are now as follows.

      “Figure S4. Additional STED images for Figure 4.

      (A) The top image shows an axon containing multiple boutons. Signals show overexpression of GFP-tagged Dyn1xA (Dyn1xA) and mCherry-tagged Endophilin A1 (EndoA1). The bottom images show magnifications of four boutons in the top image. Red hot look-up table (LUT) images to the right of the Dyn1xA and EndoA1 images are contrast-enhanced images. Outer and inner contours represent 50% and 70% of the local maxima of Dyn1xA, respectively. Black circles represent local maxima of Endophilin A1. In these boutons, multiple Endophilin A1 puncta are present.

      (B) The top image shows an axon containing multiple boutons. Signals show overexpression of mCherry-tagged Dyn1xA (Dyn1xA) and GFP-tagged Endophilin A2 (EndoA2). The bottom images show magnifications of four boutons in the top image. Red hot LUT images to the right of the Dyn1xA and EndoA2 images are contrast-enhanced images. Outer and inner contours represent 50% and 70% of the local maxima of Dyn1xA, respectively. Black circles represent local maxima of Endophilin A2. In these boutons, multiple Endophilin A2 puncta are present.

      (C) STED micrographs of the same synapses as in Figure 4E with an active zone marker Bassoon (magenta) visualized by antibody staining. GFP-tagged Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A (green) are additionally stained with GFP-antibodies. Local maxima of Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A signals and minimum distance to the active zone boundary are indicated by dark blue lines.”

      Moreover, overexpression of EndophilinA1/2-mCherry is not sufficient to assess its localization. Please consider either immunofluorescence or genome editing (e.g. Orange or TKIT techniques).

      We agree with the reviewer that overexpression can obscure the endogenous localization of proteins. To address this point in our previous publication, we titrated the amount of Dyn1xA-GFP plasmid and transfected neurons for only 20 hours; this protocol allowed us to uncover the endogenous localization of Dyn1xA even though it was overexpressed in wild-type neurons (Imoto, et al., 2022). We also confirmed this localization by ORANGE-based CRISPR knock-in of a GFP tag into the endogenous Dyn1 locus just after exon 23 (Imoto, et al., 2022). The Chapman lab took similar approaches to localize Synaptotagmin-1 and Synaptobrevin 2 in axons (Watson et al, 2023, eLife, PMID: 36729040). We did not emphasize this in the first submission, but we took the same approach for EndoA1/2 localization. This does not guarantee that the approach also reveals the endogenous localization of EndoA1/2, and the reviewer is correct that additional evidence would strengthen the data here. Thus, as suggested, we have examined endogenous Endophilin A1 localization by antibody staining. As the reviewer is likely aware, Endophilin A1 also localizes to other places, including dendrites and postsynaptic terminals, which makes the data difficult to analyze. However, we observe colocalization of Dyn1xA with endogenous EndoA1 (Reviewer’s Figure 1). Thus, we believe that our major conclusion here, drawn from EndoA1/2-mCherry overexpression, is valid. Since Endophilin signals in neighboring processes obscure its localization in the synapses of interest, repeating these localization experiments with ORANGE-based knock-in would be ideal. However, with the lead author starting his own group and the many validations needed to confirm knock-in results, this experiment would take at least 4-6 months, and thus it is beyond the scope of the current study.

      We will follow up on this localization in the near future, but given that endophilin is required for ultrafast endocytosis (Watanabe, et al., Neuron 2018, PMID: 29953872) and these proteins need to be in condensates at the endocytic sites to accelerate the kinetics of endocytosis (Imoto, et al., Neuron 2022, PMID: 35809574), we are confident that endogenous EndoA1/2 are localized with Dyn1xA.

      The analysis of the confocal microscopy data is not explained. How is the number of clusters determined? How far apart are they? Confocal microscopy may not have the resolution to distinguish clusters within a synapse.

      We apologize for the insufficient description of the method. We provided a more thorough description of the methods in our previous publication (Imoto, et al., Neuron 2022, PMID: 35809574). To make the analysis more automated, we improved our custom MATLAB scripts. Please note that all analyses of cluster location are performed on STED images, not on conventional confocal images. To determine the clusters, presynaptic regions (based on Bassoon signals or Dyn1xA signals within boutons) in each STED image are first cropped into 900 × 900 nm regions of interest (ROIs). Our MATLAB scripts then calculate the local maxima of fluorescence intensity within the ROIs. To determine the distance between the active zone and the Dyn1xA or EndoA1/2 clusters, the scripts perform the same local maxima calculations in both channels and draw contours at 50% of the intensity of the local maxima. The minimum distance is the shortest distance from each Dyn1xA or EndoA1/2 local maximum to the active zone contour. To make these points clearer, we modified the main text and the Methods section. In addition, we have added the workflow of this analysis as Figure S3.

      P9. Main. “Signals of these proteins are acquired by STED microscopy and analyzed by custom MATLAB scripts, similarly to our previous work23.”

      P20. Methods. “All cluster distance measurements are performed on STED images. For the measurements, a custom MATLAB code package23 was modified using GPT-4 (OpenAI) to perform semi-automated image segmentation and analysis of the endocytic protein distribution relative to the active zone marked by Bassoon or relative to the Dyn1xA cluster in STED images. First, the STED images were blurred with a Gaussian filter with a radius of 1.2 pixels to reduce Poisson noise and then deconvoluted twice using the built-in deconvblind function: the initial point spread function (PSF) input was measured from non-specific antibody signals in the STED images. The second PSF (enhanced PSF) input was the PSF returned from the initial run of blind deconvolution62. The enhanced PSF was used to deconvolute the STED images to be analyzed. Each time, 10 iterations were performed. All presynaptic boutons in each deconvoluted image were selected within 30 × 30-pixel (0.81 µm²) ROIs based on the varicosity shape and Bassoon or Dyn1xA signals. The boundary of the active zone or of Dyn1xA puncta was identified as the contour at half the intensity of each local maximum in the Bassoon or Dyn1xA channel, respectively. Dyn1xA clusters and Endophilin A clusters were identified as local-maximum pixels. The distances between Dyn1xA clusters and the active zone boundary, or between Endophilin A clusters and the Dyn1xA cluster boundary, were then calculated automatically. For the distance measurement, the MATLAB distance2curve function (John D'Errico 2024, MATLAB Central File Exchange) first calculated the distance between each local-maximum pixel and all points on the contour of the active zone or Dyn1xA cluster boundary; the shortest distance was then selected as the minimum distance. Signals crossing the ROI borders and Bassoon signals outside the transfected neurons were excluded from the analysis. The MATLAB scripts are available on request.”
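      The distance step relies on distance2curve to find the shortest distance from a local-maximum pixel to a contour. The underlying geometry (point-to-segment distance, minimized over all contour segments) can be sketched in Python as below. This is not distance2curve itself, which also handles interpolated curves; the function names, the closed-polygon contour and the 30 nm pixel size (assumed here to be consistent with 900 × 900 nm ROIs of 30 pixels per side) are illustrative assumptions. The distance is unsigned, so it does not distinguish maxima inside the contour from maxima outside it.

```python
import math


def point_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment a-b; points are (x, y)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp the projection to [a, b].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def min_distance_to_contour(p, contour, closed=True):
    """Shortest distance from p to a polyline contour given as (x, y) vertices."""
    if len(contour) == 1:
        return math.hypot(p[0] - contour[0][0], p[1] - contour[0][1])
    segments = list(zip(contour, contour[1:]))
    if closed and len(contour) > 2:
        segments.append((contour[-1], contour[0]))  # close the loop
    return min(point_to_segment(p, a, b) for a, b in segments)


# Hypothetical example: a 10 x 10-pixel square "active zone" contour and a
# Dyn1xA local maximum 5 pixels to its right.
PIXEL_NM = 30.0  # assumed pixel size
az_contour = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
peak = (15.0, 5.0)
print(min_distance_to_contour(peak, az_contour) * PIXEL_NM, "nm")
```

      Taking the minimum over segments rather than over contour vertices keeps the result independent of how densely the contour happens to be sampled.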

      In the legend of Figure S3,

      “Protein localization in presynapses is determined by semi-automated MATLAB scripts (see Methods).

      (A) Series of deconvoluted STED images are segmented to obtain 50-100 presynapse ROIs in each condition.

      (B) Two representations of the MATLAB analysis interface are shown. The first channel (ch1, green) is processed to identify the pixels of local maxima within this channel. The second channel (ch2, magenta) is normally an active zone protein, Bassoon. The active zone boundary is determined by the contour generated at 50% of the intensity of the local maxima of ch2. Contours outside of the transfected neurons are manually selected in the interface and excluded from the analysis. Minimum distances from each local-maximum pixel in ch1 to the contour in ch2 are calculated and shown in the composite image. The plot “Distance distribution” shows all minimum distances identified in this presynapse ROI (y-axis in nanometers). The plot “Accumulated distance distribution” shows the accumulated distance distribution from the first to the current presynapse ROI. The plot “Histogram of total intensity” shows the intensity counts around individual local-maximum pixels in ch1.”
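      The pipeline described for Figure S3 (local maxima in one channel, a boundary at 50% of each local maximum in the other, then minimum distances) can be illustrated with a simplified, pixel-level Python sketch. Assumptions: images are plain 2D lists that have already been deconvolved; a local maximum is a pixel strictly brighter than its 8 neighbours; and the 50% contour is approximated by the boundary pixels of the 4-connected region at or above half the peak intensity, whereas the MATLAB scripts draw sub-pixel contours. All function names, the toy image and the pixel size are hypothetical.

```python
import math
from collections import deque

NEIGHBOURS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def local_maxima(img, min_value=0.0):
    """(row, col) of pixels strictly brighter than all of their 8 neighbours."""
    rows, cols = len(img), len(img[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = img[r][c]
            if v <= min_value:
                continue
            neighbours = (img[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c))
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def half_max_region(img, peak, fraction=0.5):
    """4-connected pixels around `peak` with intensity >= fraction * peak value."""
    rows, cols = len(img), len(img[0])
    threshold = fraction * img[peak[0]][peak[1]]
    region, queue = {peak}, deque([peak])
    while queue:  # breadth-first flood fill from the peak
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS_4:
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and (rr, cc) not in region and img[rr][cc] >= threshold):
                region.add((rr, cc))
                queue.append((rr, cc))
    return region


def region_boundary(region):
    """Region pixels with at least one 4-neighbour outside the region."""
    return sorted(p for p in region
                  if any((p[0] + dr, p[1] + dc) not in region for dr, dc in NEIGHBOURS_4))


def min_pixel_distance(point, pixels, pixel_nm=1.0):
    """Shortest Euclidean distance from `point` to any pixel in `pixels`, scaled to nm."""
    return pixel_nm * min(math.hypot(point[0] - r, point[1] - c) for r, c in pixels)


# Toy "Bassoon" image (hypothetical values): one bright spot on a dim background.
bassoon = [
    [1, 1, 1, 1, 1, 1],
    [1, 6, 6, 6, 1, 1],
    [1, 6, 10, 6, 1, 1],
    [1, 6, 6, 6, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
az_peak = local_maxima(bassoon)[0]
az_edge = region_boundary(half_max_region(bassoon, az_peak))
dyn_peak = (2, 5)  # hypothetical Dyn1xA local maximum from the other channel
print(az_peak, len(az_edge), min_pixel_distance(dyn_peak, az_edge, pixel_nm=30.0))
```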

      For the STED microscopy, a representation of the processed image (after deconvolution) and the localization of the peaks would be important to assess the measurement of distances. If Dyn1xA S851/857D is more diffuse, are there still peaks to measure for every synapse?

      We thank the reviewer for bringing up this important question. In Figure S4C, we have added the position of the local maxima of wild-type and mutant Dyn1xA shown in the main Figure 4E. As the reviewer pointed out, when a protein is more diffuse, it is difficult to find the peak intensity by STED. However, since these proteins are still found at a higher density within a very confined space of a presynapse and synapses are packed with organelles like synaptic vesicles and macromolecules, signals from even diffuse proteins can be detected as clusters, and local maxima can be detected in these images.

      To illustrate this point, we added Reviewer’s Figure 2 below. In this experiment, we transfected neurons with a typical amount of plasmid (2.0 µg/well) or a ~10x lower amount (0.25 µg/well). When the density of cytosolic proteins is high (Reviewer’s Figure 2A), the depletion laser has to be strong enough to induce sufficient stimulated emission and resolve protein localization. Insufficient power produces low-resolution images, leading to inaccurate detection of the local maxima (Reviewer’s Figure 2A). Thus, we set our excitation and depletion laser powers to resolve protein localization to ~40-80 nm at presynapses. Furthermore, to avoid mislocalization of proteins due to overexpression, we use 0.25-0.5 µg/well (in 12-well plates) of plasmid DNA for transfection, which is around 10 times lower than the amount used in the typical Lipofectamine neuronal transfection protocol (Imoto, et al., Neuron 2022). We also change the medium around 20 hours after transfection instead of the typical 48 hours (Imoto, et al., Neuron 2022). With these modifications and settings, we can obtain the locations of the local maxima of diffuse signals (Reviewer’s Figure 2B, Figure 4E and Figure S4). We modified the Methods section to make these points clearer.

      P 17, “Briefly, plasmids were mixed well with 2 µl Lipofectamine in 100 µl Neurobasal media and incubated for 20 min. For Dyn1xA and Endophilin A expression, 0.5 µg of constructs was used to reduce overexpression artifacts23. The plasmid mixture was added to each well with 1 ml of fresh Neurobasal media supplemented with 2 mM GlutaMax and 2% B27. After 4 hours, the medium was replaced with pre-warmed conditioned media. To prevent excessive protein expression, neurons were fixed for imaging less than 20 hours after transfection.”

      P 20, “The quality of the STED images was examined by comparing confocal and STED images and by measuring the size of signals at synapses and of the PSF (non-specific signals from antibodies).”

      Legends for Figure S4C,

      “(C) STED micrographs of the synapses shown in Figure 4F with an active zone marker Bassoon (magenta). GFP-tagged Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A are visualized by antibody staining of GFP (green). Local maxima of Dyn1xA, Dyn1xA S851D/857D or Dyn1xA R846A signals and minimum distance to the active zone boundary are overlaid.”

      Figures 5 and 6: No specific comment. The data and its analysis are very nice and elegant. The comment on the lack of rescue of Dyn1xA on endosome maturation may be a bit overstated, because many "controls" (shRNA control Figure S5 or Dyn3 KO in Imoto et al. 2022) have a significant number of endosomes 10 s after stimulation.

      We thank the reviewer for noting the strength of our data and pointing out this issue on endosomal resolution. In particular, the reviewer is concerned about our interpretation of the ferritin-positive endosomes present at 10 s in time-resolved electron microscopy experiments. Indeed, the number of ferritin-positive endosomes in Dyn1 KO neurons overexpressing Dyn1xA (0.1/profile) is similar to that in control conditions: the scramble shRNA control (0.1/profile, Figure S5) and Dyn3 KO neurons (0.2/profile) in our previous study (Imoto et al. 2022). Although we do not consider Dyn3 KO a control, given the presence of abnormal endosomal structures, we agree with the reviewer that the scramble shRNA control in Figure S5 shows some ferritin-positive endosomes even at 10 s after stimulation. We note that this result contrasts with our previous studies, in which the number of ferritin-positive endosomes returned to the basal level in both wild-type neurons and many scramble shRNA controls (Watanabe et al. 2014, 2018, Imoto et al 2022). Thus, the majority of our data indicate that the number of ferritin-positive endosomes returns to the basal level by 10 s, suggesting that endosomes are typically resolved into synaptic vesicles by this time. However, given that we do not know the cause of this inconsistency and cannot exclude an overexpression artifact of Dyn1xA as an alternative explanation, we changed the text as follows.

      P. 10, “Interestingly, the number of ferritin-positive endosomes did not return to baseline (Figure 5E, F), unlike in previous studies3,35,36, suggesting that Dyn1xA may not fully rescue the knockout phenotypes or that overexpression of Dyn1xA causes abnormal endosomal morphology.”

      By the way, why did the authors use Dyn1 KO in this study, and not Dyn1,3 DKO as in Imoto et al. 2022?

      This is simply because Dyn3KO displayed an endosomal defect in our previous study (Imoto et al 2022), and we wanted to focus on endocytic phenotypes of Dyn1 KO and mutant rescues in this study.

      In the Discussion, the authors present the binding sites (for endophilin and amphiphysin SH3 domains) as independent. However, these proteins form dimers or even multimers as they cluster around the neck of a forming vesicle. Even though they provide evidence in vitro (Figure 3) that in these conditions of high concentration one dyn1xA-PRR binds one SH3 domain, in cells multiple binding sites on the PRR to these proteins may involve avidity effects, as discussed for example in Rosendale et al. 2019 doi 10.1038/s41467-019-12434-9. For example, the high affinity binding of Dyn1-PRR to amphiphysin cannot be explained only by the sequence 830-838.

      The reviewer suggests “In the Discussion, the authors present the binding sites (for endophilin and amphiphysin SH3 domains) as independent.” However, we do not claim these interactions are functionally independent, except in the context of in vitro experiments where they are sequence-independent.

      They also suggest that “However, these proteins form dimers or even multimers as they cluster around the neck of a forming vesicle”. We do not agree with this in the context of our Discussion, because the evidence for multimers and clustering, although convincing, comes entirely from in vitro data.

      Thirdly, they comment that “For example, the high affinity binding of Dyn1-PRR to amphiphysin cannot be explained only by the sequence 830-838.” We fully agree with this statement and felt we had addressed it in the manuscript. To explain, it is important to point out a relatively new concept, introduced here and previously reported by us (Lin Luo et al 2016, PMID: 26893375): the existence and importance of SDE and LDE for SH3 domain binding (Endophilin here, Syndapin in our previous report). These elements act at a distance from the so-called core PxxP motifs, and they provide much higher affinity and specificity than the core region alone. We had further mentioned this in the Discussion on p11: “Although this is a previously characterized binding site for Amphiphysin and is also present in Dyn1xB-PRR, the extended C-terminal tail of Dyn1xA contains short and long distance elements (SDE and LDE) essential for Endophilin binding, making it higher affinity for Endophilin.” Because the NMR experiments identified a chemical shift at F862 of dynamin, we performed a pull-down with this mutant in the xA746-798 construct (which only contains the higher affinity site) and found that indeed “F862A reduced Endophilin binding 29% (p…”. Overall, the reviewer correctly points out that avidity effects arising from “multiple binding sites on the PRR to these proteins” could play a role in vivo. We agree that avidity is an additional possibility, not examined in our study. Therefore, as suggested, we added the following sentence to the Discussion of the SDE and LDE impacts.

      P. 11. “Our pull-down results showed that R846A abolished endophilin binding to xA806-864 (which contains only the second, higher-affinity binding site and the associated SDE (A839) and LDE (F862)) and reduced endophilin binding to the Dyn1xA-PRR (which contains both binding sites) by about 40% without affecting its interaction with Amphiphysin, providing important partner specificity, although we cannot exclude the possibility that avidity effects may additionally come into play in vivo42.”

      Reviewer #1 (Significance (Required)):

      This study provides a significant advance on the mechanisms of dynamin recruitment to endocytic zones in presynaptic terminals. The work adds a significant step by experienced labs (Robinson, Watanabe) who have provided important insight in the mechanisms by many publications in the last years.

      We thank the reviewer for the careful read of our manuscript and positive outlook of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      1. This is a compelling study that reports a key discovery to understand the molecular mechanism of ultrafast endocytosis. The authors demonstrate that the Dynamin splice version 1xA (Dynamin 1xA) uniquely binds Endophilin A, in contrast to Dynamin splice version 1xB (Dynamin 1xB) that does not bind Endophilin A and it is not required for ultrafast endocytosis. In addition, the Endophilin A binding occurs in a dephosphorylation-regulated manner. The study is carefully carried out and it is based on high quality data obtained by means of advanced biochemical methodologies, state-of-art flash-freezing electron microscopy analysis, superresolution microscopy and dynamic imaging of exo-and endocytosis in neuronal cultures. The results convincingly support the conclusions.

      We thank the reviewer for supporting the conclusions of our study.

      1. Although additional experiments are not essential to support the claims of the paper there is room, however, for improvement within the pHluorin experiments. These experiments, that are clearly informative and consistent with the rest of experimental data, do not apply the useful approach to separate endo- from exocytosis. The use of bafilomycin or folimycin to block the vesicular proton pump allows the unmasking the endocytosis that is occurring during the stimulus, that should correspond to ultrafast endocytosis. It would be very elegant to demonstrate that such a component, as expected according to the electron microscopy data, requires the binding of Endophilin A to Dynamin 1xA. If the authors have the pHluorin experiments running, the suggested experiments are very much doable because the reagents and the methodology is already in place and the new data could be generated in around six weeks.

      We thank the reviewer for the suggestion. The reviewer is concerned that the vGlut1-pHluorin experiment in Figure 6 may not correspond to ultrafast endocytosis. We agree that bafilomycin/folimycin treatment would reveal the amount of endocytosis that takes place while neurons are stimulated. However, we are not certain that endocytosis during this phase would fully correspond to ultrafast endocytosis, because reacidification of endocytosed vesicles typically takes 3-4 s (Atluri and Ryan, 2006, PMID: 16495458; although see https://elifesciences.org/articles/36097), and thus the mode of endocytosis cannot be fully determined by this assay. To claim that all endocytosis measured by the pHluorin assay during stimulation corresponds to ultrafast endocytosis, we would need to track single pHluorin molecules at the ultrastructural level and correlate their internalization with pHluorin signals. A rapid acid-quench technique used by the Haucke group might also be appropriate to estimate the amount of ultrafast endocytosis (Soykan et al. 2017 PMID: 28231467), but we are not set up to perform such experiments. Also, our lead author, Yuuta Imoto, is leaving the lab to start his own group, and it would take us months rather than weeks to complete the requested experiments. Since the point of this experiment was to test whether the interaction of Dyn1xA and EndoA is essential for protein retrieval regardless of the actual mechanism, and the reviewer acknowledges that this point is sufficiently supported by the experiments, we will make this experiment a priority for our next study.

      Instead of the bafilomycin or rapid acid-quench experiments, we have now added data from vGlut1-pHluorin experiments with a single action potential. With a single action potential, all synaptic vesicle recycling is mediated by ultrafast endocytosis in these neurons (Watanabe et al, 2013 PMID: 24305055; Watanabe et al. 2014, PMID: 25296249). Our electron microscopy experiments in Figure 5 are also performed with a single action potential. As in the experiments with 10 action potentials at 20 Hz, re-acidification of vGlut1-pHluorin is blocked when the interaction between Dyn1 and Endophilin A1 is disrupted (Figure 6 F-I). We added a description of this result as below.

      P 11. “Similar defects were observed when the experiments were repeated with a single action potential, a stimulation paradigm in which synaptic vesicle recycling is mediated by ultrafast endocytosis25 (S851D/857D, recovery is 73.3% above baseline; R846A, recovery is 30.0% above baseline) (Figure S9 A-D). Together, these results suggest that the 20-amino-acid extension of Dyn1xA is important for the recycling of synaptic vesicle proteins, mediated by specific phosphorylation and Endophilin binding sites within the extension.”

      The methods are carefully explained. Some of the experiments are only replicated in two cultures and the authors should justify the reasons to convince the audience that the approaches used have enough low variability for not increasing the n number. The pHluorin experiments, however, are performed only in a single culture; they should replicate these experiments in at least 3 different cultures (three different mice).

      The reviewer is correct. The variability is very low in our ultrastructural studies and STED imaging, and thus two independent cultures were used in all our previous publications. We agree that, ideally, we would have three independent cultures, but given the nature of ultrastructural studies (control, mutants and multiple time points), triplicating the data would add another year to our work. We are currently developing AI-based segmentation analysis, and once this pipeline is established, we will be able to increase N. However, please note that for these experiments, we examine around 200 synapses from each condition in electron microscopy studies (Table S2); these numbers far exceed the gold standard in the field. Likewise, 50-100 synapses are examined for STED experiments (Table S2). To examine the variability of our analysis results, we compared the individual datasets using cumulative curves and the Kolmogorov–Smirnov test (Figure S11). As shown by the summarized data and the p values for each condition, there is no significant difference between the datasets.

      For the pHluorin analysis, the reviewer is correct. We repeated the experiments twice after the initial submission to increase N. The data are consistent, and the conclusions are not changed by the additional experiments (Figure 6 and Figure S9). We also changed the Statistical analysis section in Methods as below.

      P. 19. “All electron microscopy data are pooled from multiple experiments after being examined on a per-experiment basis (with all freezing on the same day); none of the pooled data show significant deviation from each replicate (Table S2).”

      p 19, “All fluorescence microscopy data were first examined on a per-experiment basis. For Figure 4, the data were pooled; none of the pooled data show significant deviation from each replicate (Figure S11 and Table S2). Sample sizes were 2 independent cultures, with at least 50-100 synapses from 4 different neurons in each condition.”

      Legends for Figure S11

      Figure S11. Data variability in Figure 4.

      Cumulative curves are made from each dataset of (A) the distance of Endophilin A1 puncta from the edge of Dyn1xA puncta, (B) the distance of Endophilin A2 puncta from the edge of Dyn1xA puncta, and the distance distribution of Dyn1xA from the active zone edge in neurons expressing (C) wild-type Dyn1xA-GFP, (D) Dyn1xA-S851D/857D-GFP and (E) Dyn1xA-R846A-GFP. n > 4 coverslips from 2 independent cultures. Kolmogorov–Smirnov (KS) test; p values are indicated in each plot.
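      The two-sample Kolmogorov–Smirnov comparison used for Figure S11 scores how different two samples are by the largest vertical gap between their cumulative curves. That statistic can be sketched in a few lines of Python, as below; it returns only D, not a p value (a library routine such as `scipy.stats.ks_2samp` provides both). The function name and the distance values are hypothetical placeholders, not the Figure S11 data.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between the two ECDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The gap between two step-function ECDFs is maximal at one of the observed values.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Hypothetical distances (nm) from two cultures; not data from Figure S11.
culture_1 = [35, 48, 52, 60, 75, 90]
culture_2 = [40, 45, 55, 62, 70, 95]
print(ks_two_sample_statistic(culture_1, culture_2))
```

      A small D (with a large p value from a full implementation) is what “no significant difference between the datasets” corresponds to.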

      Minor comments: 4. Prior studies referenced appropriately and the text and figures are clear and accurate.

      We thank the reviewer for the careful read of our manuscript.

      The authors should discuss about the mediators (enzymes) responsible for dephosphorylation of phosphor-box 2 that is key for the Dynamin 1xa-Endophilin A interaction.

      We thank the reviewer for the suggestion. We added a discussion on a potential mediator, Dyrk1, as below.

      P. 12. “What are the kinases that regulate Dyn1? The phosphorylation of phosphobox-1 is mediated by Glycogen synthase kinase-3 beta (GSK3ß) and Cyclin-dependent kinase 5 (CDK5)17, while phosphobox-2 is likely phosphorylated by Trisomy 21-linked dual-specificity tyrosine phosphorylation-regulated kinase 1A (Mnb/Dyrk1)44,45, since Ser851 in phosphobox-2 has been shown to be phosphorylated by Mnb/Dyrk1 in vitro32. Furthermore, overexpression of Mnb/Dyrk1 in cultured hippocampal neurons slows the retrieval of the synaptic vesicle protein vGlut146. Consistently, our data showed that phosphomimetic mutations in phosphobox-2 result in disruption of Dyn1xA localization, perturbation of ultrafast endocytosis and slower kinetics of vGlut1 retrieval. However, how these kinases interact to regulate the interaction of Dyn1xA, Syndapin1 and Endophilin A1 for ultrafast endocytosis is unknown.”

      It would be very helpful to include a final cartoon depicting the key protein-protein interactions regulated by dephosphorylation (activity) and the sequence of molecular events that leads to ultrafast endocytosis

      As suggested, we made a model figure (new Figure 7) showing how Dyn1xA and its interactions with EndoA and Syndapin1 increase the kinetics of endocytosis at synapses. Regarding the sequence of molecular events, we think that a dephosphorylated fraction of Dyn1xA molecules is already sitting at the endocytic zone at rest and mediates ultrafast endocytosis. However, it is equally possible that activity-dependent dephosphorylation of Dyn1xA also plays a role (Jing et al. 2011, PMID: 21730063). We have no evidence yet about the sequence of activity-dependent modulation of Dyn1xA and its binding partners during ultrafast endocytosis. This is well beyond what we report in this work and is therefore excluded from the model figure. We added the following to the end of the Discussion:

      p13, “Nonetheless, these results suggest that the long C-terminal extension of Dyn1xA allows multivalent interactions with endocytic proteins and that the high-affinity interaction with Endophilin A1 permits phospho-regulation of their interaction and defines its function at synapses (Figure 7)”.

      Figure 7 legend:

      “Figure 7. Schematics depicting how specific isoforms Dyn1xA and Endophilin A mediate ultrafast endocytosis.

      A splice variant of dynamin 1, Dyn1xA, but not other isoforms/variants, can mediate ultrafast endocytosis. (A) Dyn1xA has a 20-amino-acid extension that introduces a new high-affinity Endophilin A1 binding site. Three amino acids, R846 at the splice-site boundary, S851 and S857, act as long-distance elements that can enhance the affinity of proline-rich motifs (PRM) for SH3 domains from outside the PRM core sequence PxxP. (B) At rest, Dyn1xA accumulates at the endocytic zone with the SH3-containing BAR proteins Syndapin 123 and Endophilin A1/2. When phosphobox-1 (Syndapin1 binding) and phosphobox-2 (Endophilin A1/2 binding, around S851/S857) within the Dyn1xA PRD are phosphorylated, these proteins are diffuse within the cytoplasm. A dephosphorylated fraction of Dyn1xA molecules can interact with these BAR domain proteins. Loss of these interactions, as with the Dyn1xA-R846A or -S851/857D mutations, disrupts pre-accumulation at the endocytic zone. Consequently, ultrafast endocytosis fails."

      Reviewer #2 (Significance (Required)):

      This is a remarkable and important advance in the field of endocytosis. The study reports a key discovery to understand the molecular mechanism of ultrafast endocytosis. Scientists interested in synaptic function and the general audience of cell biologists interested in membrane trafficking will very much value this study. The mechanism reported will potentially be included in textbooks in the near future.

      My field of expertise includes molecular mechanisms of presynaptic function and membrane trafficking.

      I do not have enough experience to evaluate the quality of the NMR experiments; however, I do not have any problem at all with what are, in my opinion, elegant results.

      We thank the reviewer for the positive outlook of our manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Please find below a point-by-point reply to the reviewers, with our comments in plain text, and reviewer comments in italics. Direct quotations of MS revisions in the below point-by-point reply are in quotation marks.


      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **The manuscript "Circadian regulation of protein turnover and proteome renewal" investigates the role of protein degradation in the circadian control of proteostasis. The researchers suggest that the relatively static protein levels in a cell are incongruent with the known oscillation in protein synthesis. They therefore hypothesize that there should be a compensatory mechanism to counteract rhythmic protein synthesis: rhythmic protein degradation. To investigate this, they employ bulk pulse-chase labeling to study the process of degradation. They identify a synchronization between the synthesis and turnover of proteins in a cell, implying that the clock helps to maintain homeostasis through a novel mechanism. They note that these phases align with energy availability, providing a plausible rationale for the biological implementation of this regulation. In summary, this is a sound manuscript that adds to the research field. The experiments in this manuscript are well thought out, organized, and explained. In general, the authors do not go further in their conclusions than I think is warranted given the data that they have, though I think that there are some key items that should be addressed before the publication of this manuscript. *

      Thank you for reading and appreciating our work.

      Major notes: 1) In figure 1, a clearer idea of what the ** means would be appreciated. What was the standard of significance for this measure?

      Thank you; this was already reported in the Methods section and is now also reported in the figure legend.

      * 2) In Figure 1b, it is important to note clearly in the text that this is not a direct measure of protein degradation, but a subtractive proxy. Though I don't think that necessarily makes the authors' conclusions incorrect, the same result could also be obtained if an extra 15% of the proteins were moved into the insoluble fraction. The same applies to Figures 1E and F. *

      Considering only the pulse shown in the left-hand graph of 1B, the reviewer is correct that this could arise from rhythmic partitioning of nascently synthesised proteins between digitonin-soluble and insoluble fractions. However, this could not readily explain the variation in the % of nascently synthesised digitonin-soluble protein that is degraded (right-hand graph), hence the need for pulse-chase rather than pulse alone. As such, we do not exclude circadian regulation of the solubility of nascently synthesised protein, or a rhythm of protein synthesis in the soluble fraction; both are likely true. Rather, Figure 1B indicates that the relative proportion of nascently synthesised protein in the soluble fraction that is degraded within 1h of synthesis is not constant over time. This is consistent with current understanding of the regulated increase in activity of protein quality control mechanisms (including proteasome-mediated degradation) that are required to maintain protein homeostasis upon an increase in bulk translation (Gandin and Topisirovic, Translation, 2014).

      In contrast, the lysates probed in Fig 1F were extracted in denaturing urea/thiourea buffer and so cannot be explained by variation in protein solubility.

      Considering 1E, explaining this result entirely through solubility changes would require puromycinylated polypeptides to become more soluble at discrete phases of the circadian cycle, but only when the proteasome is inhibited. Whilst we cannot formally exclude this possibility, we are not aware of evidence to support it, whereas there is prior evidence supporting circadian regulation of protein synthesis and proteasome activity.

      To communicate all of this more clearly we have made the following revisions to the text:

      Page 6: "...The experiment was performed over a 24h time series followed by soluble protein extraction using digitonin, which preferentially permeabilises the plasma membrane over organelle membranes."

      Page 6: " Importantly, the proportion of degraded protein varied over time, being highest at around the same time as increased protein synthesis (Fig 1B), indicating time-of-day variation in digitonin-soluble protein turnover which cannot be solely attributed to previously reported circadian regulation of protein solubility (Stangherlin et al, 2021b). Rather, it suggests that global rates of protein degradation may be co-ordinated with protein synthesis rates, and may vary over the circadian cycle."

      Fig 1a legend: "...with digitonin buffer"

      Fig 1e legend: "...in digitonin buffer"

      Fig1f legend: "... and extracted with urea/thiourea buffer"

      * 3) In figure 1c, is the noted oscillation in protease activity due to the oscillation of these proteins? What are the predicted mechanisms behind this? I don't think that this is necessarily within the scope of this paper but should be addressed in the discussion. Also, the peak degradation rate from Figure 1B is 4 hours before the peak enzyme activities. How can this observation be reconciled? *

      Besides this study, our two previous proteomic investigations of the fibroblast circadian proteome detected no biologically significant or consistent rhythm in proteasome subunit abundance (Wong et al., EMBO J, 2021; Hoyle et al., Science Translational Medicine, 2017). Moreover, proteasomes are long-lived stable complexes whose activity is determined by a combination of substrate-level, allosteric and post-translational regulatory mechanisms that includes their reversible sequestration into storage granules (Albert et al., PNAS, 2020; Fu et al., PNAS, 2021; Yasuda et al., Nature, 2020). It is therefore very likely that the observed rhythm in trypsin- and chymotrypsin-like activity occurs post-translationally. Proteasome subunit composition is also known to change, which might be another reason for differences between the protease activities (Marshall and Vierstra, Front Mol Biosci, 2019; Zheng et al., J Neurochem, 2012).

      Due to the nature of the experiment, the degradation rate inferred from Figure 1B does not exclusively reflect proteasome activity. Rather, it reflects the combined sum of processes that remove nascently produced proteins from the cell's digitonin-soluble fraction, which includes proteasomal degradation, but also autophagy, protein secretion and sequestration into other compartments. Therefore, the peak degradation in Fig 1B would not necessarily be expected to coincide with the peak of proteasome activity in Fig 1C. Figure 1A/B is intended as an exemplar of the investigation's rationale and was chronologically the first experiment performed.

      To communicate this succinctly, we have revised the relevant text as follows:

      Page 7: "Previous proteomics studies under similar conditions have revealed minimal circadian variation in proteasome subunit abundance (Wong et al, 2022), suggesting that proteasome activity rhythmicity, and therefore rhythms in UPS-mediated protein degradation, are regulated post-translationally (Marshall & Vierstra, 2019; Hansen et al, 2021)"

      * 4) For the pSILAC analysis, the incorporation scheme has a six-hour window between the comparison of the light and heavy peptides. This makes it somewhat difficult to assess whether you are looking at a clock effect from T1 or T1+6. This does not negate the findings, but it does raise the question of when the synthesis is occurring and what is being compared, which I think should be more clearly discussed in the manuscript. This is discussed later in the manuscript but should be mentioned in this section. *

      Thank you for this suggestion. To communicate this more clearly, we have rearranged the labels at the top of schematic graphs in figures 2b and 3b in order to clearly distinguish the pulse-labelling window from the time of sample collection. The following text has been added to the methods section:

      Page 9: "To enable sufficient heavy labelling for detection, a 6h time window was employed, thus measuring synthesis and abundance within each quarter of the circadian cycle "

      * 5) There are no error bars on figure 2C. Was the pSILAC just done in singlet? If so, the rhythm estimation is likely a large overestimate and this should be noted. *

      This first pSILAC experiment was performed in singlet with respect to external time for the RAIN analysis, but in duplicate for the two-way ANOVA that is also reported, by treating each cycle as a separate replicate. In fact, the 6.2% of proteins that were significantly rhythmically abundant by RAIN agree well with two previous experiments we performed using mouse fibroblasts under identical conditions: the first with 3h resolution over 3 cycles in singlet (7% rhythmic), the second with 4 independent biological replicates over one cycle (8% rhythmic) (Wong et al., EMBO J, 2021). The curve fits shown in 2C are standard damped sine wave fits, with p-values from RAIN reported in the figure legend.

      Most importantly, however, and as noted in the text, the absolute % of rhythmically abundant proteins is largely irrelevant, and indeed the absolute numbers of 'rhythmic' proteins can vary wildly, depending on the analysis method and stringency. The only important point to be gleaned from the estimates shown in Figure 2e is that by either statistical test, most rhythmically abundant proteins are not rhythmically synthesised, and vice versa; however, proteins that are both rhythmically synthesised and rhythmically abundant are strongly over-represented relative to chance, with odds ratios of 5.9 and 11.4 (taking proteins rhythmic by RAIN and ANOVA, respectively; in both cases the overlap between the two sets is highly significant, as shown below). This serves as a positive control, i.e., a minority of proteins show correlated rhythms of synthesis and abundance that are consistent with the canonical activity of 'clock-controlled genes', which cannot be explained by overestimation of rhythmicity.

      Odds Ratio comparison synthesis vs total

      Synthesis rhythmic by RAIN - listA size=148, e.g. A8Y5H7, B2RUR8, E9Q4N7

      Total rhythmic by RAIN - listB size=149, e.g. A1A5B6, A2A6T1, A2AI08

      Intersection size=34, e.g. A8Y5H7, O08795, O54910

      Union size=263, e.g. A8Y5H7, B2RUR8, E9Q4N7

      Genome size=2528

      Contingency Table (rows: listB membership; columns: listA membership):

               notA    inA
        notB   2265    114
        inB     115     34

      Overlapping p-value=5.4e-13

      Odds ratio=5.9

      Overlap tested using Fisher's exact test (alternative=greater)

      Jaccard Index=0.1

      Synthesis rhythmic by ANOVA - listA size=66, e.g. A8Y5H7, O35639, O55143

      Total rhythmic by ANOVA - listB size=83, e.g. A8Y5H7, B2RQC6, E9Q6J5

      Intersection size=16, e.g. A8Y5H7, P22561-2, Q3TB82

      Union size=133, e.g. A8Y5H7, O35639, O55143

      Genome size=2528

      Contingency Table (rows: listB membership; columns: listA membership):

               notA    inA
        notB   2395     50
        inB      67     16

      Overlapping p-value=9.7e-11

      Odds ratio=11.4

      Overlap tested using Fisher's exact test (alternative=greater)

      Jaccard Index=0.1
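      The overlap statistics above can be recomputed from the list sizes alone. The sketch below is a minimal illustration, assuming (as the output states) a one-sided Fisher's exact test on the 2x2 table and that the reported odds ratio is the sample odds ratio; the helper name overlap_stats is hypothetical and this is not the authors' analysis code. It also returns the overlap expected by chance (size of list A × size of list B / universe) for comparison.

```python
from math import comb


def overlap_stats(size_a, size_b, n_both, universe):
    """Summarise the overlap of two protein lists drawn from one universe.

    Hypothetical helper: builds the 2x2 contingency table from list sizes,
    then returns the sample odds ratio, the one-sided (alternative="greater")
    Fisher's exact p-value, the Jaccard index and the chance-expected overlap.
    """
    a_only = size_a - n_both                       # in A, not in B
    b_only = size_b - n_both                       # in B, not in A
    neither = universe - size_a - size_b + n_both  # in neither list
    table = [[neither, a_only],   # row notB: (notA, inA)
             [b_only, n_both]]    # row inB:  (notA, inA)

    odds_ratio = (n_both * neither) / (a_only * b_only)

    # One-sided Fisher's exact test = upper tail of the hypergeometric
    # distribution: P(overlap >= n_both) when size_a proteins are drawn at
    # random from a universe containing size_b "B" proteins.
    total = comb(universe, size_a)
    p_value = sum(
        comb(size_b, k) * comb(universe - size_b, size_a - k)
        for k in range(n_both, min(size_a, size_b) + 1)
    ) / total

    return {
        "table": table,
        "odds_ratio": odds_ratio,
        "p_value": p_value,
        "jaccard": n_both / (size_a + size_b - n_both),
        "expected_overlap": size_a * size_b / universe,
    }


rain = overlap_stats(148, 149, 34, 2528)
anova = overlap_stats(66, 83, 16, 2528)
print(rain["table"], round(rain["odds_ratio"], 1), round(rain["jaccard"], 1))
# [[2265, 114], [115, 34]] 5.9 0.1
print(anova["table"], round(anova["odds_ratio"], 1), round(anova["jaccard"], 1))
# [[2395, 50], [67, 16]] 11.4 0.1
```

      Because the one-sided test is just the hypergeometric upper tail, it can be computed exactly with integer binomial coefficients; where SciPy is available, scipy.stats.fisher_exact(table, alternative="greater") should give the same p-value.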

      Nevertheless, we agree with the reviewer's general point and have revised the text as follows:

      Page 9: "... and may be susceptible to overestimation of rhythmicity."

      Page 9: "Consistent with similar previous studies,

      Page 9: "The proportion of such proteins was more than expected by chance (p

      Methods, Page 21: "...(n=1 per timepoint)"

      * 6) Why were the genes selected in 2C? these are not discussed anywhere else in the manuscript.*

      These are simply illustrative examples so that the reader can better understand what we mean, i.e., two proteins in different phases and one that did not change, all within a similar range of abundance. The selected proteins were not discussed because we do not expect the reader to attach any specific meaning to them. We have revised the figure to include in 2C examples of each rhythmicity category shown in 2E. To make this clear, we now state the following:

      Figure 2 legend: "No specific meaning is inferred from the protein identities."

      * 7) The authors note that for Figure 2 "These observations are consistent with widespread rhythmic regulation of protein degradation." However, only 5-10% of the proteome is oscillating at any level, and fewer proteins show a discrepancy between synthesis and abundance, so "widespread" is an exaggeration and this statement should be limited to degradation in the rhythmic proteome. *

      We take the reviewer's point, but the term rhythmic proteome is also inaccurate since half the proteins with rhythmic degradation did not show an abundance rhythm in both mass spec experiments. We therefore revised this sentence as follows:

      Page 10: "These observations are consistent with widespread temporal organisation of protein degradation within the circadian-regulated proteome."

      * 8) The authors note that their more developed strategy in figure 3 would allow for the detection of less abundant proteins. However, they do not discuss that they in fact found fewer proteins overall, or whether they were able to detect proteins of lower abundance. This is of some concern in determining whether this is indeed the better method that they predict. How can the authors reconcile this issue? How can they rationalize that this explains their increase in oscillating elements? *

      Thank you for raising this point; we did not explain ourselves sufficiently clearly. As stated in the revised text, once we had analysed the first iteration of pSILAC (Fig 2), we realised that detection of heavy-labelled proteins was "inevitably limited and biased the proteome coverage towards abundant proteins with higher synthesis rates". In other words, to be considered in our analysis, both unlabelled and heavy-labelled peptides needed to be detected in every sample at every time point. In fact, if we do not consider heavy labelling, the overall coverage in the Fig 3 experiment (6577 proteins) was better than in the Figure 2 experiment (6264 proteins), as expected, owing to technical improvements in the methods used (by the time of the experiment in Fig 3, we were able to perform the analysis using mass spectrometry techniques with better fractionation and detection, namely FAIMS and MS3). When the analysis criteria are applied, however, this falls to 2302 and 2528 proteins, respectively. Because of the way that mass spectrometry works, many proteins had to be excluded from analysis because the heavy label was not detected in one or more samples. In these cases, we cannot infer that no heavy-labelled protein was present in that sample, or even that it was present at lower levels than in other samples; it simply was not detected, and therefore we cannot make any quantitative comparisons. Non-detection of any given heavy peptide may occur for several reasons, the most likely being that it co-elutes from the chromatography column at the same time as other, much more abundant (light) peptides and simply escapes detection. This is an unavoidable limitation of the technique; we hope the reviewer can understand our need to restrict the analysis to those proteins whose nascent synthesis and total abundance in the MMC fraction can be confidently quantified.
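      The inclusion criterion described above (both light and heavy signal in every sample) amounts to a simple completeness filter, which is why coverage drops so sharply once heavy labelling is required. The sketch below is purely illustrative: the data layout, the function name fully_quantified and the protein IDs are hypothetical and do not reflect the authors' actual pipeline.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: protein ID -> one (light, heavy) intensity pair per
# sample/time point; None marks "not detected in that sample".
Quant = Dict[str, List[Tuple[Optional[float], Optional[float]]]]


def fully_quantified(quant: Quant) -> List[str]:
    """Return IDs of proteins with both light and heavy signal in every sample.

    A missing value is treated as unknown rather than zero, so a protein
    with any gap is dropped instead of being compared quantitatively.
    """
    return sorted(
        protein
        for protein, samples in quant.items()
        if samples
        and all(light is not None and heavy is not None for light, heavy in samples)
    )


example = {
    "P1": [(10.0, 1.2), (11.0, 0.9), (9.5, 1.1)],
    "P2": [(8.0, 0.8), (8.2, None), (7.9, 1.0)],   # heavy missed once
    "P3": [(20.0, 2.0), (19.0, 2.1), (21.0, 1.9)],
    "P4": [(None, None), (5.0, 0.5), (5.1, 0.4)],  # undetected in one sample
}
print(fully_quantified(example))  # ['P1', 'P3']
```

      Treating a gap as "unknown" rather than as zero is the conservative choice the response argues for: imputing zeros would turn co-elution losses into apparent changes in synthesis.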

      As the experiments in Fig 2 and Fig 3 were performed independently, with separate TMT sets and different instrumentation, we are also unable to compare absolute abundances of the proteins between the two.

      To communicate this more clearly we have amended Figures 2e and 3e to state the total coverage in the legends, as well as clearly stating the coverage of heavy-labelled proteins in the figure itself. We have also added the following explanation to the text:

      Page 11:

      "Despite enriching for only one cellular compartment, the overall coverage in this experiment was similar to the previous one (6577 and 6264 proteins, respectively), owing to the altered and more targeted approach, with heavy peptides detected for 2302 proteins."

      *9) In the comparison of complex turnover rates, the authors need to provide a metric that backs their statement that "the majority of component subunits not only showed similar average heavy to total protein ratios but also a similar change in synthesis over the daily cycle" for figure 3F. *

      Our apologies for this oversight, this is now presented in new Fig S3D.

      * 10) In reference to the AHA incorporation, why is the hypothesis not that, as with puromycin, you would not see oscillation unless you add BTZ? Shouldn't active degradation regulate the incorporation of AHA such that there is no visible rhythm unless you suppress degradation? *

      AHA is a methionine analogue that is sparsely incorporated into polypeptide chains with minimal effect on protein function/structure (Dietrich et al., PNAS, 2006). Unlike puromycin, therefore, AHA does not lead to chain termination or protein misfolding/degradation (Dermit et al., Mol Biosyst, 2017) and so pulsed application at different phases of the circadian cycle is sufficient to reveal protein synthesis rhythms. The novelty in Fig 3H is the combination of AHA labelling with native PAGE that allows us to validate rhythmic production of high molecular weight protein complexes. This would not be possible with puromycin because prematurely-terminated polypeptide chains are not able to assemble into native complexes unless chain termination happens to occur at the extreme C-terminus and the C-terminus does not partake in any intermolecular interactions within the assembled complex.

      * 11) The authors claim that there is enrichment of the actin cytoskeleton, but where this data can be found should be explained. The only thing that is shown is a few selected graphs of proteins in this pathway. *

      We previously reported circadian regulation of the actin cytoskeleton in Hoyle et al. (Sci Trans Med, 2017). The extremely high relative amplitude of beta-actin (the structural component of microfilaments) in the MMC fraction is, in and of itself, sufficient to demonstrate a circadian rhythm in the relative ratio of globular to filamentous actin that was originally identified by Ueli Schibler's lab (Gerber et al., Cell, 2013) and then shown to have a cell-autonomous basis in fibroblasts in Hoyle et al. (2017). We have included further examples of an actin-binding protein (Coronin 1b) and a motor protein (Myosin 6) to illustrate this further, but do not feel further discussion is warranted because it was comprehensively addressed in our previous work. The enrichment for actin was determined by GO analysis, which is now shown in Fig 4A and referred to in the text.

      The important point in Fig 4C is the difference in phase with the examples shown in Fig 4B and summarised in Figure 4A, i.e., there are a small number of proteins whose presence in the MMC fraction is highest in advance of the majority of rhythmically abundant proteins, but this earlier group doesn't show any significant synthesis rhythm. Actin is one of the most abundant cellular proteins, and by mass it accounts for 67% of the circadian variation of rhythmically abundant proteins that peak in this fraction at the same phase. All these data and analyses are available for scrutiny in Supplementary Table 2.

      To communicate this more clearly we have expanded on this point as follows:

      Page 13: " These proteins were enriched by 9-fold for actin and associated regulators of the actin cytoskeleton (q

      * 12) The authors note an oscillation in the total levels of p-eif2, commenting that these do not arise from the rhythms in total eif2a but from temperature and feeding rhythms. However, unless I misunderstood, this work was done in fibroblast cell culture, so in this case, where would these temperature and feeding rhythms come from? *

      We were insufficiently clear. Daily rhythms of p-eIF2 have been observed under physiological conditions in mouse, in vivo. We do not observe similar rhythms in cultured fibroblasts under constant conditions unless the cells are challenged by stress. By inference, therefore, it seems likely that daily rhythms of p-eIF2 in vivo arise from the interaction between cell-autonomous mechanisms and daily systemic cues such as insulin/IGF-1 signalling and body temperature that are in turn driven by daily rhythms in CNS control, daily feed/fast rhythms and daily rest/activity rhythms, respectively. We have amended the text as follows:

      Page 15: "...and so suggest that daily p-eIF2α rhythms in mouse tissues likely arise through the interaction between cell-autonomous mechanisms and daily cycles of systemic cues, e.g., insulin/IGF-1 signalling and body temperature rhythms driven by daily feed/fast and rest/activity cycles, respectively."

      * 13) In Figure 5d, the treatment impeding degradation is causing cell death while the inhibition of translation does not. However, wouldn't too much, or not enough, translation, without compensatory regulation from degradation cause a problem in the same way that degradation does? *

      It is well-established that acute treatment with high concentrations of proteasomal inhibitors rapidly leads to proteotoxic stress that will trigger apoptosis unless resolved (Dantuma and Lindsten, Cardiovasc Res, 2010). Treatment with CHX is certainly stressful to cells, but in a different way, and cells die through mechanisms generally regarded to be necrotic and certainly do not involve the canonical proteotoxic stress responses that are activated by MG132 and similar drugs. Our findings show that, by whatever mechanisms cells die with CHX treatment, it does not change over the circadian cycle whereas death via proteotoxic stress does, consistent with our prediction. We hope the reviewer agrees it is beyond the scope of our study to explain why CHX-mediated cell death does not show a circadian rhythm in mouse fibroblasts.

      *Reviewer #1 (Significance (Required)):

      *The information that stems from this work is relevant and of interest to circadian clock field as how the regulation of the output of the circadian clock is implemented is still a major question in the field. This manuscript suggests a novel and plausible method for how, at least in part, this regulation occurs. However, the manuscript uses methods that do not measure degradation directly, which is a minor limitation. In addition, the mechanisms by which this regulation is imparted are not addressed in any meaningful way, even in the discussion.

      We are sorry that we did not adequately discuss the extensive previous work that has already addressed regulatory mechanisms. We would like to stress that this manuscript concerns protein turnover and proteome renewal, of which degradation is obviously an important part but not the sole focus.

      To communicate this more clearly, we have amended the title to:

      "Circadian regulation of macromolecular complex turnover and proteome renewal"

      ... which we explicitly predicted in the discussion of earlier papers (Feeney et al., Nature, 2016; O'Neill et al., Nat Comms, 2020; Wong et al., EMBO J, 2022) and in our recent review (Stangherlin et al., Curr Opin Syst Biol, 2021).

      With respect to measurement of degradation: physiologically, cellular rates of proteasomal degradation are so intimately coupled with protein synthesis that, over circadian timescales, the former cannot meaningfully be studied in isolation. It is possible that the reviewer is alluding to historical methods that measure change over time in the presence of translational or proteasomal inhibitors, but these have long been known to introduce artifacts, because translational inhibition rapidly leads to reduced proteasome activity, whereas proteasomal inhibition rapidly reduces protein synthesis rates through the integrated stress response. We would be interested to hear of any more direct method for measuring protein degradation proteome-wide than the pulsed SILAC method we developed, as we are not aware of any. Even proteasomal proximity labelling coupled with MG132 treatment, recently developed by the Ori lab, does not directly measure degradation (bioRxiv https://www.biorxiv.org/content/10.1101/2022.08.09.503299v1). By definition, degradation can only be measured through the disappearance of something that was previously present, usually by comparing its rate of production with the change in steady-state concentration (if any), which we have done using multiple methods.
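      The argument that degradation can only be inferred from production and the change in abundance can be written as a simple mass balance. This is a generic sketch: P, S and D are notation introduced here for illustration (abundance, synthesis flux and degradation flux of a protein pool), not symbols used in the manuscript.

```latex
\frac{dP}{dt} = S(t) - D(t)
\quad\Longrightarrow\quad
D(t) = S(t) - \frac{dP}{dt}

% Over a finite sampling interval \Delta t (e.g. one labelling window):
\bar{D} \approx \bar{S} - \frac{\Delta P}{\Delta t}
```

      In particular, if S(t) oscillates while P stays approximately constant (dP/dt ≈ 0), then D(t) ≈ S(t) must oscillate with it, which is the compensation the manuscript tests.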

      With respect to regulation of degradation: we speculated on the mechanisms regulating rhythms in protein turnover in several of our previous papers (Feeney et al., Nature, 2016; O'Neill et al., Nat Comms, 2020; Wong et al., EMBO J, 2021; Stangherlin et al, Nat Comms, 2021), whereas outside the circadian field these mechanisms have been addressed extensively. This was also discussed in detail in our recent review on the topic (see Stangherlin et al., COISB, 2021). In this review, we lay out the evidence for a model whereby most aspects of circadian cellular physiology might be explained by daily rhythms in the activity of mammalian target-of-rapamycin complexes (mTORC). This model makes multiple predictions and informs the central hypothesis tested in the current manuscript: that circadian rhythms in complex turnover and proteome renewal should be more prevalent than abundance rhythms. An enormous body of work over the last two decades has clearly established mTORC1 as the master regulator of bulk protein synthesis and degradation, and a substantial number of independent observations have demonstrated circadian regulation of mTORC1 activity in vivo and in cultured cells. The mechanisms that drive cell-autonomous mTORC1 signalling are only partially understood (e.g. Feeney et al., Nature, 2016; Wu et al., Cell Metab, 2019); we continue to explore them experimentally, but they lie well beyond the scope of this investigation.

      Therefore, to address the reviewer's concern about inadequate discussion of mechanism, we have expanded on mTORC in the introduction and discussion, as follows:

      Page 3: "Daily rhythms of PERIOD and mTORC activity facilitate daily rhythms of gene expression and protein synthesis. In particular, mTORC1 is a master regulator of bulk 5'-cap-dependent protein synthesis, degradation and ribosome biogenesis (Valvezan & Manning, 2019) whose activity is circadian-regulated in tissues and in cultured cells (Ramanathan et al, 2018; Feeney et al, 2016a; Stangherlin et al, 2021b; Mauvoisin et al, 2014; Jouffe et al, 2013; Sinturel et al, 2017; Cao, 2018). It is plausible that daily rhythms of mTORC activity underlie many aspects of daily physiology (Crosby et al, 2019; Stangherlin et al, 2021a; Beale et al, 2023b)."

      Page 17: "The mechanistic underpinnings for cell-autonomous circadian regulation of the translation and degradation machineries remain to be fully explored, but are likely to be driven by daily rhythms in the activity of mTORC: a key regulator of protein synthesis and degradation as well as macromolecular crowding and sequestration (Stangherlin et al, 2021b, 2021a; Cao, 2018; Adegoke et al, 2019; Ben-Sahra & Manning, 2017; Delarue et al, 2018). In particular, global protein synthesis rates are greatest when mTORC1 activity is highest, in tissues and cultured cells, whereas pharmacological treatments that inhibit mTORC1 activity reduce daily variation in crowding and protein synthesis rates (Feeney et al, 2016a; Lipton et al, 2015; Stangherlin et al, 2021b). Given our focus on proteomic flux and translation-associated protein quality control, autophagy was not directly within the scope of this study but is also mTORC-regulated and subject to daily regulation (Ma et al, 2011; Ryzhikov et al, 2019). In vivo, daily regulation of mTORC activity arises primarily through growth factor signalling associated with daily feed/fast cycles (Crosby et al, 2019; Byles et al, 2021). The mechanisms facilitating cell-autonomous circadian mTORC activity rhythms are incompletely understood but may include Mg.ATP availability (Feeney et al, 2016a) and its direct regulation by PERIOD2 (Wu et al, 2019). This will be an important area for future work."

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This is a very interesting and well written paper that addresses key questions in the circadian organization of proteostasis. The paper investigates origins of cellular circadian rhythms, invoking a premise early that there is a poor correlation between rhythmic gene expression, regulated by the canonical TTFL, and rhythms of the proteome, which are rather meager. Specifically, they ask how a relatively stable proteome is possible if cells engage in rhythms of cellular protein synthesis. Their hypothesis is that protein degradation must rhythmically compensate for rhythms of synthesis, and much of the manuscript is focused on defining the relationship between rhythmic global synthesis and rhythmic degradation. They employ a series of detailed proteomic investigations and biochemical assessments of protein synthesis coupled with various circadian reporters to assess proteasome function. The proteomic experiments reveal a limited number of proteins with oscillations in either synthesis or abundance or both, and no discernible pathway organization; however, a follow-up and more refined study that utilized fractionated samples and boosted heavy SILAC identified, strikingly, that many proteins in relatively heavy fractions are rhythmic and that these fall into possible complexes including ribosomes and chaperonins. Finally, they perform in vivo experiments testing whether the timing of proteotoxic stimuli regulates the degree of the integrated stress response, measured as pEif2a. Overall, I think that this is a fascinating paper that addresses an important question but falls short of mechanistically unifying the findings and fully contextualizing them in light of the canonical modes of circadian timekeeping, leaving us with an important but mostly descriptive set of findings. In addition, there are a number of important questions about data interpretation and some issues with data quality that should be addressed, outlined below. 
With revision and further explication, this study will be an excellent addition to the growing field of circadian organization of the cellular proteome. *

      Thank you for reading and appreciating our work.

      *Major and minor comments. Figure 1. Fig 1a. The difference in Pulse and Chase at ZT24 does not appear to reflect the quantified data in 1b. This should be reconciled to make the figure convincing.*

      When working with radioactive cell lysates, it is not possible to equalise the amount of protein loaded on each gel beforehand, as would be done for a western blot, for example. For this reason, the radioactive signal was normalised to the protein level subsequently measured by Coomassie staining, as is standard practice for this type of assay, with all 4 replicates shown in supplementary Fig. 1A. An overnight phosphor screen image is presented in main Fig. 1A for illustrative purposes, but we take the point that the normalisation might not be immediately obvious. In revised Fig. 1A we therefore also show the relevant Coomassie stain, together with labelling that makes clear that the radioactive signal was normalised to protein levels.
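The normalisation described above amounts to a per-lane ratio. A minimal sketch, assuming one phosphor-screen intensity and one Coomassie intensity per lane; the function names and numbers are hypothetical and not taken from the manuscript:

```python
def normalise_to_protein(radio_signal, coomassie_signal):
    """Divide each lane's radioactive (35S) signal by its Coomassie total-protein signal."""
    if len(radio_signal) != len(coomassie_signal):
        raise ValueError("each lane needs one radioactive and one Coomassie value")
    return [r / c for r, c in zip(radio_signal, coomassie_signal)]


def relative_to_mean(values):
    """Express normalised values relative to their mean (the mean becomes 1.0)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


# Hypothetical lanes: the second lane carries 1.5x more protein, so its raw
# radioactive signal is higher, but the normalised signal is the same.
print(normalise_to_protein([200.0, 300.0], [100.0, 150.0]))  # [2.0, 2.0]
```

Unequal loading inflates the raw signal but cancels in the ratio, which is why the phosphor image alone can look different from the quantification.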

      *How was the timing of the chase collection determined?*

      For these proof-of-principle experiments, we empirically determined the minimum duration of pulse and chase necessary to detect a quantifiable signal.

      *Fig 1d-e. What is the evidence that puro labeling results in 'rapid' turnover?*

      Apologies; the rapid turnover of puromycylated peptides has been established for some time. Additional papers are now cited in this section of the text (Liu et al, PNAS, 2012; Lacsina et al., PLoS One, 2011; Szeto et al., Autophagy, 2006).

      *Fig 1e seems to be missing the data from the treated and untreated conditions? How are the lines produced (e.g. linear versus rhythmic? Are these drawn lines or actual regressions?).*

      Fig 1e depicts the result of the experiment schematically explained in 1d. The only conditions were +Puro and +Puro+BTZ. There was no completely untreated condition, as puromycin incorporation is the basis of the assay (Lacsina et al., PLoS One, 2012; Szeto et al., Autophagy, 2006) and puromycin does not occur naturally in cells. We realise the figure could be confusing without the associated raw data (anti-puromycin blots); these are shown in supplementary Fig. 2A.

      To explain the method more clearly, the following has been added to the results section where this experiment is described:

      "As determined by anti-puromycin western blots, over two days under constant conditions, puromycin incorporation in the presence of BTZ showed significant circadian variation. In contrast, cells that were treated with puromycin alone showed no such variation, and nor did total cellular protein levels (Fig 1E, Fig S2A)."

      The fit lines are produced by statistical comparison of fits, i.e., our hypothesis (damped cosine fit) vs the null hypothesis (no change or constant change over time; linear fit, y = mx + c), using the extra sum-of-squares F test. The statistically preferred fit is plotted (i.e., the regression line using the best-fit parameters of the preferred model), with its p-value displayed on the graph. These details are reported in the figure legends.
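To make the comparison of fits concrete, here is a minimal sketch of an extra sum-of-squares F test between a straight line (null) and a damped cosine (alternative). It assumes a damped cosine of the form y = c + m·t + exp(−k·t)·(a·cos ωt + b·sin ωt) with the period fixed (24h by default), and it finds the damping rate k by a coarse grid search rather than a full nonlinear optimiser. The parametrisation, grid and function names are illustrative assumptions, not necessarily what the authors' fitting software does:

```python
import math


def ols_rss(columns, y):
    """Least-squares fit of y on the regressor columns via the normal equations
    (Gaussian elimination with partial pivoting); returns the residual sum of squares."""
    p = len(columns)
    aug = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
           + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, p):
            f = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= f * aug[col][c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (aug[r][p] - sum(aug[r][c] * beta[c] for c in range(r + 1, p))) / aug[r][r]
    return sum((yi - sum(beta[i] * columns[i][n] for i in range(p))) ** 2
               for n, yi in enumerate(y))


def compare_fits(t, y, period=24.0, damping_grid=None):
    """Extra sum-of-squares F test of a linear null model against a damped cosine.

    Null:        y = c + m*t                                   (2 parameters)
    Alternative: y = c + m*t + exp(-k*t)*(a*cos(w*t) + b*sin(w*t))
                 (5 parameters; w = 2*pi/period is fixed, k is chosen by grid search)
    Returns (F, df_numerator, df_denominator).
    """
    n = len(t)
    ones = [1.0] * n
    trend = [float(ti) for ti in t]
    rss_null = ols_rss([ones, trend], y)
    w = 2 * math.pi / period
    if damping_grid is None:
        damping_grid = [i * 0.005 for i in range(41)]  # k from 0 to 0.2 per hour
    rss_alt = min(
        ols_rss([ones, trend,
                 [math.exp(-k * ti) * math.cos(w * ti) for ti in t],
                 [math.exp(-k * ti) * math.sin(w * ti) for ti in t]], y)
        for k in damping_grid)
    df_num, df_den = (n - 2) - (n - 5), n - 5
    F = ((rss_null - rss_alt) / df_num) / (rss_alt / df_den)
    return F, df_num, df_den
```

Because the alternative nests the null (a = b = 0), its residual sum of squares can only be smaller; the F statistic asks whether the reduction is larger than expected from three extra parameters. The p-value is the upper tail of the F distribution with (df_numerator, df_denominator) degrees of freedom, e.g. `scipy.stats.f.sf(F, df_num, df_den)` if SciPy is available.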

      *Why was 30 minutes chosen as labeling time? It is hard to understand how protein degradation kinetics can be measured by puromycin labeling if, as the authors claim, puromycin labeling potentially changes degradation rates as a function (primary or secondary) of the labeling itself. It seems they are measuring the potential to degrade proteins.*

      Puromycin labelling is a widely used technique, established for some 20 years, that can be employed in a range of applications. It was first used in a circadian context by Lipton et al (Cell, 2015), whose work we quickly followed (Feeney et al, Nature, 2016). Briefly, puromycin mimics tyrosyl-tRNA to block translation, labelling and releasing elongating polypeptide chains from translating ribosomes. When used at low concentrations (1 µg/mL in this case), puromycin is sparsely and sporadically incorporated into a small minority of elongating polypeptide chains. Those prematurely terminated chains carry puromycin at the C-terminus, which can be detected by western blotting. We chose 30 minutes after optimisation experiments, as it was the shortest incubation time at which a robust signal could be observed in these cells with this concentration of puromycin. Puromycylated peptides are preferentially degraded by the ubiquitin-proteasome system (UPS) because they are efficiently recognised as misfolded/aberrant proteins by chaperones within tens of minutes of being translated. Unless puromycin is used at much higher concentrations, or over much longer timescales, there is no reason to believe that it affects the degradation machinery itself; rather, the degradation of puromycylated peptides depends on the proteasome. Therefore, puromycin plus a proteasome inhibitor provides a reliable proxy for translation rate over the preceding 30 minutes, whereas puromycin alone reports the steady-state level of puromycylated peptides under normal conditions, i.e., where proteasomes remain active. By subtracting the latter from the former, we can infer the amount of puromycylated peptide degradation that must have occurred in the previous 30 minutes. It is not a perfect technique, but its results agree with other findings in this manuscript: that protein turnover varies more than steady-state protein abundance. The potential to degrade proteins is measured separately, in Fig 1C.
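The subtraction logic can be written out per timepoint. A minimal sketch, assuming matched, normalised anti-puromycin signals for +BTZ and −BTZ samples (hypothetical inputs); the "fraction degraded" is an extra derived quantity shown only for illustration:

```python
def inferred_turnover(puro_plus_btz, puro_only):
    """Per timepoint, infer degradation of puromycylated peptides in the labelling window.

    puro_plus_btz: signal with the proteasome inhibited (proxy for nascent production)
    puro_only:     signal with the proteasome active (what remains after degradation)
    """
    results = []
    for produced, remaining in zip(puro_plus_btz, puro_only):
        degraded = produced - remaining
        results.append({
            "produced": produced,
            "degraded": degraded,
            # Illustrative extra quantity: share of nascent signal that was degraded.
            "fraction_degraded": degraded / produced if produced else float("nan"),
        })
    return results


for row in inferred_turnover([2.0, 4.0], [1.5, 2.0]):
    print(row["degraded"], row["fraction_degraded"])
```

With these hypothetical values, the second timepoint shows both more production and proportionally more degradation, while the −BTZ signal alone differs less between timepoints.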

      *How do they determine that they are measuring degradation of functionally relevant proteins as opposed to a host of premature truncations?*

      We do not. This is measured by stable isotope labelling in Figures 2-4. Figure 1 provides the rationale for what follows in subsequent figures, i.e., proof-of-principle experiments suggesting that turnover is not constant over the circadian cycle. No single experiment in Figure 1 is expected to convince the reader of circadian turnover. Rather, several independent methods suggest that bulk protein synthesis and degradation (turnover) are not constant over time and deviate from the null hypothesis, with variation that appears to change over the 24h circadian cycle.

      *Fig 1e bottom: again, is this a true regression line?*

      It is not a regression line; if it were, a p-value for the fit would be shown. The bottom panel of Fig 1e shows the bioluminescence measured at each timepoint from parallel control cultures (mean of triplicates, error bars shown as dotted lines). Owing to the very high temporal resolution (every 30 min) and the robustness of the cell line, it appears as a virtually perfect damped (co)sine wave. We apologise that this was not explained more clearly in the figure legend, which is now amended as follows:

      "Parallel PER2::LUC bioluminescence recording from replicate cell cultures (mean +/- SEM, every 30 min) is shown below, acting as phase marker."

      *Perhaps two time points should be examined here, similar to the pulse chase performed with 35S labeling?*

      We are sorry that we were not fully clear about our method here. The puromycin (+/- BTZ) labelling was performed every 4h over two days (12 timepoints in total), as can be inferred from the data points and x-axis in the top two graphs of Fig. 1E; this is now also clearly stated in the figure legend. The bottom right graph is a continuous bioluminescence recording, integrated every 30 min, from a set of parallel culture dishes. The bioluminescence data serve as a circadian phase marker, so that we can infer at which biological times synthesis and inferred turnover were higher or lower.

      We’ve adjusted the text to explain our method more clearly:

      “Acute (30 min) puromycin treatment of cells in culture, with or without proteasomal inhibition (by bortezomib, BTZ), allowed us to measure both total nascent polypeptide production (+BTZ) and the amount of nascent polypeptides remaining when the UPS remained active (-BTZ). This allowed inference of the level of UPS-mediated degradation of puromycylated peptides within each time window, as a proxy for nascent protein turnover (Fig. 1D).”

      *Fig 1f. It appears that Puro labeling results in a rhythm between ZT1 and ZT13, but no statistic is provided, and it appears that the 'ns' is the result of variance in the data as opposed to a difference in means. Would this not contradict the cellular result? What accounts for the rhythm reversal in the presence/absence of BTZ?*

      To be clear, we measured the level of puromycin incorporation in mouse liver in vivo, following a method similar to that employed by Lipton et al, Cell, 2015 (Figure 2). The prediction was that, exactly as in cells (Fig 1E), treatment with a proteasome inhibitor would lead to a much greater increase in puromycylated peptides at ZT13 than at ZT1, because this is when protein synthesis is known to be higher and thus (we predict) protein degradation should also be higher. The experiment was not designed or powered to detect a main effect of time; it was designed to detect an interaction between time of puromycin treatment and BTZ, with the specific prediction that BTZ would have a greater effect during the active phase. This is what we observed.
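The quantity this design targets is an interaction contrast: the BTZ-induced increase at one time minus that at another. A minimal sketch with hypothetical replicate values keyed by (ZT, treatment); it computes the point estimate only, not the statistical test:

```python
from statistics import mean


def btz_effect_by_time(data):
    """data maps (zt, treatment) -> replicate values, treatment in {"ctrl", "btz"}.
    Returns {zt: mean(btz) - mean(ctrl)}."""
    zts = sorted({zt for zt, _ in data})
    return {zt: mean(data[(zt, "btz")]) - mean(data[(zt, "ctrl")]) for zt in zts}


def interaction_contrast(data, zt_ref, zt_test):
    """Difference of BTZ effects between two times; > 0 if BTZ acts more at zt_test."""
    effects = btz_effect_by_time(data)
    return effects[zt_test] - effects[zt_ref]


data = {(1, "ctrl"): [1.0, 1.0], (1, "btz"): [1.5, 1.5],
        (13, "ctrl"): [1.0, 1.0], (13, "btz"): [3.0, 3.0]}
print(interaction_contrast(data, 1, 13))
```

In this toy example the control means are identical at both times (no main effect of time), yet the contrast is clearly positive, which is the pattern a time-by-BTZ interaction test is designed to detect.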

      *While the authors have previously demonstrated an increase in rhythmicity of the proteome in Cry1/Cry2 double knockout cells, it would have been welcome here to test a global loss of circadian transcription in the degradation assay. One might expect that these rhythms would be even higher. What I am really asking is: what is the mechanism for rhythmic degradation, and is it dependent on the canonical clock?*

      To address the reviewer's curiosity, we used the proteasome-Glo assay (also used in Fig 1C) to assess whether there was an interaction between genotype (WT vs CKO) and time at opposite phases of the circadian cycle over 2 days. We found a significant interaction by two-way ANOVA, indicating that components of the 'canonical clock' regulate the temporal organisation of proteasomal activity (see revised Figure S1). Circadian regulation of mammalian cellular functions, such as protein turnover, is a complex and dynamic process, whereas gene deletion affects the steady state and may be epistatic to phenotype rather than revealing gene function. We are therefore reluctant to speculate on what this result means in the present manuscript, which is focused entirely on testing the hypothesis that global protein turnover and complex biogenesis have cell-intrinsic circadian rhythms in non-stressed, wild type cells.

      To communicate this, the text has been revised as follows:

      "Moreover, we detected a significant interaction between genotype and biological time when comparing trypsin-like proteasome activity between wild type and Cryptochrome1/2-deficient cells, which lack canonical circadian transcriptional feedback repression (Fig S1B-E)."

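For readers less familiar with the genotype × time interaction term, here is a minimal sketch of how the interaction F statistic is computed in a balanced two-way ANOVA (equal replicates per cell). The nested-list data layout is an assumption for illustration, and unbalanced designs need a different sums-of-squares decomposition:

```python
from statistics import mean


def interaction_f(cells):
    """Balanced two-way ANOVA interaction test.

    cells[i][j] holds the replicates for level i of factor A (e.g. genotype)
    and level j of factor B (e.g. time). Returns (F, df_interaction, df_within).
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(cell) != n for row in cells for cell in row):
        raise ValueError("this sketch assumes a balanced design")
    cell_means = [[mean(cell) for cell in row] for row in cells]
    grand = mean(m for row in cell_means for m in row)
    row_means = [mean(row) for row in cell_means]
    col_means = [mean(cell_means[i][j] for i in range(a)) for j in range(b)]
    # Interaction: how far each cell mean departs from the purely additive prediction.
    ss_ab = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    # Residual (within-cell) variation.
    ss_within = sum((x - cell_means[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in cells[i][j])
    df_ab, df_within = (a - 1) * (b - 1), a * b * (n - 1)
    return (ss_ab / df_ab) / (ss_within / df_within), df_ab, df_within
```

If genotype only shifted the overall level (same time profile in both genotypes), the cell means would be additive and F would be 0; a genotype-dependent time profile, such as a phase difference, produces a large F.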
      *Fig 2. How was the 'fixed window' timeframe determined?*

      A trial experiment was performed with labelling windows of various lengths, and 6h was determined to be the shortest window in which enough heavy label incorporation was detected to assess circadian changes. This applied to our first methodology, which was subsequently improved (Figure 3), allowing the labelling window to be reduced to 1.5h.

      *Fig 3h. While admittedly difficult, the native PAGE is not of great quality and is rather unconvincing. I am also not sure why AHA labeling is used here and nowhere else in the paper.*

      AHA is a methionine analogue that is sparsely incorporated into polypeptide chains with minimal effect on protein function/structure (Dieterich et al., PNAS, 2006). Unlike puromycin, therefore, AHA does not lead to chain termination or protein misfolding/degradation (Dermit et al., Mol Biosyst, 2017). In Figure 1, the aim was to validate previous reports of rhythmic protein synthesis and to assess whether there was any evidence for rhythmic turnover. To this end, we employed two independent methods (35S-labelling and puromycin incorporation). We did not want to rely on AHA for measuring turnover: although it has been validated and used for this purpose in some studies (McShane et al., Cell, 2016), AHA is not fully equivalent to methionine, and cellular aminoacyl-tRNA synthetases have much higher affinity for methionine than for AHA (Ma and Yates, Expert Rev Proteomics, 2018). It is thus impossible to perform AHA labelling without methionine-free medium, and methionine starvation and media changes are in turn known to affect cell signalling and metabolism, effects that would be particularly pronounced in a circadian context (over days rather than hours).

      By contrast, in Fig 3H, we use AHA with native PAGE to specifically validate one inference from the mass spectrometry analyses: circadian production of high molecular weight protein complexes. This would not be possible with puromycin because prematurely terminated polypeptide chains are not able to assemble into native complexes unless chain termination happens to occur at the extreme C-terminus and the C-terminus does not partake in any intermolecular interactions within the assembled complex.

      The raw data (full gels, all replicates), which were used for quantification, are presented in Figure S2e. We have now picked a different example for the main figure, which hopefully allows a clearer representation.

      The text in the results section describing the AHA experiment is now amended as follows:

      "To validate these observations by an orthogonal method, we pulse-labelled cells with the methionine analogue L-azidohomoalanine (AHA; Dieterich et al, 2006). AHA is an exogenous substrate for which cells have lower affinity than for methionine, and it could potentially affect the stability of the labelled proteins (Ma & Yates, 2018); we therefore used AHA only to assess nascent complex synthesis, rather than turnover. We analysed the incorporation of newly synthesised, AHA-labelled proteins into the highest molecular weight protein species detected under native-PAGE conditions (Fig 3H, S3F). We observed a high-amplitude daily rhythm of AHA labelling, indicating the rhythmic translation and assembly of nascent protein complexes. Taken together, these results show that daily rhythms in synthesis and degradation may be particularly pertinent for subunits of macromolecular protein complexes."

      *Fig 4. I was a little disappointed here that the authors did not directly assess macromolecular assembly of at least one of their "hits" and demonstrate functional relevance; most of the analysis remains at a very superficial, systemic level. STRING assemblies are not terribly helpful without clear k-means clustering or some other clearly visualizable metric for stratifying and organizing the putative PPI data; this figure (S3) could be markedly improved.*

      We agree that validation is important. The ribosome is by far the most abundant macromolecular complex in the cell, and was one of the major complexes to show clear evidence for circadian regulation of turnover, but not abundance, by our pSILAC proteomics. To validate this result, we took advantage of two important observations: (1) all fully assembled ribosomes incorporate ribosomal RNA (rRNA), which can readily be separated from other cellular RNA by density gradient centrifugation; (2) pulse-labelling with heavy uridine-15N2 allows nascent RNA to be distinguished from pre-existing RNA. Thus, by combining stable isotope labelling with ribosome purification, we can distinguish nascently assembled ribosomes from total ribosomes when the RNA is extracted, digested with RNase, and the % heavy/total UMP quantified by mass spectrometry. These data are presented in new Figure 5 and are consistent with findings in Figures 3/4 that circadian regulation of ribosome turnover is more prevalent than regulation of ribosome abundance, and that the phase of highest ribosome turnover coincides with the phases of high translation and turnover overall. We hope that, by addressing the reviewer's question with an entirely orthogonal method, we have given the reviewer greater confidence in our conclusions.
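Whether the readout is expressed as % heavy/total UMP or as a ratio of light to heavy UMP, the two carry the same information. A minimal sketch, assuming background-corrected heavy and light UMP intensities; it ignores natural-isotope and label-incorporation corrections that a real analysis may require, and the function names are illustrative:

```python
def percent_heavy(heavy, light):
    """% of UMP signal carrying the heavy (15N2) label, from background-corrected intensities."""
    total = heavy + light
    if total <= 0:
        raise ValueError("total UMP signal must be positive")
    return 100.0 * heavy / total


def percent_heavy_from_ratio(light_to_heavy):
    """Same quantity from a light:heavy ratio L/H, since 100*H/(H+L) = 100/(1 + L/H)."""
    return 100.0 / (1.0 + light_to_heavy)


print(percent_heavy(1.0, 3.0), percent_heavy_from_ratio(3.0))
```

A higher % heavy at a given timepoint means a larger share of purified ribosomal RNA was made during the pulse, i.e. more nascent ribosome assembly, even if total ribosome content is unchanged.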

      The statistical metric for STRING, specifically the p-value for enrichment in physical protein-protein interactions, is presented in the main Fig. 3G. It is now also reported in the legend for new Figure S4 itself.

      *Is it possible that some macromolecular complexes have rhythms because their constituent proteins have differential half-lives when in one complex compared with another in circadian time? This possibility was not discussed.*

      To our knowledge, there is no evidence that any major macromolecular complex in the cell has a functionally significant rhythm in abundance on a cell-autonomous basis. The reviewer’s suggestion is an intriguing possibility, but we can think of no way that it could be measured, even in principle. The simplest interpretation of our data from the independent techniques we employ (pSILAC with fractionation, native PAGE + AHA incorporation) is a rhythm in synthesis.

      *Fig. 5. Why is the first histogram in 3c not at unity?*

      This measures the average fold-induction in aggregation when cells are treated with MG132 for 4h at the indicated timepoints. Unity would indicate no induction at all, so the presented quantifications show that MG132 always elicited an increase in aggregation, with an effect size that varied with circadian phase.
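A minimal sketch of the fold-induction readout, assuming paired MG132-treated and untreated measurements per replicate (hypothetical values; the pairing and the averaging of per-replicate ratios are assumptions for illustration):

```python
from statistics import mean


def fold_induction(treated, untreated):
    """Mean of per-replicate ratios (MG132-treated / untreated aggregation signal).
    1.0 means no induction; > 1.0 means MG132 increased aggregation."""
    if len(treated) != len(untreated):
        raise ValueError("expects paired replicates")
    return mean(t / u for t, u in zip(treated, untreated))


print(fold_induction([2.0, 3.0], [1.0, 1.0]))
print(fold_induction([1.0], [1.0]))
```

Every bar sitting above 1.0 therefore means MG132 always increased aggregation; the circadian information is in how far above 1.0 each timepoint lies.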

      *Do ZT24 and ZT48 differ? Similarly, do ZT36 and ZT60?*

      No, neither difference is statistically significant (adjusted p-values of p=0.9 and p=0.07, respectively). This is now specified in the figure legend. Tendency to aggregate is also likely to change as a function of time in culture, which is why we think there is a slight increase overall in the second day of the experiment.

      *Fig S4f is not of good quality, with missing eIF2a total and therefore no loading controls.*

      Thank you for prompting us to double-check this. We found that the levels of eIF2a were quite variable between animals, which is why we performed this experiment with 6 biological replicates. We have re-checked the quantification and now exclude 3 unreliable samples instead of the 2 excluded originally: those with undetectable levels of total eIF2a (ZT18 +BTZ replicate 1 and ZT18 -BTZ replicate 2), and ZT6 +BTZ replicate 4, where a smear does not allow reliable quantification of phospho-eIF2a. This still leaves at least 5 biological replicates in each group. In fact, the difference between BTZ and control at ZT6 is now even more significant (adjusted p=0.0007).

      *S4e? True regression lines?*

      The same method was used as in Figure 1. The fit lines are produced by statistical comparison of fits, i.e. our hypothesis (damped cosine fit) vs null hypothesis (no change over time, linear fit), using sum-of-squares F test. The statistically preferred fit is plotted and p-value displayed on the graph. These details are reported in the figure legends and methods section.

      *While I thought these experiments were effective, they did not tie back well to the rest of the paper. What are the consequences of a temporally sensitive ISR? Which pathways does it affect in circadian time? Here, the main holes in this study are somewhat exposed; namely, a lack of mechanistic depth in explaining the very fascinating, albeit mostly descriptive, findings. The implicit assumption made here is that aggregation is 'bad', but could the opposite be just as true? Taking these considerations into account would further strengthen the discussion.*

      The purpose of (former) Fig 5 was entirely to test the functional consequences and potential translational relevance of a daily rhythm in protein turnover. The mechanisms upstream and downstream of the ISR, and their links with many diseases, are already quite well understood, but we apologise that we did not draw more heavily on the prior literature to provide sufficient context for this experiment. Protein aggregation has long been associated with proteotoxic stress; we do not assume it is good or bad, we simply use it as an additional validation of a temporally sensitive ISR. To correct this omission, we have added the following to the results section before these experiments are introduced:

      "Disruption of proteostasis and sensitivity to proteotoxic stress are strongly linked with a wide range of diseases (Wolff et al, 2014; Harper & Bennett, 2016; Labbadia & Morimoto, 2015; Hipp et al, 2019). Evidently, global protein translation, degradation and complex assembly are crucial processes for cellular proteostasis in general, so cyclic variation in these processes would be expected to have (patho)physiological consequences....

      ...Informed by our observations, we predicted that circadian rhythms of global protein turnover would have functional consequences for maintenance of proteostasis. Specifically, we expected that cells would be differentially sensitive to perturbation of proteostasis induced by proteasomal inhibition using small molecules such as MG132 and BTZ, depending on time-of-day."

      Reviewer #2 (Significance (Required)):

      *This is a fascinating paper that addresses key questions in the circadian organization of the proteome. The paper's main findings are that rhythms of protein synthesis and degradation are temporally coordinated to maintain overall stability of the proteome in mouse fibroblasts. Furthermore, the authors present evidence that this temporal organization may be important for assembly of macromolecular complexes. While very interesting, the main limitations are a lack of biochemical and mechanistic explanation and evidence that verifies these, mostly descriptive, findings.*

      The fundamental biochemical mechanisms of protein synthesis, degradation, protein quality control and stress response have been studied for decades and are increasingly well understood, at least in cultured cancer cells. What is not understood is the extent to which all of these essential cellular systems are subject to physiological variation over the circadian cycle in quiescent cells. This is the fundamental knowledge gap our study attempts to fill by testing the discrete hypotheses that (1) circadian regulation of macromolecular complex turnover is more prevalent than regulation of abundance and that (2) proteome renewal is more prevalent than compositional variation. We suggest that establishing these essential principles of circadian cellular physiology is a prerequisite for performing the type of perturbational experiments we presume the reviewer would prefer. We would like to reassure the reviewer that such studies have been and are being performed, but we are concerned that the inclusion of a very extensive additional body of work within this manuscript would detract from the clear communication of our major finding that complex turnover and proteome renewal have a cell-autonomous basis.

      *There are some relatively minor statistical and data quality issues that are probably addressable relatively quickly.*

      *Upon revision, the study would be a welcome addition for investigators interested in proteostasis, circadian biology, cell biology and proteomics.*

      *I am a physician-scientist with expertise in circadian rhythms, cell biology, protein synthesis, and biochemistry.*

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *Seinkmane et al investigate circadian regulation of protein synthesis and degradation in cultured cells and in mice. Their main new finding is that protein synthesis and degradation are in many cases rhythmic but coordinated such that the proteome is rhythmically renewed without an apparent rhythm in total protein abundance. In particular, the pool of large protein complexes is rhythmically renewed in this fashion.*

      *Using pulsed SILAC in combination with mass spectrometry, the authors are able to distinguish between total and newly synthesized protein levels in mouse lung fibroblasts. Analysis of these data shows that the synthesis of a large number of proteins is rhythmic although the total amount is constant, or that proteins are synthesized at a constant rate but the total amount is rhythmic, suggesting that degradation is rhythmic. By analyzing macromolecular complexes, defined as a high-speed pellet, they also present evidence that the rhythmic components of large complexes oscillate in the same phase and have a similar protein turnover rate. The authors conclude that complexes assemble rhythmically. The authors also present evidence that the activity of the proteasome oscillates in a circadian manner. Based on this observation, they show (in fibroblasts and in mice) that the response to proteotoxic stress (monitored by eIF2alpha phosphorylation levels, protein aggregation, and apoptosis) is higher at circadian times of high proteasome activity.*

      *I am an expert in the circadian field, and the hypothesis and concept behind the work presented here are potentially very interesting, and the experimental design is in principle suitable to answer these questions. However, after reading the paper several times, I cannot find the set of experiments that would convincingly support the authors' conclusions.*

      *Major questions/points:*

      *The major limitation of the manuscript is that the conclusions rely heavily on statistical analysis and massive processing of data from a bewilderingly large number of very different experiments. In looking at the figures, I have often wondered if the presence or absence of a rhythm is real or a product of the heavily processed data. The fact that a cosine wave fits through data points better than a straight line does not necessarily mean that a circadian rhythm is present.*

      We agree that comparison of fits alone does not provide sufficiently reliable evidence. However, the fact that several independent methods (cosinor, RAIN, ANOVA) yield similar overall results lends more confidence to our findings. We would also argue that the large number of different experiments is a positive aspect of the paper and lends weight to the general conclusions. We instead ask the reviewer to consider an alternative question: we and many other labs have found no evidence for any change in total cellular protein content, and yet there is extensive evidence from independent labs for a 'translational rush hour', whilst (excepting some low-abundance transcription factors) very few cellular proteins change by more than 10% over the circadian cycle (see Stangherlin et al, COISB, 2022 for extended discussion of this). We hypothesised a parsimonious explanation for this clear contradiction, and designed experiments whose data were analysed by widely used methods that yielded results consistent with our predictions. Perhaps the reviewer will at least concede that, if the presented findings do not refute the hypothesis, it should not be rejected until a superior one is proposed?

      *I think that, in particular, the SILAC experiment(s) should be repeated and also performed with an arrhythmic control (such as CRY1/2 KO).*

      Whilst we agree that CRY1/2 KO cells show no circadian regulation of transcription and much more variable rhythms in PER2::LUC activity than wild type controls (Putker et al., EMBO J, 2021), in our hands circadian rhythms in proteome composition and protein phosphorylation in CRY1/2 KO cells are at least as prevalent as in wild type cells (see Wong et al., EMBO J, 2022). Indeed, when we performed a proteasome activity assay in CRY1/2 KO fibroblasts, we observed apparent circadian variation, similar to WT but with a different phase. These data are now presented in revised Figure S1. Similarly, Lipton et al (Cell, 2015) showed circadian translational rhythms in cultured Bmal1 KO cells (see their final figure); it is therefore not clear what would constitute an appropriate 'arrhythmic' control.

      In this study, for proteomics experiments, we used a combination of SILAC and TMT, as each technique alone would not be sufficient to answer our specific questions. These two techniques are very resource-intensive on their own, and even more so in combination. We therefore had to prioritise, and for the second SILAC-TMT experiment we decided to focus on cellular fractionation and questions pertaining to macromolecular complexes, which were directly relevant to our hypothesis. While it would undoubtedly also be interesting to study how canonical clock genes, such as Cry1/2, affect turnover on a proteome-wide scale, the focus of our study is physiological regulation of proteome composition, rather than the function of Cryptochrome genes, which we have already explored in previous work (Putker et al., EMBO J, 2021; Wong et al., EMBO J, 2022).

      *Comparability between the whole cell and MMC SILAC experiments is also limited due to the different experimental conditions (6h vs. 1.5h pulse, +booster).*

      We do not make any direct comparisons, other than to report that broadly comparable numbers of proteins were detected. Implicitly, this means there must be greater coverage of protein complexes in the second pSILAC experiment, which our data bear out. If we did not report the first experiment, the reader would not understand why we refined the method used in the second. In reporting the results of the 6h pulse, we make the limitations of this experiment very clear, i.e., bias towards highly abundant, high-turnover proteins, irrespective of cellular compartment. We should add that even in this experiment there was a clear trend towards rhythmic turnover of ribosomal proteins, but this did not quite reach significance (p = 0.07), so we did not want to make claims beyond the data.

      *The essential and new message of the paper is that (at least some) macromolecular complexes undergo circadian renewal (degradation and synthesis). Rather than just analysing an operationally defined pellet fraction by mass spectrometry, this could be shown in more detail and directly for one or two specific macromolecular complexes. Ribosomes, for example, seem particularly suitable, because there would also be the very simple approach of measuring the synthesis of ribosomal RNA by pulse labelling. To me, such an analysis would be perfectly sufficient as a proof of principle. I would then omit aspects such as rhythmic stress response, since many additional experiments are needed to demonstrate this convincingly.*

      Thank you for the excellent suggestion; we agree that validation is important. The ribosome is by far the most abundant macromolecular complex in the cell and was one of the major complexes to show clear evidence for circadian regulation of turnover, but not abundance, by our pSILAC proteomics. To validate this result, we took advantage of two important observations: (1) all fully assembled ribosomes incorporate ribosomal RNA (rRNA), which can readily be separated from other cellular RNA by density gradient centrifugation; (2) pulse-labelling with heavy uridine-15N2 allows nascent RNA to be distinguished from pre-existing RNA. Thus, by combining stable isotope labelling with ribosome purification, we can distinguish nascently assembled ribosomes from total ribosomes when the RNA is extracted, digested with RNase, and the ratio of light to heavy UMP quantified by mass spectrometry. These data are presented in new Figure 5 and are consistent with findings in Figures 3/4 that circadian regulation of ribosome turnover is more prevalent than regulation of ribosome abundance, and that the phase of highest ribosome turnover coincides with the phases of high translation and turnover overall. We hope that, by addressing the reviewer's question with an entirely orthogonal method, we have given the reviewer greater confidence in our conclusions.

      The final figure is included because it tests predictions that were informed by the preceding experiments. It is not intended to be a comprehensive exploration of how the integrated stress response changes across the circadian cycle, nor have we claimed this.

      * Specific points: The reader is strongly influenced by the cosine wave or straight lines in the graphs (e.g. 1c, e, 3h, 5b, etc) produced by the analysis of rhythmicity, which basically only gives a yes or no answer. But it is not really that simple. If the algorithm detects a rhythm what is its period? Is it the same as the period of the luciferase reporter? If the period lengths correlate, do the phases as well (e.g. see differences in phases 1c and e)? These questions are not addressed. *

      The temporal resolution of the time-course data is much lower than that of the luciferase reporter, so the error of the fit is greater (usually 1-2h). For the cosine wave curve fit and the associated extra sum-of-squares F test, the period of the oscillation was fixed at either 24h or 25h, as determined from a parallel PER2::LUC control recording. This is now explicitly stated in the Methods section.
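      A fixed-period fit with an extra sum-of-squares F test can be sketched as follows. The response does not spell out the exact model pair, so this sketch assumes, for illustration only, that the null model is a straight line (intercept plus linear trend) and that the alternative adds a cosine whose period is fixed from the parallel PER2::LUC recording. With the period fixed, the cosine term is linear in its coefficients, so both models are ordinary least-squares fits and are nested. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def _ols_rss(X, y):
    """Ordinary least squares fit; returns coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


def cosine_vs_line_f_test(t, y, period=24.0):
    """Extra sum-of-squares F test with the period held fixed (assumed model pair).

    Null model:        y = b0 + b1*t                              (2 parameters)
    Alternative model: y = b0 + b1*t + a*cos(w*t) + b*sin(w*t)   (4 parameters)
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2 * np.pi / period
    X_null = np.column_stack([np.ones_like(t), t])
    X_alt = np.column_stack([X_null, np.cos(w * t), np.sin(w * t)])
    _, rss_null = _ols_rss(X_null, y)
    coef, rss_alt = _ols_rss(X_alt, y)
    df_null = t.size - X_null.shape[1]
    df_alt = t.size - X_alt.shape[1]
    f_stat = ((rss_null - rss_alt) / (df_null - df_alt)) / (rss_alt / df_alt)
    amplitude = float(np.hypot(coef[2], coef[3]))
    # a*cos(wt) + b*sin(wt) = A*cos(wt - phi) with phi = atan2(b, a), so the peak is at t = phi / w
    peak_time = float((np.arctan2(coef[3], coef[2]) / w) % period)
    return float(f_stat), (df_null - df_alt, df_alt), amplitude, peak_time


# Synthetic example: 18 samples every 4 h, true period 24 h, peak at 6 h, slight upward drift.
rng = np.random.default_rng(0)
t = np.arange(0, 72, 4.0)
y = 10 + 0.01 * t + 2 * np.cos(2 * np.pi * (t - 6) / 24) + rng.normal(0, 0.2, t.size)
f_stat, dfs, amplitude, peak_time = cosine_vs_line_f_test(t, y, period=24.0)
print(f"F{dfs} = {f_stat:.1f}, amplitude = {amplitude:.2f}, peak at {peak_time:.1f} h")
```

      A p-value could then be read from the F distribution, e.g. with `scipy.stats.f.sf(f_stat, 2, df_alt)`. Leaving the period free instead would make the alternative model non-linear, which is one reason fixing it to the PER2::LUC period simplifies the comparison.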

      In terms of phase, the general trend across all experiments is that bulk protein turnover, synthesis and degradation are higher during the 6-8h following the peak of PER2::LUC than at any other point in the circadian cycle. This is also consistent with our previous findings in mouse and human cells (Feeney et al, Nature, 2016; Stangherlin et al., Nat Comms, 2021) as well as findings from many different labs in vivo (e.g. Janich et al., Genome Res, 2016; Atger et al., 2015, PNAS; Sinturel et al., 2017, Cell). We are cautious about being any more specific than this because each assay measures something different and (as can be seen across the figures) there is also some modest variation in the phase of PER2::LUC between experiments with respect to the prior entraining temperature cycle (this will be reported in our forthcoming publication, Rzechorzek et al, in prep). To address the reviewer's point, we have therefore added the following to the Discussion:

      "Across all experiments in this study, we find that protein synthesis, degradation and turnover are highest during the 6-8h that follow maximal production of the clock protein PER2. This is coincident with increased glycolytic flux and respiration (Putker et al, 2018), increased macromolecular crowding in the cytoplasm, decreased intracellular K+ concentration and increased mTORC activity (Feeney et al, 2016a; Stangherlin et al, 2021b; Wong et al, 2022)."

      *The algorithm in Fig 1c predicts a rhythm for the chymotrypsin-like and the trypsin-like but not for the caspase-like activity. The peptide assay measures core proteasome activity independent of ubiquitylation and should therefore be dependent on proteasome concentration in the sample. How can then only two of the three proteasomal activities be rhythmic? Please elaborate and repeat with arrhythmic cells (e.g. CRY1/2 KO). The period length does not seem to correlate with that of the reporter. Why is that?*

      The suggestion of arrhythmic controls is partially addressed in the response above. We did perform a proteasome activity assay in CRY1/2 KO fibroblasts and observed daily variation similar to WT, albeit with a different apparent phase. These data are now shown in Figure S1 and referred to in the main text as follows:

      "Moreover, we detected a significant interaction between genotype and biological time when comparing trypsin-like proteasome activity between wild type and Cryptochrome1/2-deficient cells, that lack canonical circadian transcriptional feedback repression (Fig S1B-E)".

      Besides this study, our two previous proteomic investigations of the fibroblast circadian proteome detected no biologically significant or consistent rhythm in proteasome subunit abundance (Wong et al., EMBO J, 2021; Hoyle et al., Science Translational Medicine, 2017). Moreover, proteasomes are long-lived, stable complexes whose activity is determined by a combination of substrate-level, allosteric and post-translational regulatory mechanisms, including their reversible sequestration into storage granules (Albert et al., PNAS, 2020; Fu et al., PNAS, 2021; Yasuda et al., Nature, 2020). It is therefore very likely that the observed rhythm in trypsin- and chymotrypsin-like activity arises post-translationally. Proteasome subunit composition is also known to change, which might be another reason for the differences between the protease activities (Marshall and Vierstra, Front Mol Biosci, 2019; Zheng et al., J Neurochem, 2012).

      To communicate this succinctly, we have revised the relevant text as follows:

      Page 7: "Moreover, we detected a significant interaction between genotype and biological time when comparing trypsin-like proteasome activity between wild type and Cryptochrome1/2-deficient cells, that lack canonical circadian transcriptional feedback repression (Fig S1B, (Wong et al, 2022)). Previous proteomics studies under similar conditions have revealed minimal circadian variation in proteasome subunit abundance (Wong et al, 2022), suggesting that proteasome activity rhythmicity, and therefore rhythms in UPS-mediated protein degradation, are regulated post-translationally (Marshall & Vierstra, 2019; Hansen et al, 2021)."

      Regarding period length, we apologise for an oversight in Fig 1c: unlike all other experiments presented here, these fits were originally performed with a flexible period length (between 20h and 36h). They have now been re-fitted in the same manner as the other experiments (fixed period of 24h, as for the parallel PER2::LUC controls), and the updated data are presented. This has not influenced the outcome of the statistical tests: the p-values changed only slightly, and the significance levels are unchanged.

      Fig. 1a,b suggest that there is a rhythm in global protein synthesis with a significant peak at 40h. Yet, Fig. 1e suggests otherwise. How can that be? Also, the degradation graph (lower panel 1c) has to be plotted with the ratios calculated from the data points and not the heavily processed fitted graphs. This can be very misleading.

      Fig 1a,b was performed under quite different conditions from Fig 1e. As described in the Methods section, 35S-labelling experiments require a medium change during both pulse and chase (to replace normal Met with radioactive Met, and vice versa). To avoid growth factor/mTORC1-mediated stimulation of protein synthesis and turnover, these acute medium changes must occur in the absence of serum; otherwise they would introduce artifacts. In contrast, puromycin labelling (Fig 1e) requires no medium change (puromycin can be added directly to the cell culture medium) and was therefore performed under normal culture conditions with 10% serum. Given the well-established effect of growth factor/mTORC1 signalling on bulk translation rate, the differences in the phase of translational rhythms between Fig 1a,b and Fig 1e are very likely attributable to the differing serum concentrations; this serum-dependency of phase is also described in Beale et al, 2023, bioRxiv https://doi.org/10.1101/2023.06.22.546020. The important point is that neither of these proof-of-principle experiments supports the null hypothesis that translation rate and turnover remain constant over the circadian cycle. Thus, the hypothesis tested in Figure 1 is not rejected, and this provides the rationale for the subsequent proteome-wide analyses.

      With respect to Fig 1E, given the measurement variance, the curve fits to Puro and Puro+BTZ already serve to test whether there is any significant ~24h component; a ratio of the respective data points would simply compound the measurement error. The degradation plot is provided purely for illustrative purposes, to help the reader see what would be expected if these fits were true. We have revised the figure to communicate more clearly that the degradation plot is presented purely as a visual aid, labelled “inferred”, and now show ratio plots in revised Figure 1.
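      The point about compounding error follows from standard first-order error propagation: for a ratio R = a/b with independent errors, (σR/R)² ≈ (σa/a)² + (σb/b)², so the ratio is noisier, in relative terms, than either input. A minimal sketch, assuming independent, approximately Gaussian errors; the function name and numbers are hypothetical:

```python
import math


def ratio_with_error(a, sigma_a, b, sigma_b):
    """Ratio R = a / b with first-order error propagation.

    Assumes independent errors on a and b, so relative errors add in quadrature.
    """
    r = a / b
    rel_err = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return r, abs(r) * rel_err


# Two inputs, each with 10% relative error, give a ratio with about 14% relative error.
r, sigma_r = ratio_with_error(10.0, 1.0, 5.0, 0.5)
print(r, sigma_r / r)
```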

      *It also strikes me as odd that the amplitude of degradation increases (peak at 28h lower than at 30h) while the amplitude of the core clock oscillation dampens over time (peak at 54h higher than at 53h due to desynchronisation). Only two data values around 54h are responsible for the detected rhythm (2nd peak). Furthermore, phase and period do not agree with the rhythm of proteolytic activities shown in 1c. How can this be explained?*

      Due to the nature of the experiment, the degradation rate inferred from Figure 1B & 1E does not reflect proteasome activity exclusively. Rather it reflects the combined sum of processes that remove nascently produced proteins from the cell's digitonin-soluble fraction, which includes proteasomal degradation, but also autophagy, protein secretion and sequestration into other compartments. Therefore, the peak degradation in Fig 1B & E would not necessarily be expected to coincide with the peak of proteasome activity in Fig 1C. Again, these experiments in Figure 1 simply serve to test the hypothesis (change over circadian cycle) vs the null hypothesis (no change over the circadian cycle).

      Regarding the increase in amplitude, we speculate that this is due to metabolic changes in the cultures over the course of three days: as serum and nutrients from the last medium change at T0 are depleted, cells need to increase degradation to promote turnover and recycling. As we suggest that rhythms in turnover contribute to cellular bioenergetic efficiency, it is quite plausible that amplitude increases as nutrient concentrations fall. We are in the process of investigating further how exactly these rhythms vary with nutrient and serum status.

      * Regarding the MS data shown in Figure 2, is it possible to show a positive / quality control? Best would be MS data of Luciferase (or PER2,3, RevErb/alpha, DBP) to show oscillation of protein levels with the same phase and period as the reporter. *

      Unfortunately, none of these low-abundance transcription factors were detected in our MS runs. This is not surprising, given that their copy numbers are estimated at

      *In Fig. 2c examples of the 4 groups of proteins presented in 2e should be shown (both synthesis and total abundance arrhythmic, either one rhythmic or both rhythmic) and not just what appears to be random examples of rhythmic and arrhythmic proteins.*

      As also requested by another reviewer, we have revised the figure to include examples of each of the rhythmicity categories. No specific meaning is inferred from the chosen protein identities.

      Is it possible at all to distinguish between synthesis/turnover and assembly/disassembly of macromolecular complexes in the MMC SILAC experiment? If so, how?

      We followed the established protocol originally developed in our collaborator Kathryn Lilley's lab, where it has previously been shown that most proteins in the MMC fraction are in macromolecular assemblies (Geladaki et al, Nat Commun, 2019). Proteins that are rhythmically abundant in this fraction, but without an accompanying synthesis rhythm (e.g. Beta-actin, see Hoyle et al., Sci Trans Medicine, 2017) can be reliably assumed to arise solely from rhythmic assembly/disassembly i.e. they are captured in this fraction when assembled, but lost, and therefore not detected, in this fraction when disassembled. However, in the case of rhythmic synthesis and abundance, it is not possible with this technique to directly infer that rhythmic synthesis of a given protein is responsible for its rhythmic assembly in a complex, though they do correlate.
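      The inference in the preceding paragraph amounts to a simple decision rule over the four rhythmicity categories (synthesis and abundance each rhythmic or not). A minimal sketch, with a hypothetical function name and illustrative category labels; it encodes only the reasoning stated above and performs no rhythmicity detection itself:

```python
def interpret_mmc_rhythmicity(synthesis_rhythmic: bool, abundance_rhythmic: bool) -> str:
    """Map rhythmicity calls for one protein in the MMC fraction to the inference they support.

    Category labels are illustrative, not the manuscript's terminology.
    """
    if abundance_rhythmic and not synthesis_rhythmic:
        # Rhythmic presence in the fraction without rhythmic synthesis: assembly/disassembly.
        return "rhythmic assembly/disassembly"
    if abundance_rhythmic and synthesis_rhythmic:
        # Correlated, but this technique cannot show that synthesis drives assembly.
        return "rhythmic synthesis and abundance (causality unresolved)"
    if synthesis_rhythmic:
        # Rhythmic production at constant abundance is consistent with rhythmic turnover.
        return "rhythmic turnover, constant abundance"
    return "arrhythmic"


for synth, abund in [(False, False), (True, False), (False, True), (True, True)]:
    print(synth, abund, "->", interpret_mmc_rhythmicity(synth, abund))
```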

      Therefore, our new Figure 5 (with thanks again for this suggestion) addresses this with an orthogonal method, relying on the observations that a) ribosomes incorporate ribosomal RNA (rRNA), b) rRNA can be readily separated from most other cellular RNA by density gradient centrifugation, and c) pulse-labelling with heavy uridine-15N2 allows nascent RNA to be distinguished from pre-existing RNA. Using this technique, we validate a rhythm in the production and assembly of mature ribosomes, with its peak consistent with the time of highest turnover as measured in Figs 1 and 3 and by MMC fraction proteomics (Supplemental table 3), during the descending phase of PER2::LUC.

      *Looking at Fig. 4b,c, what is the fraction of rhythmic proteins from the MMC experiment that also oscillate in either synthesis, total abundance or both in the whole cell? Is there a general correlation at all? Please show.*

      There were no correlations greater than would be expected by chance (the sets of proteins rhythmic in either synthesis or degradation did not overlap significantly between whole-cell and MMC fractions, as determined by an odds ratio test).
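      The response does not specify which odds ratio test was used. One common choice is a one-sided Fisher's exact test on the 2×2 table of protein counts (rhythmic in the whole-cell fraction or not, crossed with rhythmic in the MMC fraction or not), taken over the universe of proteins quantified in both datasets. A standard-library sketch under that assumption, with hypothetical names and toy sets:

```python
from math import comb


def overlap_enrichment(set_a, set_b, universe):
    """Odds ratio and one-sided Fisher's exact p-value for the overlap of two sets.

    The p-value is the hypergeometric probability of an overlap at least as large
    as observed, if set_a were a random draw of the same size from the universe.
    """
    u = set(universe)
    a = set(set_a) & u
    b = set(set_b) & u
    n11 = len(a & b)        # in both sets
    n12 = len(a - b)        # only in A
    n21 = len(b - a)        # only in B
    n22 = len(u - a - b)    # in neither
    denom = n12 * n21
    odds_ratio = float("inf") if denom == 0 else (n11 * n22) / denom
    N, K, n = len(u), len(b), len(a)
    tail = sum(comb(K, k) * comb(N - K, n - k) for k in range(n11, min(K, n) + 1))
    return odds_ratio, tail / comb(N, n)


# Toy example: 100 proteins, 10 rhythmic in one fraction, 20 in the other, 5 shared.
print(overlap_enrichment(range(10), range(5, 25), range(100)))
```

      An odds ratio near 1 with a large p-value is what "no more overlap than expected by chance" looks like in this framework.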

      To communicate this we have added the following text:

      "It is also worth noting that although there were small sets of proteins that were rhythmic in both whole-cell (Figure 2) and MMC fractions (Figure 3), in both synthesis and total abundance, none of these four overlaps were higher than would have been expected by chance."

      *Why is the phase of the oscillating proteins different in the two experiments (compare Figs. 2f,g and 4a) and does either of them match with the phase of the PER2::LUC reporter, which should be the peak synthesis phase of the clock?*

      This was a labelling error on our part, our apologies and thanks for drawing it to our attention. We had attempted to harmonise all these phase values so that they were mutually comparable between the two mass spec experiments, but omitted to update all the figures. They have now all been updated to be inter-consistent. From our experiments, the peak of PER2::LUC consistently precedes the timing of maximum bulk translation. This phase difference is, at least in part, attributable to the inactivation kinetics of firefly luciferase (see Feeney et al., J Biol Rhythms, 2016), i.e., under conditions of saturating luciferin substrate, PER2 protein abundance peaks several hours later than PER2::LUC activity when measured in longitudinal live cell assays.
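      One way to make phase values from different experiments mutually comparable is to express every peak time relative to the PER2::LUC peak recorded in parallel, wrapped onto a single cycle. A minimal sketch, assuming both peak times share the same time axis and a fixed 24h period; the function name is hypothetical and this is not necessarily the exact harmonisation applied in the manuscript:

```python
def phase_relative_to_reference(peak_time_h: float, reference_peak_h: float, period_h: float = 24.0) -> float:
    """Hours from the reference peak (e.g. PER2::LUC) to a peak, wrapped into [0, period_h)."""
    return (peak_time_h - reference_peak_h) % period_h


# Hypothetical example: a protein peaking at 30 h in a recording whose PER2::LUC peak was at 26 h.
print(phase_relative_to_reference(30.0, 26.0))
```

      This aligns phases to PER2::LUC activity rather than PER2 protein; as noted above, PER2 protein abundance peaks several hours later under saturating luciferin.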

      *Regarding the sensitivity to MG132 in Fig. 5b, it doesn't make sense that, while eIF2alpha phosphorylation is arrhythmic in untreated cells and the levels of eIF2alpha phosphorylation are (apparently) not exhibiting a rhythmic change upon administration of MG132 at different circadian timepoints, the ratio of P-eIF2alpha with and without MG132 suddenly is rhythmic. Please show in Fig. S4b quantifications of the individual experiments with and without MG132. What is presented in 5b is, after all, the ratio of ratios of quantifications of Western blots, each of which individually does not display any appreciable rhythm. For me this is too much data processing. In my opinion, the acute 4h MG132 treatment must show a detectable rhythm.*

      We apologise for being unclear in this panel and its description. Our hypothesis concerned the fold-induction of the p-eIF2alpha:eIF2alpha ratio changing as a function of MG132 treatment and time. Our reasoning was that this ratio may be more biologically relevant, as it is the relative change that cells sense and respond to, rather than the absolute abundance of p-eIF2alpha. We applied a quantitative, two-channel fluorescent antibody technique to detect and quantify p-eIF2alpha and eIF2alpha in each replicate at each time point from the same band of the same blot. We agree that no p-eIF2alpha rhythm is evident from a cursory inspection of any of the blots. This is due to the innate variance between dishes in extracted protein concentration, as well as in the levels of basal eIF2alpha and its phosphorylation, and is the reason we took great pains to be as quantitative as possible using two-channel immunodetection (LICOR). Because of the natural, stochastic variation in eIF2alpha levels and extraction between replicates and over time, it is difficult to achieve the identical eIF2alpha loading that would reveal the underlying rhythm in p-eIF2alpha; furthermore, identical loading would give a misleading impression of the level of temporal variation in eIF2alpha. Quantification reveals temporal variation in the MG132-treated samples but not in the untreated controls (Supp Fig 5A), suggesting that there may be circadian regulation of the cellular response to MG132 challenge, rather than a cell-autonomous p-eIF2alpha rhythm under basal conditions. We quantified the fold-induction by MG132 relative to untreated controls, as presented in Figure 6A. All raw data are presented in Supplementary Figure 5 so that readers can validate them through their own analysis.
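      The ratio-of-ratios calculation can be sketched as follows. Because p-eIF2alpha and total eIF2alpha are read from two channels of the same band, their ratio cancels dish-to-dish differences in loading and extraction. The sketch assumes, hypothetically, that each MG132-treated ratio is normalised to the mean untreated ratio at the same time point (a paired design could instead divide dish by dish); the function names and intensity values are illustrative only.

```python
from statistics import mean


def phospho_ratio(p_signal: float, total_signal: float) -> float:
    """p-eIF2alpha : eIF2alpha ratio from the two fluorescence channels of one band."""
    return p_signal / total_signal


def fold_induction_at_timepoint(treated, untreated):
    """Mean MG132 fold-induction of the phospho ratio at one time point.

    treated, untreated: lists of (p_signal, total_signal) tuples, one per dish.
    Each treated ratio is divided by the mean untreated ratio (assumed normalisation).
    """
    baseline = mean(phospho_ratio(p, tot) for p, tot in untreated)
    return mean(phospho_ratio(p, tot) / baseline for p, tot in treated)


# Loading differences cancel: doubling both channels of a band leaves its ratio unchanged.
print(fold_induction_at_timepoint([(200.0, 1000.0)], [(100.0, 1000.0)]))
print(fold_induction_at_timepoint([(400.0, 2000.0)], [(50.0, 500.0)]))
```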

      *Minor:

      In Fig. 1f please show dot blot with error bars as well as the individual experiments in the supplementals. Please check the graph legend (N>=3?) *

      Thank you for pointing out these omissions. The dot blot with error bars is now shown in Fig. 1F, and the full gels are now included as Fig. S2B. The following has also been added to the main figure legend for 1f (explaining the N numbers):

      "Four mice were used per condition, but in some cases one of the four injections was not successful, i.e. no puromycin labelling was observed, so no quantification could be performed (full data in Fig. S2B)."

      * Please explain the mechanism of the "booster" used in the second SILAC experiment. *

      The following has been revised in the text:

      "Namely, we added a so-called booster channel: an additional, fully heavy-labelled cell sample within a TMT mixture (Klann et al, 2020). When the mixture is analysed by MS, heavy peptides from the booster channel increase the overall signal of all identical heavy peptides at the MS1 level; at MS2 and MS3 this results in improved detection of heavy proteins in the other TMT channels of interest, and is particularly advantageous for proteins with lower turnover that would fall below the MS1 detection limit without the booster."

      *p10 3rd paragraph: S2e not S3e*

      Thank you, this has been fixed.

      p12 last paragraph please add reference to Figs. 5f,g

      Thank you, this has been added.

      *Reviewer #3 (Significance (Required)): *

      xxxxx

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      I would like to express my appreciation for the authors' dedication to revising the manuscript. It is evident that they have thoughtfully addressed numerous concerns I previously raised, significantly contributing to the overall improvement of the manuscript.

      Response: We appreciate the reviewer’s recognition of our efforts in revising the manuscript.

      My primary concern regarding the authors' framing of their findings within the realm of habitual and goal-directed action control persists. I will try to explain my point of view and perhaps clarify my concerns. While acknowledging the historical tendency to equate procedural learning with habits, I believe a consensus has gradually emerged among scientists, recognizing a meaningful distinction between habits and skills or procedural learning. I think this distinction is crucial for a comprehensive understanding of human action control. While these constructs share similarities, they should not be used interchangeably. Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses).

      Response: We would like to clarify that, contrary to the reviewer’s assertion of a scientific consensus on this matter, the discussion surrounding the similarities and differences between habits and skills remains an ongoing and unresolved topic of interest among scientists (Balleine and Dezfouli, 2019; Du and Haith, 2023; Graybiel and Grafton, 2015; Haith and Krakauer, 2018; Hardwick et al., 2019; Kruglanski and Szumowska, 2020; Robbins and Costa, 2017). We absolutely agree with the reviewer that “Procedural learning and motor skills can manifest either through intentional and planned actions (i.e., goal-directed) or autonomously and involuntarily (habitual responses)”. But so can habits. Some researchers also highlight the intentional/goal-directed nature of habits (e.g., Du and Haith, 2023, “Habits are not automatic” (preprint); Kruglanski and Szumowska, 2020, “Habitual behavior is goal-driven”: “definitions of habits that include goal independence as a foundational attribute of habits are begging the question; they effectively define away, and hence dispose of, the issue of whether habits are goal-driven” (p. 1258)). Therefore, there is no clear consensus concerning the concept of habit.

      While we acknowledge the meaningful distinctions between habits and skills, we also recognize a substantial body of literature supporting the overlap between these concepts (cited in our manuscript), particularly at the neural level. The literature clearly indicates that both habits and skills are mediated by subcortical circuits, with a progressive disengagement of cognitive control hubs in frontal and cingulate cortices as repetition evolves. We do not use these concepts interchangeably. Instead, we simply present evidence supporting the assertion that our trained app sequences meet several criteria for their habitual nature.

      Our choice of Balleine and Dezfouli's (2018) criteria stemmed from the comprehensive nature of their definitions, which effectively synthesized insights from various researchers (Mazar and Wood, 2018; Verplanken et al., 1998; Wood, 2017, etc). Importantly, their list highlights the positive features of habits that were previously overlooked. However, these authors still included a controversial criterion ("habits as insensitive to changes in their relationship to their individual consequences and the value of those consequences"), even though they acknowledged the problems of using outcome devaluation methods and of relying on a null effect. According to Kruglanski and Szumowska (2020), this criterion is highly problematic because “If, by definition, habits are goal-independent, then any behavior found to be goal-dependent could not be a habit on sheer logical grounds” (p. 1257). In their definition, “habitual behavior is sensitive to the value of the reward (i.e., the goal) it is expected to mediate and is sensitive to the expectancy of goal attainment (i.e., obtainment of the reward via the behavior)” (p. 1265). In fact, some recent analyses of habitual behavior do not use devaluation or revaluation as a criterion (Du and Haith, 2023). This article, for example, ascertains habits using different criteria and provides supporting evidence for trained action sequences being understood as skills, with both goal-directed and habitual components.

      In the discussion of our manuscript, we explicitly acknowledge that the app sequences can be considered habitual or goal-directed in nature and that this terminology does not alter the fact that our overtrained sequences exhibit clear habitual features.

      Watson et al. (2022) aptly detailed my concerns in the following statements: "Defining habits as fluid and quickly deployed movement sequences overlaps with definitions of skills and procedural learning, which are seen by associative learning theorists as different behaviors and fields of research, distinct from habits."

      "...the risk of calling any fluid behavioral repertoire 'habit' is that clarity on what exactly is under investigation and what associative structure underpins the behavior may be lost." I strongly encourage the authors, at the very least, to consider Watson et al.'s (2022) suggestion: "Clearer terminology as to the type of habit under investigation may be required by researchers to ensure that others can assess at a glance what exactly is under investigation (e.g., devaluation-insensitive habits vs. procedural habits)", and to refine their terminology accordingly (to make this distinction clear). I believe adopting clearer terminology in these respects would enhance the positioning of this work within the relevant knowledge landscape and facilitate future investigations in the field.

      Response: We would like to highlight that we have indeed followed Watson et al.’s (2022) recommendation to focus on other features/criteria of habits rather than on the outcome devaluation/contingency degradation paradigm, which has been more controversial in the human literature. Our manuscript clearly aligns with Watson et al.’s (2022) recommendations: “there are many other features of habits that are not captured by the key metrics from outcome devaluation/contingency degradation paradigms such as the speed at which actions are performed and the refined and invariant characteristics of movement sequences (Balleine and Dezfouli, 2019). Attempts are being made to develop novel behavioral tasks that tap into these positive features of habits, and this should be encouraged as should be tasks that are not designed to assess whether that behavior is sensitive to outcome devaluation, but capture the definition of habits through other measures”.

      Regarding the authors' use of Balleine and Dezfouli's (2018) criteria to frame the recorded behavior as habitual, as well as to acknowledge the study's limitations, it's important to highlight that while the authors labelled the fourth criterion (which they were not fulfilling) as "resistance to devaluation," Balleine and Dezfouli (2018) define it as "insensitive to changes in their relationship to their individual consequences and the value of those consequences." In my understanding, this definition is potentially aligned with the authors' re-evaluation test; namely, the test is conceptually adequate for evaluating the fourth criterion (which is the most accepted in the field and probably the one that differentiates habits from skills). Notably, during this test, participants exhibited goal-directed behavior.

      The authors characterized this test as possibly assessing arbitration between goal-directed and habitual behavior, stating that participants in both groups "demonstrated the ability to arbitrate between prior automatic actions and new goal-directed ones." From my perspective, there is no justification for calling it a test of arbitration. Notably, the authors inferred that participants were habitual before the test based on some criteria, but then transitioned to goal-directed behavior based on a different criterion. While I agree with the authors' comment that: "Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1)." they implicitly assert a shift from habit to goal-directed behavior without providing evidence that relies on the same probed mechanism. Therefore, I think it would be more cautious to refer to this test solely as an outcome revaluation test. Again, the results of this test, if anything, provide evidence that the fourth criterion was tested but not met, suggesting participants have not become habitual (or at least undermining this option).

      Response: In our previously revised manuscript, we duly acknowledged that the conventional (perhaps nowadays considered outdated) goal devaluation criterion was not met, primarily due to constraints in designing the second part of the study. We did cite evidence from another similar study that had used devaluation of app-trained action sequences to demonstrate their habitual qualities (but the reviewer ignored this).

      The reviewer points out that we did use a manipulation of goal revaluation in one of the follow-up tests (although this was not a conventional goal revaluation test, inasmuch as it was conducted in a novel context). In this test, please note that we used two manipulations: monetary and physical effort. Although we did show that subjects, including OCD patients, were apparently goal-directed in the monetary reward manipulation, this was not so clear when goal revaluation involved the physical effort expended. In this effort manipulation, participants were less goal-oriented, and OCD patients preferred to perform the longer, familiar sequence over the shorter, novel one, thus exhibiting significantly greater habitual tendencies than controls. Hence, we cannot decisively conclude that the action sequence is goal-directed, as the reviewer argues. In fact, the evidence is equivocal and may reflect both habitual and goal-directed qualities in the performance of this sequence, consistent with recent interpretations of skilled/habitual sequences (Du and Haith, 2023). Relying solely on this partially met criterion to conclude that the app-trained sequences are goal-directed, and therefore not habitual, would be an inaccurate assessment for several reasons: 1) the action sequences did satisfy all other criteria for being habitual; 2) this approach would rest on a problematic foundation for defining habits, as emphasized by Kruglanski & Szumowska (2020); and 3) it would succumb to the pitfall of a zero-sum perspective, as cautioned by various researchers, including in the review by Watson et al. (2022) cited by the referee, thus oversimplifying the nuanced nature of human behavior.

      While we previously complied with the reviewer’s suggestion to relabel our follow-up test as a “revaluation test” instead of an “arbitration test”, we have now explicitly removed all mentions of the term “arbitration” (which seems to raise concerns) throughout the manuscript. As the reviewer suggested, we now use more refined terminology, explicitly referring to the measured behavior as "procedural habits". We have also extensively revised the Discussion section of our manuscript to incorporate the reviewer’s viewpoint. We hope that these adjustments enhance the clarity and accuracy of our manuscript and address the concerns raised during this review process.

      In essence, this is an ontological and semantic matter that does not alter our findings in any way. Whether the sequences are considered habitual or goal-directed does not change our findings that 1) both groups displayed equivalent procedural learning and automaticity attainment; 2) OCD patients exhibit greater subjective habitual tendencies on self-reported questionnaires; 3) patients who had elevated compulsivity and habitual self-reported tendencies engaged significantly more with the motor habit-training app, practiced more and reported symptom relief at the end of the study; and 4) these particular patients also show an augmented inclination to attribute higher intrinsic value to familiar actions, a possible mechanism underlying compulsions.

      Reviewer #2 (Recommendations For The Authors):

      A few more small comments (with reference to the point numbers indicated in the rebuttal):

      (14) I am not entirely sure why the suggested analysis is deemed impractical (i.e., why it cannot be performed by "pretending" participants received the points they should have received according to their performance). This can further support (or undermine) the idea of effect of reward on performance rather than just performance on performance.

      Response: We have now conducted this analysis, generating scores for each practice trial after day 20, when participants no longer gained points for their performance. This analysis assesses whether participants' trial-wise behavioral changes exhibit a similar pattern following simulated relative increases or decreases in scores, as if they had still been receiving points at this stage. Note that fewer trials are available for this analysis (around 50% fewer on average).
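      The simulated-score analysis can be sketched as follows: each trial is scored, trials are split by whether the score rose (∆R+) or fell (∆R-) relative to the previous trial, and the change in movement time on the next trial is summarised per group. The actual scoring rule and movement-time normalisation are not given here, so `hypothetical_score` is an assumed rule (faster means more points), and the sample mean and SD stand in for the centroid and spread of the fitted Gaussians. All names and values are illustrative.

```python
from statistics import mean, stdev


def hypothetical_score(movement_time, scale=10.0):
    """HYPOTHETICAL scoring rule: smaller movement time gives a higher score."""
    return scale / movement_time


def reward_conditioned_changes(movement_times, score_fn=hypothetical_score):
    """Split next-trial movement-time changes by the sign of the (simulated) score change.

    Returns {"dR+": (mean, sd), "dR-": (mean, sd)} of MT[t+1] - MT[t], grouped by
    the sign of score[t] - score[t-1]. Trials with no score change are skipped.
    """
    scores = [score_fn(mt) for mt in movement_times]
    after_increase, after_decrease = [], []
    for t in range(1, len(movement_times) - 1):
        delta_r = scores[t] - scores[t - 1]
        change = movement_times[t + 1] - movement_times[t]
        if delta_r > 0:
            after_increase.append(change)
        elif delta_r < 0:
            after_decrease.append(change)

    def summary(changes):
        # Sample mean and SD approximate the centroid and spread of a fitted Gaussian.
        if len(changes) < 2:
            return float("nan"), float("nan")
        return mean(changes), stdev(changes)

    return {"dR+": summary(after_increase), "dR-": summary(after_decrease)}


# Hypothetical movement times (s) for consecutive practice trials after day 20.
print(reward_conditioned_changes([1.0, 0.9, 1.1, 0.8, 1.2, 1.0]))
```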

      Before presenting our results, we wish to emphasize the importance of distinguishing between the effects of performance on performance and the effects of reward on performance. In response to a reviewer's suggestion, we assessed the former in the first revision of our manuscript. We normalized the movement time variable and evaluated how normalized behavioral changes responded to score increments and decrements. The results from the original analyses were consistent with those from the normalized data.

      Regarding the phase in which participants no longer received scores, we believe this phase primarily helps us understand the impact of 'predicted' or 'learned' rewards on performance. Once participants have learned the simple association between faster performance and larger scores, they can be expected to continue exhibiting the reward sensitivity effects described in our main analysis. We do not consider it feasible to assess the effects of performance on performance during the reward removal phase, which begins after day 20. Therefore, the following results show how the learned associations between faster movement times and scores continue to influence behavior even when explicit scores are no longer displayed on the screen.

      Results: The main effects of reward on behavioral changes persist, supporting the conclusion that relative increases or decreases in scores (real or imagined/inferred) modulate trial-by-trial behavioral adaptations consistently across both cohorts. The direction of the reward effects is the same as in the main analyses presented in the manuscript: larger mean behavioral changes (smaller std) following ∆R−. First, concerning trial-by-trial changes in “normalized” movement time (MT), we conducted a 2 x 2 factorial analysis of the centroid of the Gaussian distributions with the same factors Reward, Group and Bin. This analysis demonstrated a significant main effect of Reward (P = 2e-16), but not of Group (P = 0.974) or Bin (P = 0.281). There were no significant interactions between factors. The main Reward effect can be observed in the top panel of the figure below. The same analysis applied to the spread (std) of the Gaussian distributions revealed a significant main effect of Reward (P = 0.000213), with no additional main effects or interactions.

      Author response image 1.

      Next, conducting the same 2 x 2 factorial analyses on the centroid and spread of the Gaussian distributions fitted to the Consistency data, we also obtained a robust significant main effect of Reward. For the centroid variable, we obtained a significant main effect of Reward (P = 0.0109) and Group (P = 0.0294), while Bin and the factor interactions were non-significant. See the top panel of the figure below.

      Reward also significantly modulated the spread of the Gaussian distributions fitted to the Consistency data (P = 0.00498). There were no additional significant main effects or interactions. See the bottom panel in the figure below.

      Note that here the factorial analysis was performed on the logarithmic transformation of the std.

      Author response image 2.

      (16) I find this result interesting and I think it might be worthwhile to include it in the paper.

      Response: We have now included this result in our revised manuscript (page 28).

      (18) I referred to this sentence: "The app preferred sequence was their preferred putative habitual sequence while the 'any 6' or 'any 3'-move sequences were the goal-seeking sequences." In my understanding, this implies one choice is habitual and another indicates goal-directedness.

      One last small comment:
      In the Discussion it is stated: "Moreover, when faced with a choice between the familiar and a new, less effort-demanding sequence, the OCD group leaned toward the former, likely due to its inherent value. These insights align with the theory of goal-direction/habit imbalance in OCD (Gillan et al., 2016), underscoring the dominance of habits in particular settings where they might hold intrinsic value."

      This could equally be interpreted as goal-directed behavior, so I do not think there is conclusive support for this claim.

      Response: The choice of the familiar/trained sequence, as opposed to the 'any 6' or 'any 3'-move sequences, cannot be explicitly considered goal-directed: first, because the familiar app sequences were associated with less monetary reward (in the any-6 condition), and second, because participants would clearly need more effort and time to perform them. Even though these sequences were automatic, it would still be much easier and faster simply to tap one finger sequentially 6 times (any-6) or 3 times (any-3). Therefore, choosing the app sequence would not be optimal/goal-directed, and in this sense the choice aligns with the current theory of goal-direction/habit imbalance in OCD. We found that OCD patients preferred to perform the trained app sequences in the physical effort manipulation (any-3 condition). While, on the one hand, this cannot be explicitly considered a goal-directed choice, we agree that another possible goal is involved here, linked to the intrinsic value associated with the familiar sequence. In this sense, the action could potentially be considered goal-directed. This highlights the difficulty of the concept of value and agrees with: 1) Hommel and Wiers (2017): “Human behavior is commonly not driven by one but by many overlapping motives . . . and actions are commonly embedded into larger-scale activities with multiple goals defined at different levels. As a consequence, even successful satiation of one goal or motive is unlikely to also eliminate all the others” (p. 942); and 2) Kruglanski and Szumowska’s (2020) account that “habits that may be unwanted from the perspective of an outsider and hence “irrational” or purposeless, may be highly wanted from the perspective of the individual for whom a habit is functional in achieving some goal” (p. 1262), and therefore habits are goal-driven.

      References:

      Balleine BW, Dezfouli A. 2019. Hierarchical Action Control: Adaptive Collaboration Between Actions and Habits. Front Psychol 10:2735. doi:10.3389/fpsyg.2019.02735

      Du Y, Haith A. 2023. Habits are not automatic. doi:10.31234/osf.io/gncsf

      Graybiel AM, Grafton ST. 2015. The Striatum: Where Skills and Habits Meet. Cold Spring Harb Perspect Biol 7:a021691. doi:10.1101/cshperspect.a021691

      Haith AM, Krakauer JW. 2018. The multiple effects of practice: skill, habit and reduced cognitive load. Current Opinion in Behavioral Sciences 20:196–201. doi:10.1016/j.cobeha.2018.01.015

      Hardwick RM, Forrence AD, Krakauer JW, Haith AM. 2019. Time-dependent competition between goal-directed and habitual response preparation. Nat Hum Behav 1–11. doi:10.1038/s41562-019-0725-0

      Hommel B, Wiers RW. 2017. Towards a Unitary Approach to Human Action Control. Trends Cogn Sci 21:940–949. doi:10.1016/j.tics.2017.09.009

      Kruglanski AW, Szumowska E. 2020. Habitual Behavior Is Goal-Driven. Perspect Psychol Sci 15:1256–1271. doi:10.1177/1745691620917676

      Mazar A, Wood W. 2018. Defining Habit in Psychology In: Verplanken B, editor. The Psychology of Habit: Theory, Mechanisms, Change, and Contexts. Cham: Springer International Publishing. pp. 13–29. doi:10.1007/978-3-319-97529-0_2

      Robbins TW, Costa RM. 2017. Habits. Current Biology 27:R1200–R1206. doi:10.1016/j.cub.2017.09.060

      Verplanken B, Aarts H, van Knippenberg A, Moonen A. 1998. Habit versus planned behaviour: a field experiment. Br J Soc Psychol 37(Pt 1):111–128. doi:10.1111/j.2044-8309.1998.tb01160.x

      Watson P, O’Callaghan C, Perkes I, Bradfield L, Turner K. 2022. Making habits measurable beyond what they are not: A focus on associative dual-process models. Neurosci Biobehav Rev 142:104869. doi:10.1016/j.neubiorev.2022.104869

      Wood W. 2017. Habit in Personality and Social Psychology. Pers Soc Psychol Rev 21:389–403. doi:10.1177/1088868317720362

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ need further interrogation. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors for the positive assessment of our manuscript. We have carefully considered the reviewers’ constructive and helpful comments and revised our manuscript accordingly. To address the question about the dissociable relationships between global and local BM processing, we have provided more evidence and additional analyses in this revised version.

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating differences in biological motion perception between participants with ADHD and controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated local and global (holistic) biological motion perception, group differences, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention/impulsivity). Both local and global biological motion perception are reduced in ADHD participants. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not the controls. A path analysis in the ADHD data suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that relatively little work exists on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature and potentially adds new behavioral markers for this clinical condition. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thank you for your positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper.

      We agree that the relationship between genetic factors and BM processing in ADHD needs more investigation. We have modified our statement in the Discussion section as follows:

      “Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19.” (lines 421 - 425).

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a relatively clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate your positive assessment of our work.

      Weaknesses:

      Except for the main analysis, it is unclear what the authors' specific predictions are regarding the three different tasks they employ. The three BM tasks are used to probe different processes underlying BM perception, but it is difficult to gather from the introduction why these three specific tasks were chosen and what predictions the authors have about the performance of the ADHD group in these tasks. Relatedly, the authors do not report whether (and if so, how) they corrected for multiple comparisons in their analyses. As the number of tests one should control for depends on the theoretical predictions (http://daniellakens.blogspot.com/2016/02/why-you-dont-need-to-adjust-you-alpha.html), both are necessary for the reader to assess the statistical validity of the results and any inferences drawn from them. The same is the case for the secondary analyses exploring relationships between the 3 individual BM tasks and social function measured by the social responsiveness scale (SRS).

      We appreciate these constructive suggestions. In response, we have included a detailed description in the Introduction section explaining why we employed three different tasks and our predictions about the performance in ADHD:

      “Despite initial indications, a comprehensive investigation into BM perception in ADHD is warranted. We proposed that it is essential to deconstruct BM processing into its multiple components and motion features, since treating them as a single entity may lead to misleading or inconsistent findings31. To address this issue, we employed a carefully designed behavioral paradigm used in our previous study19, with slight adjustments to adapt it for children. This paradigm comprises three tasks. Task 1 (BM-local) aimed to assess the ability to process local BM cues. Scrambled BM sequences were displayed and participants could use local BM cues to judge the facing direction of the scrambled walker. Task 2 (BM-global) tested the ability to process the global configuration cues of the BM walker. Local cues were uninformative, and participants used global BM cues to determine the presence of an intact walker. Task 3 (BM-general) tested the ability to process general BM cues (local + global cues). The stimulus sequences consisted of an intact walker and a mask containing similar target local cues, so participants could use general BM cues (local + global cues) to judge the facing direction of the walker.” (lines 116 - 130)

      “In Experiment 1, we examined three specific BM perception abilities in children with ADHD. As mentioned earlier, children with ADHD also show impaired social interaction, which implies atypical social cognition. Therefore, we hypothesized that children with ADHD would perform worse in the three tasks compared to TD children.” (lines 131 - 134)

      Additionally, we have reported the p values corrected for multiple comparisons (false discovery rate, FDR) in the revised manuscript wherever it was necessary to adjust the alpha (lines 310 - 316; Table 2). The pattern of the results remained unchanged.
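      The response does not say which FDR procedure was used. Purely as an illustration, the Benjamini-Hochberg step-up procedure, the most widely used way to control the false discovery rate, can be sketched in plain Python as below. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR adjustment (illustrative sketch).

    Returns adjusted p-values (q-values) in the same order as the input.
    A test is declared significant at FDR level alpha if its q-value <= alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order (rank 1 = smallest)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing monotonicity (a smaller p can never get a larger q)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = p_values[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for a family of four tests
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

      Which tests belong to the family being corrected is a theoretical decision, as the reviewer points out; the adjustment itself only rescales each sorted p-value by m/rank and keeps the result monotone.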

      In relation to my prior point, the authors could provide more clarity on how the conclusions drawn from the results relate to their predictions. For example, it is unclear what specific conclusions the authors draw based on their findings that ADHD show performance differences in all three BM perception tasks, but only local BM is related to social function within this group. Here, the claim is made that their results support a specific hypothesis, but it is unclear to me what hypothesis they are actually referring to (see line 343 & following). This lack of clarity is aggravated by the fact that throughout the rest of the discussion, in particular when discussing other findings to support their own conclusions, the authors often make no distinction between the two processes of interest. Lastly, some of the authors' conclusions related to their findings on local vs global BM processing are not logically following from the evidence: For instance, the authors conclude that their data supports the idea that social atypicalities are likely to reduce with age in ADHD individuals. However, according to their own account, local BM perception - the only measure that was related to social function in their study - is understood to be age invariant (and was indeed not predicted by age in the present study).

      Thank you for pointing out this issue. We have carefully revised the Discussion section about our findings to clarify these points:

      “Our study contributes several promising findings concerning atypical biological motion perception in ADHD. Specifically, we observe atypical local and global BM perception in children with ADHD. Notably, a potential dissociation between the processing of local and global BM information is identified. The ability to process local BM cues appears to be linked to the traits of social interaction among children with ADHD. In contrast, global BM processing exhibits an age-related development. Additionally, general BM perception may be affected by factors including attention.” (lines 387 - 393)

      We have provided a detailed discussion of the two processes of interest to clarify their potential differences and the possible reasons behind the divergent developmental trajectories of local and global BM processing:

      “BM perception is considered a multi-level phenomenon56-58. At least in part, processing information of local BM and global BM appears to involve different genetic and neural mechanisms16,19. Using the classical twin method, Wang et al. found that the distinction between local and global BM processing may stem from the dissociated genetic bases. The former, to a great degree, seems to be acquired phylogenetically20,21,59,60, while the latter is primarily obtained through individual development19. The sensitivity to local rather than global BM cues seems to emerge early in life. Visually inexperienced chicks exhibit a spontaneous preference for the BM stimuli of a hen, even when the configuration is scrambled20. The same finding was reported in newborns. On the contrary, the ability to process global BM cues rather than local BM cues may be influenced by attention28,29 and shaped by experience24,56.” (lines 419 - 430)

      “We found that the ability to process global and general BM cues improved significantly with age in both TD and ADHD groups, which implies that the processing module for global BM cues tends to mature with development. In the ADHD group, the improvement in processing general and global BM cues is greater than that in processing local BM cues, while no difference was found in the TD group. This may be due to the relatively higher baseline abilities of BM perception in TD children, resulting in a relatively milder improvement. These findings also suggest a dissociation between the development of local and global BM processing. There seems to be an acquisition of ability to process global BM cues, akin to the potential age-related improvements observed in certain aspects of social cognition deficits among individuals with ADHD5, whereas local BM may be considered an intrinsic trait19.” (lines 438 - 449)

      In addition, we have rephrased some inaccurate statements in the revised manuscript. Although some studies have found that part of the social dysfunction in ADHD individuals decreases with age, another part might be stable and due to atypical local BM perception. One reason for the age-related reduction may be that some factors related to social dysfunction, such as hyperactivity symptoms, improve with age.

      Results reported are incomplete, making it hard for the reader to comprehensively interpret the findings and assess whether the conclusions drawn are valid. Whenever the authors report negative results (p-values > 0.05), the relevant statistics are not reported, and the data not plotted. In addition, summary statistics (group means) are missing for the main analysis.

      Thanks for your comments. We have provided the complete statistical results in the revised manuscript (lines 309 - 316) and the supplementary material, including the relevant statistics and plots of negative results (Figure 4, Figure S2 and S3), in accordance with our research questions. We have also included summary statistics in the Results section (lines 287 - 293).

      Some of the conclusions/statements in the article are too strong and should be rephrased to indicate hypotheses and speculations rather than facts. For example, in lines 97-99 the authors state that the finding of poor BM performance in TD children in a prior study 'indicated inferior applicability' or 'inapplicable experimental design'. While this is one possibility, a perhaps more plausible interpretation could be that TD children show 'poor' performance due to outstanding maturation of the underlying (global) BM processes (as the authors suggest themselves that BM perception can improve with age). There are several other examples where statements are too strong or misleading, which need attention.

      We thank you for pointing out the issue. We have toned down and rephrased the strong statements and made the necessary revisions.

      “Another study found that children with ADHD performed worse in BM detection with moderate ratios of noise34. This may be due to the fact that BM stimuli with noise dots will increase the difficulty of identification, which highlights the difference in processing BM between the two groups33,35.” (lines 111 - 115)

      Reviewer #3 (Public Review):

      Summary:

      The authors presented point light displays of human walkers to children (mean = 9 years) with and without ADHD to compare their biological motion perception abilities and relate them to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three biological motion tasks, but that those loading more heavily on local processing related to social interaction skills and global processing to age. The important and solid findings are informative for understanding this complex condition, as well as biological motion processing mechanisms in general. However, I am unsure that these differences between local and global skills are truly supported by the data and suggest some further analyses.

      Strengths:

      The authors present clear differences between the ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate your positive feedback. In the revised manuscript, we have added more analyses to support the differences between local and global motion processing. Please refer to our response to point #3 below.

      Weaknesses:

      I am unsure that the data are strong enough to support claims about differences between global and local processing with respect to social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but do not seem so plausible to me. I am also concerned about gender, and possible autism, confounds when examining the effect of ADHD. Specifics:

      Gender confound. There are proportionally more boys in the ADHD than TD group. The authors appear to attempt to overcome this issue by including gender as a covariate. I am unsure if this addresses the problem. The vast majority of participants in the ADHD group are male, and gender is categorically, not continuously, defined. I'm pretty sure this violates the assumptions of ANCOVA.

      We appreciate your comments. We concur that, although we observed a clear difference between local and global BM processing in ADHD, the evidence is to some extent preliminary. The mechanistic possibilities for why these abilities may dissociate have been discussed in the revised manuscript; please refer to our response to Reviewer 2’s point #2. To further examine whether gender played a role in the observed results, we used a statistical matching technique to obtain a sub-dataset. The pattern of results remained the same in this more balanced dataset (see Supplementary Information part 1). Following your suggestion, we have also presented the results without gender as a covariate in the main text and plotted the data of boys and girls separately (see Figure 1 and Figure S1). There were indeed no signs of a gender effect.

      Autism. Autism and ADHD are highly comorbid. The authors state that the TD children did not have an autism or ADHD diagnosis, but they do not state that the ADHD children did not have an autism diagnosis. Given the nature of the claims, this seems crucial information for the reader.

      Thanks for your suggestion. We confirm that none of the children with ADHD in our study had been diagnosed with autism. We used a semi-structured interview instrument (K-SADS-PL-C) to confirm that every recruited child had ADHD but not ASD. The exclusion criteria for both groups are stated in the Materials and methods section:

      “Exclusion criteria for both groups were: (a) neurological diseases; (b) other neurodevelopmental disorders (e.g., ASD, Mental retardation, and tic disorders), affective disorders and schizophrenia…” (lines 158 - 162)

      Conclusions. The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. I think that if the authors wish to make strong claims here they must show inferential stats supporting (1) a difference between ADHD and TD SRS-Task 1 correlations, and (2) a difference in those differences for Task 2 and 3 relative to Task 1. I think they should also show a scatterplot of this correlation, with separate lines of best fit for the two groups, for Tasks 2 and 3 as well. I.e. Figure 4 should have 3 panels. I would recommend the same type of approach for age. Currently, they have small samples for correlations, and are reading much of theoretical significance between some correlations passing significance threshold and others not. It would be incredibly interesting if the social skills (as measured by SRS) only relate to local BM abilities, and age only to global, but I think the data are not so clear with the current information. I would be surprised if all BM abilities did not improve with age. Even if there is some genetic starter kit (and that this differs according to particular BM component), most abilities improve with learning/experience/age.

      Thank you for this recommendation. We have added statistics testing the differences between correlations: (1) a difference between the ADHD and TD groups in the SRS-Task 1 correlation (see the first paragraph of Supplementary Information part 2); (2) differences in the SRS-response accuracy correlations for Tasks 2 and 3 relative to Task 1 (see the second paragraph of Supplementary Information part 2); and (3) differences in the age-response accuracy correlations for Tasks 2 and 3 relative to Task 1 in the ADHD group (see Supplementary Information part 3). Additionally, we have included scatterplots for SRS-Task 1, SRS-Task 2 and SRS-Task 3 (each with separate lines of best fit for the two groups; see Figure 4), and for SRS-ADHD, SRS-TD, age-ADHD and age-TD (each with separate lines of best fit for the three tasks; see Figure S2 and S3). Detailed results are presented in the revised manuscript and Supplementary Information. We believe these further analyses strengthen our conclusions.
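      The response does not state which test was used to compare correlations. One standard option for the between-group comparison in point (1), where the ADHD and TD correlations come from independent samples, is Fisher's r-to-z test, sketched below under that assumption. The function name and the example values are hypothetical. The within-group comparisons of Task 2 or 3 against Task 1 share the SRS (or age) variable, so those correlations are dependent and would need a dependent-correlation test (for example Steiger's) rather than this one.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two correlations
    estimated in independent samples (e.g. two diagnostic groups).

    Returns (z, two_sided_p). Illustrative sketch only.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 observations")
    # Fisher transform: atanh(r) is approximately normal with SE 1 / sqrt(n - 3)
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical correlations and group sizes, not the study's data
    print(compare_independent_correlations(-0.45, 40, -0.10, 40))
```

      The point of the test is that a correlation reaching significance in one group but not the other is not, by itself, evidence that the two correlations differ; the difference has to be tested directly.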

      Theoretical assumptions. The authors make some sweeping statements about local vs global biological motion processing that need to be toned down. They assume that local processing is specifically genetically whereas global processing is a product of experience. The fact their global, but not local, task performance improves with age would tend to suggest there could be some difference here, but the existing literature does not allow for this certainty. The chick studies showing a neonatal preference are controversial and confounded - I cannot remember the specifics but I think there an upper vs lower visual field complexity difference here.

      Thank you for pointing out this issue. Following your suggestion, we have toned down and rephrased our claims about the difference between local and global BM processing:

      “These findings suggest that local and global mechanisms might play different roles in BM perception, though the exact mechanisms underlying the distinction remain unclear. Exploring the two components of BM perception will enhance our understanding of the difference between local and global BM processing, shedding light on the psychological processes involved in atypical BM perception.” (lines 87 - 92)

      Reviewer #1 (Recommendations For The Authors):

      I have only a number of minor points that should be addressed prior to publication:

      L. 95ff: What is meant by 'inapplicability of experimental designs' ? This paragraph is somewhat unclear.

      In the revised manuscript, we have clarified this point (lines 111 - 115).

      L. 146: The groups were not perfectly balanced for sex. Would results change fundamentally in a more balanced design, or can arguments be given that gender does not play a role, as seems to be the case for some functions in biological motion perception (e.g. Pavlova et al. 2015; Tsang et al 2018)? One could provide a justification that this imbalance does not matter, or perhaps test subsampled, balanced data sets.

      This point is similar to the point #1 from reviewer 3, and we have addressed this issue in our response above.

      L. 216 f.: In this paragraph it does not become very clear that the mask for the global task consisted of scrambles generated from walkers walking in the same direction. The mask for the local task then should consist of a balanced mask that contains the same amount of local motion cues indicating right and leftwards motion. Was this the case? (Not so clear from this paragraph.)

      Regarding the local task, introducing a mask would have made the task too difficult for children. Therefore, in the local task we displayed only a scrambled walker without a mask, which was more suitable for children. We have clarified this point in the corresponding paragraph (lines 232 - 241).

      L. 224 ff.: Here it would be helpful to see the 5 different 'facing' directions of the walkers. What does this exactly mean? Do they move on oblique paths that are not exactly orthogonal to the viewing directions, and how much did these facing directions differ?

      Of the five walkers we used, two faced directly left or right, orthogonal to the viewing direction. Two walked with their bodies oriented 45 degrees to the left or right of the observer. The last one walked towards the observer. We have included a video (Video 4) demonstrating the five facing directions.
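
      The five facing directions can be made concrete with a small geometric sketch. It assumes, hypothetically, that each walker is defined by 3D joint coordinates (x lateral, y vertical, z depth toward the observer) that are rotated about the vertical axis and orthographically projected onto the screen; the response does not state how the stimuli were rendered, and the angle labels below are illustrative only.

```python
import math

# Hypothetical azimuths (degrees) for the five walkers described above:
# 0 = walking toward the observer, +/-45 = oblique, +/-90 = profile view.
FACING_DEGREES = {
    "profile_left": -90.0,
    "oblique_left": -45.0,
    "toward_observer": 0.0,
    "oblique_right": 45.0,
    "profile_right": 90.0,
}


def project(joints, azimuth_deg):
    """Rotate (x, y, z) joints about the vertical (y) axis, then drop depth.

    Returns (x, y) screen coordinates under orthographic projection.
    """
    a = math.radians(azimuth_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y) for x, y, z in joints]


# A one-unit step "forward" (along +z before rotation): its horizontal
# screen displacement shows how much lateral motion each facing yields.
step = [(0.0, 0.0, 1.0)]
for name, deg in FACING_DEGREES.items():
    dx, _ = project(step, deg)[0]
    print(f"{name:>15}: horizontal displacement {dx:+.2f}")
```

      Under these assumptions, forward motion of the profile walkers maps entirely onto horizontal screen motion, the oblique walkers show about 71% (sin 45°) of it, and the walker facing the observer shows none.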

      L. 232: How was the number of 5 practicing trials determined/justified?

      As mentioned in the main text, global BM processing is susceptible to learning. Therefore, too many practice trials would increase visual experience with BM and could influence the results. We set the number of practice trials to 5 based on the results of a pilot experiment, in which nearly all children understood the task requirements well after completing 5 practice trials.

      L 239: Apparently no non-parametric statistics were applied. Maybe it would be good to mention briefly in the Statistics section why this was justified.

      We appreciate your suggestion and have cited two references in the Statistics section (Fagerland et al., 2012; Rochon et al., 2012). Fagerland et al. noted that the t-test becomes more robust as the sample size increases. According to the central limit theorem, the sampling distribution of the mean approaches normality as the sample size grows; a common rule of thumb is that it can be safely assumed to be approximately normal when the sample size exceeds 30 (http://www2.psychology.uiowa.edu/faculty/mordkoff/GradStats/part%201/I.07%20normal.pdf). We also ran non-parametric tests on our data and found the results to be robust.
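
      The central-limit argument can be illustrated with a short simulation. The sketch below is illustrative only: the exponential population, the sample size of 30 and the replication counts are arbitrary choices, not the study's data. It shows that the distribution of sample means is far less skewed than the raw scores, which is also why the n > 30 guideline is usually treated as a rule of thumb whose adequacy depends on how skewed the raw data are.

```python
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness: mean cubed deviation / sd**3."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a skewed exponential population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(2024)
raw_scores = [rng.expovariate(1.0) for _ in range(20_000)]
means_n30 = sample_means(30, 5_000, rng)

# Theory: exponential skewness is 2; for means of n draws it is 2 / sqrt(n),
# i.e. roughly 0.37 at n = 30.
print(f"skewness of raw scores:     {skewness(raw_scores):.2f}")
print(f"skewness of means (n = 30): {skewness(means_n30):.2f}")
```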

      L 290: 'FIQ' this abbreviation should be defined.

      The abbreviation 'FIQ' stands for full-scale intellectual quotient, which is defined in the Materials and methods section:

      “Scores of the four broad areas constitute the full-scale intellectual quotient (FIQ).”

      L. 290 ff.: This model notation, 'BM-local = age + gender etc.', is rather sloppy. I think what is meant is that a GLM was used with the predictors gender etc., each multiplied by an appropriate beta_i value. This formula should be corrected, or one could simply say that a GLM was run with the predictors gender ....

      The same criticism applies to these other models that follow.

      We thank you for pointing this out. We have modified all formulas accordingly in the revised manuscript (see part 3 of the Results section).
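
      For illustration, the kind of GLM the reviewer describes can be written with explicit coefficients, where i indexes participants and each β is estimated from the data. This is a sketch of the notation only; the reviewer's list of predictors ends in "etc.", so the remaining predictors are left as \dots rather than guessed.

```latex
\mathrm{BM}_{\mathrm{local},i} = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{gender}_i + \dots + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```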

      All these models assume that the predictors combine linearly. Was this assumption verified?

      We based this assumption on previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), are linearly related to BM processing ability.

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the group of patients. Does the same observation also apply to the normals?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data for this study will be available at https://osf.io/37p5s/.

      (2) Although overall, the language was clear and understandable, there are a few parts where language might confuse a reader and lead to misconceptions. For instance, line 52: Did the authors mean to refer to 'emotions and intentions' instead of 'emotions and purposes'? See also examples where rephrasing may help to reflect a statement is speculation rather than fact.

      Thank you for the comments. We have carefully checked the full text and rephrased confusing statements.

      (3) Line 83/84: Autism is not a 'mental disorder' - please change to something like 'developmental disability'. Authors are encouraged to adapt their language according to terms preferred by the community (e.g., see Fig. 5 in this article:

      https://onlinelibrary.wiley.com/doi/10.1002/aur.2864)

      Suggestion well taken. We have changed the wording accordingly:

      “In recent years, BM perception has received significant attention in studies of mental disorders (e.g., schizophrenia30) and developmental disabilities, particularly in ASD, characterized by deficits in social communication and social interaction31,32.” (lines 93 - 95)

      (4) Please report how the sample size for the study was determined.

      In the Materials and methods section (lines 168 - 173), we explained how the sample size was determined.

      (5) Line 94: It would be helpful to have a brief description of the neurophysiological differences that have been observed during BM perception in children with ADHD.

      Thanks for the comment. We have added a brief description of neurophysiological findings in children with ADHD (lines 108 - 111).

      (6) Line 106/107 and 108/109: please add references.

      We have revised this part; the relevant findings and references now appear in the revised manuscript (lines 77, 132 - 133).

      (7) Line 292: Please add what order the factors were entered into each regression model.

      We used SPSS 26 for the main analyses. By default, SPSS evaluates GLMs using Type III sums of squares, which do not depend on the order in which predictors are entered, so the results are the same regardless of order. For more information, please refer to the SPSS 26 documentation (https://www.ibm.com/docs/en/spss-statistics/26.0.0?topic=features-glm-univariate-analysis).
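
      The order-independence point can be checked with a small simulation. The sketch below is a hypothetical illustration in plain Python, not SPSS output: it compares sequential (Type I) sums of squares, which change with entry order when predictors are correlated, with drop-one partial sums of squares, which coincide with Type III sums of squares for a main-effects model with continuous predictors and do not depend on order. The variable names and data are invented.

```python
import random


def rss(y, columns):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)]
         + [sum(row[j] * yi for row, yi in zip(X, y))] for j in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for yi, row in zip(y, X))


def type1_ss(y, predictors, order):
    """Sequential (Type I) SS: each predictor's RSS reduction given those entered earlier."""
    ss, entered, prev = {}, [], rss(y, [])
    for name in order:
        entered.append(predictors[name])
        current = rss(y, entered)
        ss[name], prev = prev - current, current
    return ss


def partial_ss(y, predictors):
    """Drop-one partial SS (Type III for main-effects models with continuous predictors)."""
    full = rss(y, list(predictors.values()))
    return {name: rss(y, [v for k, v in predictors.items() if k != name]) - full
            for name in predictors}


# Hypothetical data with two correlated predictors (not the study's data).
rng = random.Random(1)
age = [rng.gauss(10, 2) for _ in range(200)]
iq = [100 + 3 * (a - 10) + rng.gauss(0, 10) for a in age]
score = [0.5 * a + 0.05 * q + rng.gauss(0, 1) for a, q in zip(age, iq)]
preds = {"age": age, "iq": iq}

print(type1_ss(score, preds, ["age", "iq"]))  # depends on entry order
print(type1_ss(score, preds, ["iq", "age"]))
print(partial_ss(score, preds))               # does not depend on order
```

      With interactions or categorical factors, Type III sums of squares also depend on the coding of factors, so this sketch covers only the simplest case.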

      Reviewer #3 (Recommendations For The Authors)

      (1) Task specifics. It is key to understanding the findings, as well as the dissociation between tasks, that the precise nature of the stimuli is clear. I think there is room for improvement in description here. Task 1 is described as involving relocating dots within the range of the intact walker. Of course, PLWs are created by presenting dots at the joints, so relocation can involve either moving to another place on the body, or random movement within the 2D spatial array (which likely involves moving it off the body). Which was done? It is said that Ps must indicate the motion direction, but what was the display of the walker? Sagittal? Task 2 requires detecting whether there is an intact walker amongst scrambled walkers. Were all walkers completely overlaid? Task 3 requires detecting the left v right facing of an intact walker at different orientations, presented amongst noise. So Task 3 requires determining facing direction and Task 1 walking direction. Are these tasks the same but described differently? Or can walkers ever walk backwards? Wrt this point, I also think it would help the reader if example videos were uploaded.

      Thank you for bringing this to our attention. With regard to Task 1, your second interpretation is correct. We scrambled the original dots and presented them at random locations within the 2D spatial array (which likely moved them off the body). As a result, the global configuration of the 13 dots was completely disrupted while the motion trajectory of each individual dot was preserved, so the display consisted of scrambled dots that did not resemble a human. These local BM cues nevertheless carry information about motion direction. In Task 2, the target walker was completely overlaid by a mask approximately 1.44 times the size of the intact walker. Task 1 and Task 3 have the same requirement: judging the motion (walking) direction. The difference is that Task 1 displayed a scrambled walker, whereas Task 3 displayed an intact walker within a mask. We have clarified these points and improved our descriptions in the Procedure section, and we have created example videos for each task, which we believe will help readers understand each task.
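
      The scrambling procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' stimulus code: the function name, the region size, and the choice to relocate each dot by a single random offset applied to every frame are hypothetical. It shows how the global configuration is destroyed while each dot's local trajectory, and hence its motion-direction information, is preserved.

```python
import random


def scramble_walker(frames, width, height, rng):
    """Spatially scramble a point-light walker (hypothetical sketch).

    frames: list of frames, each a list of (x, y) positions, one per dot,
            with the same dot order in every frame.
    Each dot keeps its own trajectory (local motion) but is moved by a
    constant random offset so that its mean position falls at a random
    location in a width x height region, destroying the body configuration.
    """
    n_frames = len(frames)
    offsets = []
    for d in range(len(frames[0])):
        mean_x = sum(frame[d][0] for frame in frames) / n_frames
        mean_y = sum(frame[d][1] for frame in frames) / n_frames
        offsets.append((rng.uniform(0, width) - mean_x, rng.uniform(0, height) - mean_y))
    return [[(x + ox, y + oy) for (x, y), (ox, oy) in zip(frame, offsets)]
            for frame in frames]


# Toy walker: 13 dots drifting rightward with a small vertical bob (invented data).
walker = [[(d * 0.1 + t * 0.02, d * 0.5 + 0.05 * (t % 4)) for d in range(13)]
          for t in range(30)]
scrambled = scramble_walker(walker, width=10.0, height=10.0, rng=random.Random(7))
```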

      (2) Gender confound (see above). I think that the authors should present the results without gender as a covariate. Can they separate boys and girls on the plots with different coloured individual datapoints, such that readers can see whether it's actually a gender effect driving the supposed ADHD effect? And show that there are no signs of a gender effect in their TD group?

      This point is similar to point #1 you raised above. Please refer to our response to that point.

      (3) Autism possible confound (see above). I think the authors must report whether any of the ADHD group had an autism diagnosis.

      Please refer to our response to point #2 you raised above.

      (4) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors should add stats demonstrating differences between the correlations to support such claims, as well as demonstrating appropriate scatterplots for SRS-Task 1, SRS-Task 2, SRS-Task 3 and age-Task 1, age-Task2 and age-Task 3 (with separate lines of best fit for the two groups in each).

      Please refer to our response to point #3 you raised above.

      (5) Theoretical assumptions (see above). I would suggest rephrasing all claims here to outline that these discussed mechanistic differences between local and global BM processing are only possibilities and not known on the basis of existing data.

      Please refer to our response to point #4 you raised above.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor suggestions:

      Abstract: I really liked the conclusion (that IM and VWM are two temporal extremes of the same process) as articulated in lines 557--563. (It is always satisfying when the distinction between two things that seem fundamentally different vanishes). If something like this but shorter could be included in the Abstract, it would highlight the novel aspects of the results a little more, I think.

      Thank you for this comment. We have added the following to the abstract:

      “A key conclusion is that differences in capacity classically thought to distinguish IM and VWM are in fact contingent upon a single resource-limited WM store.”

      L 216: There's an orphan parenthesis in "(justifying the use".

      Fixed.

      L 273: "One surprising result was the observed set size effect in the 0 ms delay condition". In this paragraph, it might be a good idea to remind the reader of the difference between the simultaneous and zero-delay conditions. If I got it right, the results differ between these conditions because it takes some amount of processing time to interpret the cue and free the resources associated with the irrelevant stimuli. Recalling that fact would make this paragraph easier to digest.

      That is correct. However, at this point in the text, we have not yet fitted the DyNR model to the data. Therefore, we believe that introducing cue processing and resource reallocation as concepts that differentiate between those two conditions would disrupt the flow of this paragraph. We address these points soon after, in a paragraph starting on line 341.

      Figures 3, 5: The labels at the bottom of each column in A would be more clear if placed at the top of each column instead. That way, the x-axis for the plots in A could be labeled appropriately, as "Error in orientation estimate" or something to that effect.

      We edited both figures, now Figure 4 and Figure 6, as suggested.

      L 379: It should be "(see Eq 6)", I believe.

      That is correct, line 379 (currently line 391) should read ‘Eq 6’. Fixed.

      L 379--385: I was a bit mystified as to why the scaled diffusion rate produced a worse fit than a constant rate. I imagine the scaled version was set to something like

      sigma^2_diff_scaled = sigma^2_base + K*(N-1)

      where N is the set size and sigma^2_base and K are parameters. If this model produced a similar fit as with a constant diffusion rate, the AIC would penalize it because of the extra parameter. But why would the fit be worse (i.e., not match the pattern of variability)? Shouldn't the fitter just find that the K=0 solution is the best? Not a big deal; the Nelder-Mead solutions can wobble when that many parameters are involved, but if there's a simple explanation it might be worth commenting on.

      The scaled diffusion was implemented by extending Eq 6 in the following way:

      $\sigma(t)^2 = (t - t_{\mathrm{offset}})\,\dot{\sigma}^2_{\mathrm{diff}}\,N$

      where N is the set size. The scaling was therefore not associated with a free parameter that could take the value 0 if set size did not affect the diffusion rate; rather, variability necessarily increased with set size. We now clarify this in the text:

      “The second variant was identical to the proposed model, except that we replaced the constant diffusion rate with a set size scaled diffusion rate by multiplying the right side of Eq 6 by N.”
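
      The difference between the two parameterizations can be sketched in code. This is a hypothetical illustration: the function names are invented, and clipping the elapsed time at zero before t_offset is an assumption rather than something stated in the response. The reviewer's additive form nests a constant model when K = 0, whereas the multiplicative variant described above has no parameter that can switch off the set-size dependence.

```python
def diffusion_variance(t, t_offset, rate, set_size=1, scaled=False):
    """Diffusion variance at time t: (t - t_offset) * rate, optionally times N.

    `rate` stands for the squared diffusion rate. Elapsed time is clipped at
    zero before t_offset (an assumption made for this sketch).
    """
    elapsed = max(t - t_offset, 0.0)
    return elapsed * rate * (set_size if scaled else 1)


def reviewer_variant(base, k, set_size):
    """The reviewer's additive form base + K * (N - 1); K = 0 gives a constant."""
    return base + k * (set_size - 1)


for n in (1, 4, 10):
    constant = diffusion_variance(1.0, 0.2, 0.5, n)
    scaled = diffusion_variance(1.0, 0.2, 0.5, n, scaled=True)
    print(f"N = {n:2d}: constant {constant:.2f}, scaled {scaled:.2f}")
```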

      Figure 4 is not mentioned in the main text. Maybe the end of L 398 would be a good place to point to it. The paragraph at L 443-455 would also benefit from a couple of references to it.

      Thank you for this suggestion. Figure 4 (now Figure 5) was previously mentioned on line 449 (previously line 437); we now also refer to it on line 410 (previously line 398), in the paragraph spanning lines 455-467 (previously 443-455), and on line 136, where we first discuss masking effects.

      L 500: Figure S7 is mentioned before Figures S5 and S6. Quite trivial, I know....

      Thank you for this comment. There was no specific reason for Figure S7 to appear after S5 & S6, so we simply swapped their order to be consistent with how they are referred to in the manuscript (i.e., S7 became S5, S5 became S6, and S6 became S7).

      Reviewer #2 (Recommendations For The Authors):

      (1) One potential weakness is that the model assumes sensory information is veridical. However, this isn't likely the case. Acknowledging noise in sensory representations could affect the model interpretation in a couple of different ways. First, neurophysiological recordings have shown normalization affects sensory representations, even when a stimulus is still present on the screen. The DyNR model partially addresses this concern because reports are drawn from working memory, which is normalized. However, if sensory representations were also normalized, then it may improve the model variant where subjects draw directly from sensory representations (an alternative model that is currently described but discarded).

      Thank you for this suggestion. We can consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility involves assuming that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data, which showed no significant difference between set sizes and were more consistent with the null hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.
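
      The logic of the two scenarios can be summarized in a toy calculation. This is a hypothetical sketch, not the DyNR implementation: it assumes a fixed total gain that divisive normalization shares equally among the normalized items, and it ignores all temporal dynamics. Under pre-attentive normalization the cued item's share shrinks with set size; under post-attentive normalization in the simultaneous-cue condition only the cued item is normalized, so set size drops out.

```python
def cued_item_gain(total_gain, set_size, stage):
    """Gain reaching the cued item under two hypothetical normalization stages.

    'pre':  all displayed items are normalized together, before attention,
            so the total gain is shared among set_size items.
    'post': only attended items are normalized; with a simultaneous cue just
            the cued item is attended, so set size has no effect.
    """
    if stage == "pre":
        return total_gain / set_size
    if stage == "post":
        return total_gain
    raise ValueError(f"unknown stage: {stage!r}")


for stage in ("pre", "post"):
    gains = [cued_item_gain(1.0, n, stage) for n in (1, 4, 10)]
    print(stage, gains)
```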

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al, 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni et al., 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      Second, visual adaptation predicts sensory information should decrease over time. This would predict that for long stimulus presentation times, the error would increase. Indeed, this seems to be reflected in Figure 5B. This effect is not captured by the DyNR model.

      Indeed, neural responses in the visual cortex have been observed to quickly adapt during stimulus presentation, showing reduced responses to prolonged stimuli after an initial transient (Groen et al., 2022; Sawamura et al., 2006; Zhou et al., 2019). This adaptation typically manifests as 1) reduced activity towards the end of stimulus presentation and 2) a faster decay towards baseline activity after stimulus offset.

      In the DyNR model, we use an idealized solution in which we convolve the presented visual signal with a response function (i.e., temporal filter). At the longest presentation durations, in DyNR, the sensory signal plateaus and remains stable until stimulus offset. Because our psychophysical data do not allow us to identify the exact neural coding scheme that underlies the sensory signal, we tend to favour this simple implementation, which is broadly consistent with some previous attempts to model temporal dynamics in sensory responses (e.g., Carandini and Heeger, 1994). However, we agree with the reviewer that some adaptation of the sensory signal with prolonged presentation would also be consistent with our data.
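
      The idealized sensory signal described above can be sketched as a boxcar stimulus passed through an exponential temporal filter. The sketch is hypothetical: the time constant, step size and first-order (leaky-integrator) discretization are assumptions, not the DyNR parameters. It reproduces the qualitative behaviour described: the signal rises, plateaus for long presentations, and decays after offset.

```python
def sensory_signal(duration, t_end, dt=0.001, tau=0.05):
    """Boxcar stimulus of `duration` s filtered by an exponential kernel.

    Implemented as an Euler step of tau * dr/dt = s(t) - r(t), a discretized
    convolution with exp(-t / tau) / tau. Parameter values are illustrative.
    """
    r, out = 0.0, []
    for i in range(int(round(t_end / dt))):
        s = 1.0 if i * dt < duration else 0.0
        r += (dt / tau) * (s - r)
        out.append(r)
    return out


trace = sensory_signal(duration=1.0, t_end=1.5)
print(f"just before offset: {trace[999]:.3f}; 0.5 s after offset: {trace[-1]:.5f}")
```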

      We have added the following to the manuscript:

      “In Experiment 2, the longest presentation duration shows an upward trend in error at set sizes 4 and 10. While this falls within the range of measurement error, it is also possible that this is a meaningful pattern arising from visual adaptation of the sensory signal, whereby neural populations reduce their activity after prolonged stimulation. This would mean less residual sensory signal would be available after the cue to supplement VWM activity, predicting a decline in fidelity at higher set sizes. Visual adaptation has previously been successfully accounted for by a type of delayed normalization model in which the sensory signal undergoes a series of linear and nonlinear transformations (Zhou et al., 2019). Such a model could in future be incorporated into DyNR and validated against psychophysical and neural data.”
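
      A delayed-normalization stage of the kind cited (Zhou et al., 2019) can be sketched on top of such a filtered signal. This is a simplified, hypothetical version: Zhou et al. use a gamma-shaped impulse response for the linear stage, whereas here both the linear stage and the normalization pool are single exponential filters, and all parameter values are arbitrary. It shows the two adaptation signatures mentioned in the response: an initial transient followed by a reduced sustained response, and a faster fall-off after stimulus offset than the linear signal alone.

```python
def delayed_normalization(duration, t_end, dt=0.001, tau1=0.02, tau2=0.1, sigma=0.1, n=2.0):
    """Return (linear, response) traces for a boxcar stimulus (illustrative sketch).

    linear:   stimulus through a fast exponential filter (time constant tau1).
    pool:     the linear response through a slower filter (tau2), the delayed pool.
    response: linear**n / (sigma**n + pool**n), a divisive nonlinear stage.
    """
    lin = pool = 0.0
    linear, response = [], []
    for i in range(int(round(t_end / dt))):
        s = 1.0 if i * dt < duration else 0.0
        lin += (dt / tau1) * (s - lin)
        pool += (dt / tau2) * (lin - pool)
        linear.append(lin)
        response.append(lin ** n / (sigma ** n + pool ** n))
    return linear, response


linear, response = delayed_normalization(duration=1.0, t_end=1.5)
peak, sustained = max(response[:1000]), response[999]
print(f"transient peak {peak:.2f} vs sustained {sustained:.2f}")
```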

      Carandini, M., & Heeger, D. J. (1994). Summation and division by neurons in primate visual cortex. Science, 264(5163), 1333–1336. https://doi.org/10.1126/science.8191289

      Groen, I. I. A., Piantoni, G., Montenegro, S., Flinker, A., Devore, S., Devinsky, O., Doyle, W., Dugan, P., Friedman, D., Ramsey, N. F., Petridou, N., & Winawer, J. (2022). Temporal Dynamics of Neural Responses in Human Visual Cortex. The Journal of Neuroscience, 42(40), 7562–7580. https://doi.org/10.1523/JNEUROSCI.1812-21.2022

      Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of Neuronal Adaptation Does Not Match Response Selectivity: A Single-Cell Study of the fMRI Adaptation Paradigm. Neuron, 49(2), 307–318. https://doi.org/10.1016/j.neuron.2005.11.028

      Zhou, J., Benson, N. C., Kay, K., & Winawer, J. (2019). Predicting neuronal dynamics with a delayed gain control model. PLOS Computational Biology, 15(11), e1007484. https://doi.org/10.1371/journal.pcbi.1007484

      (2) A second potential weakness is that, in Experiment 1, the authors briefly change the sensory stimulus at the end of the delay (a 'phase shift', Fig. 6A). I believe this is intended to act as a mask. However, I would expect that, in the DyNR model, this should be modeled as a new sensory input (in Experiment 2, 50 ms is plenty of time for the subjects to process the stimuli). One might expect this change to disrupt sensory and memory representations in a very characteristic manner. This seems to make a strong testable hypothesis. Did the authors find evidence for interference from the phase shift?

      The phase shift was implemented with the intention of reducing retinal after-effects, essentially acting as a mask for retinal information only; crucially the orientation of the stimulus is unchanged by the phase shift, so from the perspective of the DyNR model, it transmits the same orientation information to working memory as the original stimulus.

      If our objective were to model sensory input at the level of individual neurons and their receptive fields, we would indeed need to treat this phase shift as a novel input. Nevertheless, for DyNR, conceived as an idealization of a biological system for encoding orientation information, we can safely assume that visual areas in biological organisms have a sufficient number of phase-sensitive simple cells and phase-indifferent complex cells to maintain the continuity of input to VWM.

      When comparing conditions with and without the phase shift of stimuli (Fig S1B), we found performance to be comparable in the perceptual condition (simultaneous presentation) and with the longest delay (1 second), suggesting that the phase shift did not change the visibility or encoding of information into VWM. In contrast, we found strong evidence that observers had access to an additional source of information over intermediate delays when the phase shift was not used. This was evident through enhanced recall performance from 0 ms to 400 ms delay. Based on this, we concluded that the additional source of information available in the absence of a phase shift was accessible immediately following stimulus offset and had a brief duration, aligning with the theoretical concept of retinal afterimages.

      (3) It seems odd that the mask does not interrupt sensory processing in Experiment 2. Isn't this the intended purpose of the mask? Should readers interpret this as all masks not being effective in disrupting sensory processing/iconic memory? Or is this specific to the mask used in the experiment?

      Visual masks are often described as instantly and completely halting the visual processing of information that preceded the mask. We also anticipated the mask would entirely terminate sensory processing, but our data indicate the effect was not complete (as indicated by model variants in Experiment 2). Nevertheless, we believe we achieved our intended goal with this experiment – we observed a clear modulation of response errors with changing stimulus duration, indicating that the post-stimulus information that survived masking did not compromise the manipulation of stimulus duration. Moreover, the DyNR model successfully accounted for the portion of signal that survived the mask.

      We can identify two possible reasons why masking was incomplete. First, it is possible that the continuous report measure used in our experiments is more sensitive than the discrete measures (e.g., forced-choice methods) commonly employed in experiments that found masks to be 100% effective. Second, despite using a flickering white noise mask at full contrast, it is possible that it may not have been the most effective mask; for instance, a mask consisting of many randomly oriented Gabor patches matched in spatial frequency to the stimuli could prove more effective. We decided against such a mask because we were concerned that it could potentially act as a new input to orientation-sensitive neurons, rather than just wiping out any residual sensory activity.

      (4) I apologize if I missed it, but the authors did not compare the DyNR model to a model without decaying sensory information for Experiment 1.

      We tested two DyNR variants in which the diffusion process was solely responsible for memory fidelity dynamics. These models assumed that the sensory signal terminates abruptly at stimulus offset, and that the VWM signal encoding the stimuli was equal to the limit imposed by normalization, independent of the delay duration.

      As variants of this model failed to account for the observed response errors both quantitatively (see 'Fixed neural signal' under Model variants) and qualitatively (Figure S3), we decided not to test any more restrictive variants, such as the one without sensory decay and diffusion.

      (5) In the current model, selection is considered to be absolute (all or none). However, this need not be the case (previous work argues for graded selection). Could a model where memories are only partially selected, in a manner that is mediated by load, explain the load effects seen in behavior?

      Thank you for this point. If attentional selection was partial, it would affect the observers’ efficiency in discarding uncued objects to release allocated resources and encode additional information about the cued item. We and others have previously examined whether humans can efficiently update their VWM when previous items become obsolete. For example, Taylor et al. (2023) showed that observers could efficiently remove uncued items from VWM and reallocate the released resources to new visual information. These findings align with results from other studies (e.g., Ecker, Oberauer, & Lewandowsky, 2014; Kessler & Meiran, 2006; Williams et al., 2013).

      Based on these findings, we feel justified in assuming that observers in our current task were capable of fully removing all uncued objects, allowing them to continue the encoding process for the cued orientation that was already partially stored in VWM, such that the attainable limit on representational precision for the cued item equals the maximum precision of VWM.

      Partial removal could in principle be modelled in the DyNR model by introducing an additional plateau parameter specifying a maximum attainable precision after the cue. Our concern would be that such a plateau parameter would trade off with the parameter associated with Hick’s law (i.e., cue interpretation time). The former would control the amount of information that can be encoded into VWM, while the latter regulates the amount of sensory information available for encoding. We are wary of adding additional parameters, and hence flexibility, to the model where we do not have the data to sufficiently constrain them.

      Ecker, U. K. H., Oberauer, K., & Lewandowsky, S. (2014). Working memory updating involves item-specific removal. Journal of Memory and Language, 74, 1–15. https://doi.org/10.1016/j.jml.2014.03.006

      Kessler, Y., & Meiran, N. (2006). All updateable objects in working memory are updated whenever any of them are modified: Evidence from the memory updating paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 570–585. https://doi.org/10.1037/0278-7393.32.3.570

      Taylor, R., Tomić, I., Aagten-Murphy, D., & Bays, P. M. (2023). Working memory is updated by reallocation of resources from obsolete to new items. Attention, Perception, & Psychophysics, 85(5), 1437–1451. https://doi.org/10.3758/s13414-022-02584-2

      Williams, M., & Woodman, G. F. (2012). Directed forgetting and directed remembering in visual working memory. Journal of Experimental Psychology. Learning, Memory, and Cognition, 38(5), 1206–1220. https://doi.org/10.1037/a0027389

      (6) Previous work, both from the authors and others, has shown that memories are biased as if they are acted on by attractive/repulsive forces. For example, the memory of an oriented bar is biased away from horizontal and vertical and biased towards diagonals. This is not accounted for in the current model. In particular, this could be one mechanism to generate a non-uniform drift rate over time. As noted in the paper, a non-uniform drift rate could capture many of the behavioral effects reported.

      The reviewer is correct that the model does not currently include stimulus-specific effects, although our work on that topic provides a clear template for incorporating them in future (e.g. Taylor & Bays, 2018). Specifically on the question of generating a non-uniform drift, we have another project that currently looks at this exact question (cited in our manuscript as Tomic, Girones, Lengyel, and Bays; in prep.). By examining various datasets with varying memory delays, including the Additional Dataset 1 reported in the Supplementary Information, we found that stimulus-specific effects on orientation recall remain constant with retention time. Specifically, although there is a clear increase in overall error over time, estimation biases remain constant in direction and amplitude, indicating that the bias does not manifest in drift rates (see also Rademaker et al., 2018; Figure S1).

      Taylor, R., & Bays, P. M. (2018). Efficient coding in visual working memory accounts for stimulus-specific variations in recall. The Journal of Neuroscience, 1018–18. https://doi.org/10.1523/JNEUROSCI.1018-18.2018

      Rademaker, R. L., Park, Y. E., Sack, A. T., & Tong, F. (2018). Evidence of gradual loss of precision for simple features and complex objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance. https://doi.org/10.1037/xhp0000491

      (7) Finally, the authors use AIC to compare many different model variants to the DyNR model. The delta-AICs are high (>10), indicating a strong preference for the DyNR model over the variants. However, the overall quality of fit to the data is not clear. What proportion of the variance in data was the model able to explain? In particular, I think it would be helpful for the reader if the authors reported the variance explained on withheld data (trials, conditions, or subjects).

      Thank you for this comment.

      Below we report estimates of r², representing the goodness of fit between the observed data (i.e., RMSE) and the DyNR model predictions.

      In Experiment 1, r² between observations and predictions was computed across delays, separately for each set size, yielding: r² = 0.60 (set size 1), 0.87 (set size 4) and 0.95 (set size 10). Note that the lower explained variance for set size 1 arises because both data and model predictions show near-constant precision.

      In Experiment 2, we calculated r² between observations and predictions across presentation durations, separately for each set size, yielding: r² = 0.88 (set size 1), 0.71 (set size 4) and 0.70 (set size 10). In this case, the decrease in explained variance with set size is a consequence of less variability in both data and model predictions at larger set sizes.
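      Why near-constant data can yield a low r² even when the model tracks them closely can be illustrated with a minimal sketch. It assumes r² is the coefficient of determination (1 − SS_res/SS_tot) computed across conditions; the authors may instead have used a squared correlation. The RMSE values below are hypothetical and are not taken from the experiments.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot, across conditions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical RMSE values (degrees) at four delays; illustration only.
misfit = [0.5, -0.5, 0.5, -0.5]        # identical model misfit in both cases
flat_obs = [10.0, 10.5, 11.0, 11.5]    # little change across conditions
steep_obs = [10.0, 14.0, 18.0, 22.0]   # large change across conditions

for label, obs in (("flat", flat_obs), ("steep", steep_obs)):
    pred = [o - m for o, m in zip(obs, misfit)]
    print(label, round(r_squared(obs, pred), 3))
```

      The same absolute misfit costs far more explained variance when the observed values barely change across conditions, which is the situation described for set size 1 in Experiment 1.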

      While these estimates suggest that the DyNR model effectively fits the psychophysical data, a more rigorous validation approach would involve cross-validation checks across all conditions with a withheld portion of trials. Regrettably, due to the large number of conditions in each experiment, we could only collect 50 trials per condition. We are sceptical that fitting the model to even fewer trials, as necessary for cross-validation, would provide a reliable assessment of model performance.

      Minor: It isn't clear to me why the behavioral tasks are shown in Figure 6. They are important for understanding the results and are discussed earlier in the manuscript (before Figure 3). This just required flipping back and forth to understand the task before I could interpret the results.

      Thank you for this comment. We have now moved the behavioural task figure to appear early in the manuscript (as Figure 3).

      Reviewer #3 (Recommendations For The Authors):

      (1) Dynamics of sensory signals during perception

      I believe that the modeled sensory signal is a reasonable simplification and different ways to model the decay function are discussed. I would like to ask the authors to discuss the implications of slightly more complex initial sensory transients such as the ones shown in Teeuwen (2021). Specifically for short exposure times, this might be particularly relevant for the model fits as some of the alternative models diverge from the data for short exposures. In addition, the role of feedforward (initial transient?) and feedback signaling (subsequent "plateau" activity) could be discussed. The first one might relate more strongly to sensory signals whereas the latter relates more to top-down attention/recurrent processing/VWM.

      Particularly, this latter response might also be sensitive to the number of items present on the screen which leads to a related question pertaining to the limitations of attention during perception. Some work suggests that perception is similarly limited in the amount of information that can be represented concurrently (Tsubomi, 2013). Could the authors discuss the implications of this hypothesis? What happens if maximum sensory amplitude is set as a free parameter in the model?

      Tsubomi, H., Fukuda, K., Watanabe, K., & Vogel, E. K. (2013). Neural limits to representing objects still within view. Journal of Neuroscience, 33(19), 8257-8263.

      Thank you for this question. Below, we unpack it and answer it point by point.

      While we agree our model of the sensory response is justified as an idealization of the biological reality, we also recognise that recent electrophysiological recordings have illuminated intricacies of neuronal responses within the striate cortex, a critical neural region associated with sensory memory (Teeuwen et al, 2021). Notably, these recordings reveal a more nuanced pattern where neurons exhibit an initial burst of activity succeeded by a lower plateau in firing rate, and stimulus offset elicits a second small burst in the response of some neurons, followed by a gradual decrease in activity after the stimulus disappears (Teeuwen et al, 2021).

      In general, asynchronous bursts of activity in individual neurons will tend to average out across the population, making little difference to the predictions of the DyNR model. Synchronized bursts at stimulus onset could affect predictions for the shortest presentations in Experiment 2; however, the model appears to capture the data very well without including them. We would be wary of incorporating these phenomena into the model without more clarity on their universality (e.g., how stimulus-dependent they are), their significance at the population level (as opposed to individual neurons), and most importantly, their prominence in visual areas outside striate cortex. Specifically, while Teeuwen et al. (2021) described activity in V1, our model does not make strong assumptions about which visual areas are the source of the sensory input to WM. Given these uncertainties, we believe the idealized sensory response is justified for use in our model.

      Next, thank you for the comment on feedforward and feedback signals. We have added the following to our manuscript:

      “Following onset of a stimulus, the visual signal ascends through visual areas via a cascade of feedforward connections. This feedforward sweep conveys sensory information that persists during stimulus presentation and briefly after it disappears (Lamme et al., 1998). Simultaneously, reciprocal feedback connections carry higher-order information back towards antecedent cortical areas (Lamme and Roelfsema, 2000). In our psychophysical task, feedback connections likely play a critical role in orienting attention towards the cued item, facilitating the extraction of persisting sensory signals, and potentially signalling continuous information on the available resources for VWM encoding. While our computational study does not address the nature of these feedforward and feedback signals, a challenge for future research is to describe the relative contributions of these signals in mediating transmission of information between sensory and working memory (Semedo et al., 2022).”

      Lamme, V. A., Supèr, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology, 8(4), 529–535. https://doi.org/10.1016/S0959-4388(98)80042-1

      Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23(11), 571–579. https://doi.org/10.1016/S0166-2236(00)01657-X

      Semedo, J. D., Jasper, A. I., Zandvakili, A., Krishna, A., Aschner, A., Machens, C. K., Kohn, A., & Yu, B. M. (2022). Feedforward and feedback interactions between visual cortical areas use different population activity patterns. Nature Communications, 13(1), 1099. https://doi.org/10.1038/s41467-022-28552-w

      Finally, both you and Reviewer 2 raised a similar, interesting question regarding capacity limitations of attention during perception. Such a limitation could be modelled by freely estimating sensory amplitude and applying divisive normalization to that signal, similar to how VWM is constrained. We consider two potential mechanisms through which divisive normalization might be incorporated into sensory processing within the DyNR model.

      The first possibility is that normalization is pre-attentive. In this scenario, the sensory activity of each object would be rescaled at the lowest level of sensory processing, before the allocation of attentional or VWM resources. One strong prediction of such an implementation is that recall error in the simultaneous cue condition (Experiment 1) should vary with set size. However, this prediction is inconsistent with the observed data: recall error did not differ significantly between set sizes, and the data were more consistent with the hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). On that basis, we anticipate that introducing normalization as a pre-attentive mechanism would impair the model fit.

      An alternative scenario is to consider normalization as post-attentive. In the simultaneous cueing condition, only one item is attended (i.e., the cued one), regardless of the displayed set size. Here, we would expect normalized activity for a single item, regardless of the number of presented objects, which would then be integrated into VWM. This expanded DyNR model with post-attentive normalization would make exactly the same predictions as the proposed DyNR for recall fidelity, so distinguishing between these models would not be possible based on working memory experiments.
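      The contrast between the two schemes can be sketched with a textbook divisive-normalization equation. This is a hypothetical illustration, not the DyNR implementation: each item is assumed to contribute unit drive, and the gain and semi-saturation constant sigma are arbitrary values.

```python
def normalized_amplitude(n_in_pool, gain=1.0, sigma=1.0):
    """Per-item amplitude after divisive normalization.

    Each item contributes unit drive; one item's response is its drive
    divided by sigma plus the summed drive of all items in the pool.
    """
    return gain * 1.0 / (sigma + n_in_pool)


def predicted_amplitude(set_size, scheme, n_attended=1):
    if scheme == "pre-attentive":
        # Every displayed item enters the pool before attention selects the cued one.
        return normalized_amplitude(set_size)
    if scheme == "post-attentive":
        # Only attended items (the cued item in the simultaneous cue condition) enter the pool.
        return normalized_amplitude(n_attended)
    raise ValueError(f"unknown scheme: {scheme}")


for n in (1, 4, 10):
    print(n,
          round(predicted_amplitude(n, "pre-attentive"), 3),
          round(predicted_amplitude(n, "post-attentive"), 3))
```

      Under these assumptions the pre-attentive scheme predicts a sensory amplitude for the cued item that falls with set size, whereas the post-attentive scheme predicts the same amplitude at every set size, so only the former implies a set-size effect in the simultaneous cue condition.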

      To acknowledge the possibility that sensory signals could undergo divisive normalization and to motivate future research, we have added the following to our manuscript:

      “As well as being implicated in higher cognitive processes including VWM (Buschman et al, 2011; Sprague et al., 2014), divisive normalization has been shown to be widespread in basic sensory processing (Bonin et al., 2005; Busse et al., 2009; Ni et al., 2017). The DyNR model presently incorporates the former but not the latter type of normalization. While the data observed in our experiments do not provide evidence for normalization of sensory signals (note comparable recall errors across set size in the simultaneous cue condition of Experiment 1), this may be because sensory suppressive effects are localized and our stimuli were relatively widely separated in the visual field: future research could explore the consequences of sensory normalization for recall from VWM using, e.g., centre-surround stimuli (Bloem et al., 2018).”

      Bloem, I. M., Watanabe, Y. L., Kibbe, M. M., & Ling, S. (2018). Visual Memories Bypass Normalization. Psychological Science, 29(5), 845–856. https://doi.org/10.1177/0956797617747091

      Bonin, V., Mante, V., & Carandini, M. (2005). The Suppressive Field of Neurons in Lateral Geniculate Nucleus. The Journal of Neuroscience, 25(47), 10844–10856. https://doi.org/10.1523/JNEUROSCI.3562-05.2005

      Buschman, T. J., Siegel, M., Roy, J. E., & Miller, E. K. (2011). Neural substrates of cognitive capacity limitations. Proceedings of the National Academy of Sciences, 108(27), 11252–11255. https://doi.org/10.1073/pnas.1104666108

      Busse, L., Wade, A. R., & Carandini, M. (2009). Representation of Concurrent Stimuli by Population Activity in Visual Cortex. Neuron, 64(6), 931–942. https://doi.org/10.1016/j.neuron.2009.11.004

      Ni, A. M., & Maunsell, J. H. R. (2017). Spatially tuned normalization explains attention modulation variance within neurons. Journal of Neurophysiology, 118(3), 1903–1913. https://doi.org/10.1152/jn.00218.2017

      Sprague, T. C., Ester, E. F., & Serences, J. T. (2014). Reconstructions of Information in Visual Spatial Working Memory Degrade with Memory Load. Current Biology, 24(18), 2174–2180. https://doi.org/10.1016/j.cub.2014.07.066

      (2) Effectivity of retro-cues at long delays

      Can the authors discuss how cues presented at long delays (>1000 ms) can still lead to increased memory fidelity when sensory signals are likely to have decayed? A list of experimental work demonstrating this can be found in Souza & Oberauer (2016).

      Souza, A. S., & Oberauer, K. (2016). In search of the focus of attention in working memory: 13 years of the retro-cue effect. Attention, Perception, & Psychophysics, 78, 1839-1860.

      The increased memory fidelity observed with longer delays between memory array offset and cue does not result from integrating available sensory signals into VWM, because the sensory signal would have completely decayed by that time. Instead, research so far has indicated several alternative mechanisms that could lead to higher recall precision for cued items. We briefly summarize some of them here; they are reviewed in more detail in Souza and Oberauer (2016).

      One possibility is that, after a highly predictive retro-cue indicates the to-be-tested item, uncued items can simply be removed from VWM. This could decrease interference for the cued item and consequently increase recall precision. Secondly, the retro-cue could indicate which item should be selectively attended, thereby differentially strengthening it in memory. Furthermore, the retro-cue could allow evidence to accumulate for the target item ahead of decision-making, which could increase the probability that the correct information is selected for response. Finally, the retro-cued stimulus could be insulated from interference by subsequent visual input, while the uncued stimuli may remain prone to such interference.

      A neural account of this retro-cue effect based on the original neural resource model has been proposed in Bays & Taylor, Cog Psych, 2018. However, as we did not use a retro-cue design in the present experiments, we have decided not to elaborate on this in the manuscript.

      (3) Swap errors

      I am somewhat surprised by the empirically observed and predicted pattern of swap errors displayed in Figure S2. For set size 10, swap probability does not consistently increase with the duration of the retention interval, although this was predicted by the author's model. At long intervals, swap probability is significantly higher for large compared to small set sizes, which also seems to contrast with the idea of shared, limited VWM resources. Can the authors provide some insight into why the model fails to reproduce part of the behavioral pattern for swap errors? The sentence in line 602 might also need some reconsideration in this regard.

      Determining the ground truth for swap errors poses a challenge. The prevailing approach has been to employ a simpler model that estimates swap errors, such as a three-component mixture model, and use those estimates as a proxy for ground truth. However, this method is not without its shortcomings. For example, the variability of swap frequency estimates tends to increase with variability in the report feature dimension (here, orientation). This is due to the increasing overlap of response probability distributions for swap and non-swap responses. Consequently, the discrepancy between any two methods of swap estimation is most noticeable when there is substantial variability in orientation reports (e.g., 10 items and long delay or short exposure).
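      The overlap argument can be made concrete with a small numerical sketch. For illustration only, it assumes that target and swap responses each follow a von Mises distribution with the same concentration kappa, centred on the target and on one non-target feature value (for orientation, angles would first be doubled onto the full circle). It is not the three-component mixture model itself; it only shows how the overlap between the two components grows as report variability increases (kappa decreases).

```python
import math


def von_mises_density(kappa, mu, n_points=3600):
    """Discretized von Mises density on [-pi, pi), normalized numerically."""
    step = 2 * math.pi / n_points
    xs = [-math.pi + i * step for i in range(n_points)]
    # Subtracting 1 inside the exponent keeps values <= 1 for numerical stability.
    raw = [math.exp(kappa * (math.cos(x - mu) - 1.0)) for x in xs]
    total = sum(raw) * step
    return [r / total for r in raw], step


def overlap(kappa, separation):
    """Overlap coefficient: integral of min(target density, swap density)."""
    f_target, step = von_mises_density(kappa, 0.0)
    f_swap, _ = von_mises_density(kappa, separation)
    return sum(min(a, b) for a, b in zip(f_target, f_swap)) * step


# Hypothetical concentrations; separation of pi/2 on the doubled-angle circle.
for kappa in (16.0, 4.0, 1.0):
    print(kappa, round(overlap(kappa, math.pi / 2), 3))
```

      As kappa decreases, a larger share of responses is compatible with both the target and the non-target component, so any method of apportioning responses between them becomes less certain.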

      When modelling swap frequency in the DyNR model, our aim was to provide a parsimonious account of swap errors while implementing similar dynamics in the spatial (cue) feature as in the orientation (report) feature. This parametric description captured the overall pattern of swap frequency with set size and with retention and encoding time, but it is still only an approximation of the predictions we would obtain if we fully modelled memory for the conjunction of cue and report features (as in e.g. Schneegans & Bays, 2017; McMaster et al., 2022).

      We expanded the existing text in the section ‘Representational dynamics of cue-dimension features’ of our manuscript:

      “… Although we did not explicitly model the neural signals representing location, the modelled dynamics in the probability of swap errors were consistent with those of the primary memory feature. We provided a more detailed neural account of swap errors in our earlier work that is theoretically compatible with the DyNR model (McMaster et al., 2022; Schneegans & Bays, 2017).

      The DyNR model successfully captured the observed pattern of swap frequencies (intrusion errors). The only notable discrepancy between DyNR and the three-component mixture model (Fig. S2) arises with the largest set size and longest delay, where there is also considerable interindividual variability. As variability in the report dimension increases, estimates of swap frequency become more variable due to the growing overlap between the probability distributions of swap and non-swap responses. This may explain apparent deviations from the modelled swap frequencies at the highest set size and longest delay, where orientation response variability was greatest.”

      McMaster, J. M. V., Tomić, I., Schneegans, S., & Bays, P. M. (2022). Swap errors in visual working memory are fully explained by cue-feature variability. Cognitive Psychology, 137, 101493. https://doi.org/10.1016/j.cogpsych.2022.101493

      Schneegans, S., & Bays, P. M. (2017). Neural Architecture for Feature Binding in Visual Working Memory. The Journal of Neuroscience, 37(14), 3913–3925. https://doi.org/10.1523/JNEUROSCI.3493-16.2017

      (4) Direct sensory readout

      The model assumes that readout from sensory memory and from VWM happens with identical efficiency. Currently, we don't know if these two systems are highly overlapping or are fundamentally different in terms of architecture and computation. In the case of the latter, it might be less reasonable to assume that information readout would happen at similar efficiencies, as it is currently assumed in the manuscript. Perhaps the authors could briefly discuss this possibility.

      In the direct sensory read-out model, we did not explicitly model the efficiency of readout from either sensory or VWM store. However, the distinctive prediction of this model is that the precision of recall changes exponentially with delay at every set size, including one item. This prediction does not depend on the relative efficiency of readout from sensory and working memory, but only on the principle that direct readout from sensory memory bypasses the capacity limit on working memory. This prediction is inconsistent with the pattern of results observed in Experiment 1, where early cues did not show a beneficial effect on recall error for set size 1. While the proposal raised by the reviewer is intriguing, even if we were to model the process of readout from both the sensory and VWM stores with different efficiencies, the direct read-out model could not account for the near-constant recall error with delay for set size one.
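      The claim that this prediction holds regardless of readout efficiency can be illustrated with a hypothetical sketch of the direct-readout alternative. None of the following comes from the manuscript: VWM precision is assumed to be divided equally among items, the cued item additionally receives a sensory term that decays exponentially with time constant tau_ms, an efficiency factor scales that sensory term, and all parameter values are arbitrary.

```python
import math


def direct_readout_precision(set_size, delay_ms, efficiency,
                             vwm_precision=10.0, sensory_precision=5.0,
                             tau_ms=200.0):
    """Hypothetical direct-readout model.

    VWM precision is shared equally among items; the cued item is also read
    out directly from a sensory trace that decays exponentially with delay.
    """
    vwm_share = vwm_precision / set_size
    sensory = efficiency * sensory_precision * math.exp(-delay_ms / tau_ms)
    return vwm_share + sensory


# Set size 1: compare an early and a late cue for two readout efficiencies.
for efficiency in (0.25, 1.0):
    early = direct_readout_precision(1, 0.0, efficiency)
    late = direct_readout_precision(1, 1000.0, efficiency)
    print(efficiency, round(early, 3), round(late, 3))
```

      For set size 1, precision with an early cue exceeds precision with a late cue for any positive efficiency; only setting the efficiency to zero removes the delay dependence, which amounts to abandoning direct sensory readout altogether.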

      (5) Encoding of distractors

      One of the model assumptions is that, for simultaneous presentations of memory array and cue only the cued feature will be encoded. Previous work has suggested that participants often accidentally encode distractors even when they are cued before memory array onset (Vogel 2005). Given these findings, how reasonable is this assumption in the authors' model?

      Vogel, E. K., McCollough, A. W., & Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438(7067), 500-503.

      Although previous research suggests that observers can misinterpret the pre-cue and encode one of the uncued items, our results argue against this being the case in the current experiment. Such encoding failures would increase overall recall error and produce a gradient of error with set size, owing to the greater number of adjacent distractors at larger set sizes. However, when we compared recall errors between set sizes in the simultaneous cue condition, we did not find a significant difference, and our results were more likely under the hypothesis of no difference (F(2,18) = 1.26, p = .3, η² = .04, BF10 = 0.47). If observers occasionally encoded and reported one of the uncued items in the simultaneous cue condition, those errors were extremely infrequent and did not affect the overall error distributions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. investigated the relationship between monocular and binocular responses of V1 superficial-layer neurons using two-photon calcium imaging. They found a strong relationship in their data: neurons that exhibited a greater preference for one eye or the other (high ocular dominance) were more likely to be suppressed under binocular stimulation, whereas neurons driven more equivalently by each eye (low ocular dominance) were more likely to be enhanced by binocular stimulation. This result chiefly demonstrates the relationship between ocular dominance and binocular responses in V1, corroborating what has been shown previously using electrophysiological techniques but now with greater spatial resolution (albeit with lower temporal resolution). The binocular responses were well fitted by a model that implements divisive normalization between the eyes and accounts for both the suppression and enhancement observed in the subpopulation of binocular neurons. In so doing, the authors reaffirm the importance of incorporating ocular dominance in computational models of binocular combination.

      The conclusions of this paper are mostly well supported by the data, but there are some limitations of the methodology that need to be clarified, and an expansion of how the results relate to previous work would better contextualize these important findings in the literature.

      Strengths:

      The two-photon imaging technique used to resolve the activity of individual neurons within intact brain tissue grants a host of advantages. Foremost, two-photon imaging confers considerably high spatial resolution. As a result, the authors were able to sample and analyze the activity from thousands of verified superficial-layer V1 neurons. The animal model used, awake macaques, is also highly relevant for the study of binocular combination. Macaques, like humans, are binocular animals, meaning they have forward-facing eyes that confer overlapping visual fields. Importantly, macaque V1 is organized into cortical columns that process specific visual features from the separate eyes just like in humans. In combination with a powerful imaging technique, this allowed the authors to evaluate the monocular and binocular response profiles of V1 neurons that are situated within neighboring ocular dominance columns, a novel feat. To this aim, the approach was well-executed and should instill further confidence in the notion that V1 neurons combine monocular information in a manner that is dependent on the strength of their ocular dominance.

      Weaknesses:

      While two-photon imaging provides excellent spatial resolution, its temporal resolution is often lower compared to some other techniques, such as electrophysiology. This limits the ability to study the fast dynamics of neuronal activity, a well-understood trade-off of the method. The issue is more so that the authors draw comparisons to electrophysiological studies without explicit appreciation of the temporal difference between these techniques. In a similar vein, two-photon imaging is limited spatially in terms of cortical depth, preferentially sampling from neurons in layers 2/3. This limitation does not invalidate any of the interpretations but should be considered by readers, especially when making comparisons to previous electrophysiological reports using microelectrode linear arrays that sample from all cortical layers. Indeed, it is likely that a complete picture of early cortical binocular processing will require high spatial resolution (i.e., sampling from neurons in neighboring ocular dominance columns, from pia mater to white matter) at the biophysically relevant timescales (1ms resolution, capturing response dynamics over the full duration of the stimulus presentation, including the transient onset and steady-state periods).

      To address the same concern from all three reviewers, we discuss the technical limitations of two-photon calcium imaging at the end of the Discussion, including limited imaging depth, low temporal resolution, and nonlinearity. The relevant text is copied here:

      (Ln 304) “Limitations of the current study

      Although it can sample a large number of neurons at cellular resolution and with low sampling bias, two-photon calcium imaging has known limitations that make it best regarded as a tool complementary to electrophysiological recordings.

      For example, two-photon imaging can only sample neurons in superficial layers, whereas binocular neurons also exist in deeper layers, and even neurons in the input layer are affected by feedback from downstream binocular neurons and exhibit binocular response properties (Dougherty, Cox, Westerberg, & Maier, 2019). Furthermore, calcium signals are relatively slow and cannot reveal the fast dynamics of neuronal responses. Given these spatial and temporal limitations, a more complete picture of the neuronal mechanisms underlying binocular combination of monocular responses may come from studies using both technologies.

      In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although calcium signals indicated by GCaMP5, our favored calcium indicator, display a linear relationship to neuronal spike rates within a range of 10-150 Hz (Li, Liu, Jiang, Lee, & Tang, 2017), responses below and above this range are more nonlinear and may appear weaker and stronger, respectively, than electrode-recorded effects. Consequently, the differences in population responses between monocular and binocular stimulation revealed by this study might be less pronounced in electrophysiological recordings.”
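      How an expansive transfer function outside the linear range can exaggerate response differences is sketched below. The function is hypothetical: only the 10-150 Hz range comes from the text, and the quadratic shape outside it is an arbitrary choice for illustration, not a fitted GCaMP5 calibration.

```python
def fluorescence(rate_hz, low=10.0, high=150.0, exponent=2.0):
    """Hypothetical spike-rate-to-calcium-signal transfer function.

    Proportional (identity) between `low` and `high`; expansive outside that
    range, so weak rates give relatively weaker signals and strong rates give
    relatively stronger ones. Continuous at both boundaries.
    """
    if rate_hz < low:
        return low * (rate_hz / low) ** exponent
    if rate_hz > high:
        return high * (rate_hz / high) ** exponent
    return rate_hz


# A 2:1 spike-rate ratio inside and below the linear range (hypothetical rates).
for weak, strong in ((20.0, 40.0), (4.0, 8.0)):
    print(strong / weak, round(fluorescence(strong) / fluorescence(weak), 3))
```

      Inside the range the 2:1 rate ratio is preserved; below it, the same ratio appears as roughly 4:1 in the calcium signal, which is the sense in which monocular-binocular differences could look larger than in spike rates.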

      (Recommendations For The Authors):

      Overall, my main suggestion for the authors to improve the paper is to revise some of the interpretations of their results in relation to previous research. The purpose of the present study was to illustrate a more complete picture of the binocular combination of monocular responses by taking into consideration the ocular dominance of V1 cells (lines 34-36). A study published earlier this year had an identical purpose (Mitchell et al., Current Biology, 2023) and arrived at a highly similar conclusion (and also applied divisive normalization to fit their data). I would ask that this paper be mentioned in the introduction and discussed.

      The Mitchell et al. (2023) paper has been added to the Introduction and Discussion:

      (Ln 50) “In addition (to the Dougherty et al 2019 paper from the same group), Mitchell, Carlson, Westerberg, Cox, and Maier (2023) reported that binocular combination of monocular stimuli with different contrasts is also affected by neurons’ eye preference.”

      (Ln 286) “To our knowledge, the critical roles of ocular dominance have been largely overlooked by extant binocular vision models, except that Anderson and Movshon (1989) demonstrated that a model consisting of multiple ocular dominance channels can better explain their psychophysical adaptation data, and that Mitchell et al. (2023) revealed that binocular combination of different contrasts presented to different eyes is affected by neurons’ ocularity preference.”

      Nevertheless, the results of the present study are very valuable. They add substantial spatial resolution and sophisticated relational analysis of monocular and binocular responses that Mitchell et al., 2023 did not include. Therefore, my suggestion is to emphasize the advantages of two-photon imaging in the introduction, focusing on the ability to image neurons in neighboring ocular dominance columns. The rigorous modeling of the relationship between nearby neurons with a range of eye preferences, in tandem with the incredible yield of two-photon imaging, is what sets this paper apart from previous electrophysiological work.

      The finding that binocular responses were dependent on ocular dominance is largely consistent with previous electrophysiological results. However, there should be a paragraph in the discussion section that speaks to the limitations of comparing two-photon imaging data to electrophysiological data. Namely, there are two limitations:

      (1) These two techniques confer different temporal resolutions. It is conceivable that some of the electrophysiology relationships (for example, described by Dougherty et al., 2019) may be dependent on the temporal window over which the data was averaged, typically over 50-100ms around stimulus onset, or 100-250ms comprising the neurons' sustained response to the stimulus. This possible explanation of the difference in obtained results would be especially useful for the discussion paragraph starting at line 232. It would also be helpful to readers for there to be some mention of the advantage of having high temporal resolution (i.e., the benefits of electrophysiology) since (a) recent work has distinguished between sequential stages of binocular combination (Cox et al., 2019) and (b) modern models of V1 neurons emphasize recurrent feedback to explain V1 temporal dynamics (see Heeger et al., 2019; Rubin et al., 2015), which could prove to be relevant for combination of stimuli in the two eyes (Fleet et al., 1997).

      Our discussion of the technical limitations of two-photon calcium imaging is quoted above. Specific to Dougherty et al. (2019), we added the following discussion to address the difference in temporal resolution between the two technologies.

      (Ln 266) “In addition, it is unclear whether the discrepancies are caused by different temporal resolutions of electrode recording and calcium imaging. The results of Dougherty et al. (2019) represent changes of neuronal spike activities over a period of approximately 50-200 ms after the stimulus onset, which may reflect the sustained neuronal responses to the stimulus and possible feedback signals. Calcium signals are much slower and indicative of the aggregated neuronal responses over a longer period (up to 1000 ms in the current study). They should have smeared, rather than exaggerated, the differences between monocular and binocular responses, although we cannot exclude the possibility that some neuronal response changes beyond 200 ms are responsible for the discrepancies.”

      (2) The sample of V1 neurons in this study is limited to cells in the most superficial layers of the cortex (layers 2/3). This limitation is, of course, well understood, but it should be mentioned at least in the context of studying the formative mechanisms of binocular combination in V1 (since we know that binocular neurons also exist in layers 5/6, and there is now substantial evidence that even layer 4 neurons are not as "monocular" as we previously thought (Dougherty et al., 2019)).

      See our discussion of the technical limitations of two-photon calcium imaging quoted above.

      In short, I believe the paper would be improved by (1) adding the above citations in the appropriate places, (2) acknowledging in the introduction that this question has been investigated electrophysiologically but emphasizing the advantages of two-photon imaging, and (3) adding a paragraph to the discussion section that discusses the temporal and spatial limitations when using two-photon imaging to study binocular combination, particularly when comparing the results to electrophysiology.

      Reviewer #2 (Public Review):

      Summary:

      This study examines the pattern of responses produced by the combination of left-eye and right-eye signals in V1. For this, they used calcium imaging of neurons in V1 of awake, fixating monkeys. They take advantage of calcium imaging, which yields large populations of neurons in each field of view. With their data set, they observe how response magnitude relates to ocular dominance across the entire population. They analyze carefully how the relationship changed as the visual stimulus switched from contra-eye only, ipsi-eye only, and binocular. As expected, the contra-eye-dominated neurons responded strongly with a contra-eye-only stimulus. The ipsi-eye-dominated neurons responded strongly with an ipsi-eye-only stimulus. The surprise was responses to a binocular stimulus. The responses were similarly weak across the entire population, regardless of each neuron's ocular dominance. They conclude that this pattern of responses could be explained by interocular divisive normalization, followed by binocular summation.
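      A toy version of "interocular divisive normalization followed by binocular summation" helps show why the outcome can depend on ocular dominance. This is a hypothetical sketch, not the authors' fitted model: the eye weights w_left and w_right stand in for ocular dominance, both eyes' contrasts are assumed to share one normalization pool, and the sigma and contrast values are arbitrary.

```python
def monocular_response(w_eye, contrast, sigma=0.1):
    # Only one eye is stimulated, so only its contrast enters the normalization pool.
    return w_eye * contrast / (sigma + contrast)


def binocular_response(w_left, w_right, contrast, sigma=0.1):
    # Each eye's drive is divided by a pool containing both eyes' contrast,
    # then the two normalized signals are summed.
    pool = sigma + 2 * contrast
    return (w_left * contrast + w_right * contrast) / pool


def binocular_index(w_left, w_right, contrast=0.5, sigma=0.1):
    """Binocular response relative to the stronger monocular response (>1 = enhancement)."""
    best_mono = max(monocular_response(w_left, contrast, sigma),
                    monocular_response(w_right, contrast, sigma))
    return binocular_response(w_left, w_right, contrast, sigma) / best_mono


# From balanced (low ocular dominance) to purely monocular (high ocular dominance).
for w_left, w_right in ((1.0, 1.0), (1.0, 0.5), (1.0, 0.0)):
    print(w_left, w_right, round(binocular_index(w_left, w_right), 3))
```

      With these assumptions, a neuron driven equally by both eyes responds more strongly to binocular than to its best monocular stimulation, whereas a strongly monocular neuron is suppressed, matching the qualitative pattern summarized by both reviewers.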

      Strengths:

      A major strength of this work is that the model-fitting was done on a large population of simultaneously recorded neurons. This approach is an advancement over previous work, which did model-fitting on individual neurons. The fitted model in the manuscript represents the pattern observed across the large population in V1, and washes out any particular property of individual neurons. Given the large neuronal population from which the conclusion was drawn, the authors provide solid evidence supporting their conclusion. They also observed consistency across 5 fields of view.

      The experiments were designed and executed appropriately to test their hypothesis. Their data support their conclusion.

      Weaknesses:

      One weakness of their study is that calcium signals can exaggerate the nonlinear properties of neurons. Calcium imaging renders poor responses poorer and strong responses stronger, compared to single-unit recording. In particular, the dramatic change in the population response between monocular stimulation and binocular stimulation could actually be less pronounced when measured with single-unit recording methods. This means their choice of recording method could have accidentally exaggerated the evidence of their finding.

      We discussed the nonlinearity of calcium signals as part of the technical limitations of 2-p calcium imaging. The calcium indicator we use, GCaMP5, has a reasonable range over which its signal is linearly related to spike rate. Outside this range, however, nonlinearity is indeed a concern.

      (Ln 314) “In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although signals indicated by GCaMP5, our favored choice of calcium indicator, display a linear relationship to neuronal spike rate within a range of 10-150 Hz (Li et al., 2017), weak and strong signals outside this range are more nonlinear and may appear weaker and stronger, respectively, than electrode-recorded effects. Consequently, the changes in population responses between monocular and binocular stimulation revealed by this study might be less pronounced.”

      The implication of their finding is that strong ocular dominance is the result of release from interocular suppression by a monocular stimulus, rather than the lack of binocular combination as many traditional studies have assumed. This could significantly advance our understanding of the binocular combination circuitry of V1. The entire population of neurons could be part of a binocular combination circuitry present in V1.

      This is a very good insight. We added the following sentences to the end of the first paragraph of Discussion:

      (Ln 242) “These findings imply that, at least for neurons in the superficial layers of V1, significant ocular dominance may result from a release of interocular suppression during monocular stimulation (an unusual viewing condition, as our vision is typically binocular), rather than from a lack of binocular combination of inputs from upstream monocular neurons.”

      (Recommendations For The Authors):

      Line 150: "To model interocular response suppression, responses from each eye in Eq. 2 were further normalized by an interocular suppression factor wib or wcb," I recommend the authors improve their explanation of how they arrived at Eq. 3 from Eq. 2. As it stands, my impression is that they have one model for the responses to monocular stimulation and another model for the responses to binocular stimulation. What I think is missing is that both equations are derived from the same model. Monocular stimulation is a situation in which the contrast of the stimulus in one eye is zero. Could the authors clarify whether this situation produces an interocular suppression of zero, and how that leads to Eq. 2?

      We rewrote the modeling part to show that Equations 1-3 are sequential steps of development for the same model. We also added a brief paragraph to discuss how Eq. 3 could lead to Eq. 2 under monocular viewing:

      (Ln 166) “Although not shown in Eq. 3, we also assumed that the nonlinear exponent b depends on the contrast of the stimulus presented to the other eye (i.e., Sc or Si). Consequently, when Sc or Si = 0 under monocular stimulation, Rc or Ri = 0 (Eq. 1) and the interocular suppression factor wib or wcb = 1, so Eq. 3 reduces to Eq. 2. Only when Sc and Si are equal and close to 1, as in the current study, do interocular suppression and binocular combination take the form of Eq. 3.”

      Line 225: "However, individually, compared to monocular responses, responses of monocular neurons more preferring the stimulated eye are actually suppressed, and only responses of binocular neurons are increased by binocular stimulation." This sentence is difficult to follow. I recommend the authors improve clarity by breaking up the sentence into several sentences. If I understand correctly, they summarize the pattern in the data that is indicative of interocular divisive normalization, i.e., their final conclusion.

      This sentence no longer exists in the Discussion.

      Line 426: "Third, for those showing significant orientation difference, the trial-based orientation responses of each neuron were fitted with a Gaussian model with a MATLAB nonlinear least squares function:" Using a Gaussian function to fit orientation tuning was probably suboptimal. A Gaussian function provides an adequate fit only for neurons with very sharp tuning, whose responses away from the peak fall to baseline so that the two ends of the curve meet. For more broadly tuned neurons, the two ends do not meet. An adequate fit would be achieved with a function of a circular variable that wraps around at 180 deg. I recommend using a Von Mises function for fitting orientation tuning.

      We agree with the reviewer that the Von Mises function is more accurate than the Gaussian for fitting orientation tuning functions. Indeed, we are using it to fit the orientation tuning of V4 neurons, many of which have two peaks. For the current V1 data, the differences between Von Mises and Gaussian fits are very small, as shown in the orientation functional maps from three macaques below. Because we also use the same Gaussian fitting of orientation tuning in several published and currently under-review papers, we prefer to keep the Gaussian fitting results in the manuscript.

      Author response image 1.
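      The reviewer's point about the two ends of the tuning curve can be sketched numerically. The example below is a minimal illustration, not the authors' fitting code: the function names and the parameter values (a hypothetical neuron preferring 30 deg, Gaussian width 40 deg, Von Mises concentration 2) are assumptions chosen only to show that a plain Gaussian evaluated on a line need not agree at 0 and 180 deg, whereas a Von Mises function with a doubled angle is periodic over 180 deg by construction.

```python
import math


def gaussian_tuning(theta, pref, sigma, amp, base):
    # Plain (non-wrapped) Gaussian: theta - pref is measured on a line,
    # so nothing forces the values at 0 and 180 deg to agree.
    return base + amp * math.exp(-((theta - pref) ** 2) / (2.0 * sigma ** 2))


def von_mises_tuning(theta, pref, kappa, amp, base):
    # Doubling the angle maps the 180-deg orientation space onto a full
    # circle, so the curve is identical at theta and theta + 180.
    d = math.radians(2.0 * (theta - pref))
    # exp(kappa * (cos(d) - 1)) equals exactly 1 at theta == pref,
    # so the peak response is base + amp for any kappa.
    return base + amp * math.exp(kappa * (math.cos(d) - 1.0))


if __name__ == "__main__":
    # Hypothetical, broadly tuned neuron preferring 30 deg.
    for func, width in ((gaussian_tuning, 40.0), (von_mises_tuning, 2.0)):
        ends = [func(t, 30.0, width, 1.0, 0.1) for t in (0.0, 180.0)]
        print(func.__name__, [round(v, 3) for v in ends])
```

      For sharply tuned neurons, or for neurons preferring orientations near 90 deg, the Gaussian endpoint mismatch shrinks, which is consistent with the authors' observation that the two fits differ little for their V1 data.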

      Reviewer #3 (Public Review):

      The authors have made simultaneous recordings of the responses of large numbers of neurons from the primary visual cortex using optical two-photon imaging of calcium signals from the superficial layers of the cortex. Recordings were made to compare the responses of the cortical neurons under normal binocular viewing of a flat screen with both eyes open and monocular viewing of the same screen with one eye's view blocked by a translucent filter. The screen displayed visual stimuli comprising small contrast patches of Gabor function distributions of luminance, a stimulus that is known to excite cortical neurons.

      This is an important data set, given the large numbers of neurons recorded. The authors present a simple model to explain the binocular combination of neuronal signals from the right and left eyes.

      The limitations of the paper as written are as follows. These points can be addressed with some additional analysis and rewriting of sections of the paper. No new experimental data need to be collected.

      (1) The authors should acknowledge the fact that these recordings arise from neurons in the superficial layers of the cortex. This limitation arises from the usual constraints on optical imaging in the macaque cortex. This means that the sample of neurons forming this data set is not fully representative of the population of binocular neurons within the visual cortex. This limitation is important in comparing the outcome of these experiments with the results from other studies of binocular combination, which have used single-electrode recording. Electrode recording will result in a sample of neurons that is drawn from many layers of the cortex, rather than just the superficial layers.

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      (2) Single-neuron recording of binocular neurons in the primary visual cortex has shown that these neurons often have some spontaneous activity. Assessment of this spontaneous level of firing is important for accurate model fitting [1]. The paper here should discuss the level of spontaneous neuronal firing and its potential significance.

      We have noticed previously that at non-optimal spatial frequencies, calcium responses to a moving Gabor grating are close to zero (Guan et al., Prog Neurobiology, 2021, Fig. 1B), but we cannot tell whether this is due to calcium response nonlinearity or to a close-to-zero level of spontaneous neuronal activity. Prince et al. (2002) reported low spontaneous responses of V1 neurons with moving grating stimuli (e.g., about 3 spikes/sec in one exemplar neuron, their Fig. 1B), so this effect appears to be small. In our data fitting, we do have an orientation-unspecific component in the Gaussian model, which represents the neuronal response at a non-preferred orientation, but not necessarily the spontaneous activity.

      (3) The arrangements for visual stimulation and comparison of binocular and monocular responses mean that the stereoscopic disparity of the binocular stimuli is always at zero or close to zero. The animal's fixation point is in the centre of a single display that is viewed binocularly. The fixation point is, by definition, at zero disparity. The other points on the flat display are also at zero disparity or very close to zero because they lie in the same depth plane. There will be some small deviations from exactly zero because the geometry of the viewing arrangements results in the extremities of the display being at a slightly different distance than the centre. Therefore, the visual stimulation used to test the binocular condition is always at zero disparity, with a slight deviation from zero at the edges of the display, and never changes. [There is a detail that can be ignored. The experimenters tested neurons with visual stimulation at different real distances from the eyes, but this is not relevant here. Provided the animals accurately converged their eyes on the provided binocular fixation point, then the disparity of the visual stimuli will always be at or close to zero, regardless of viewing distance in these circumstances.] However, we already know from earlier work that neurons in the visual cortex exhibit a range of selectivity for binocular disparity. Some neurons have their peak response at non-zero disparities, representing binocular depths nearer than the fixation depth or beyond it. The response of other neurons is maximally suppressed by disparities at the depth of the fixation point (so-called Tuned Inhibitory [TI] neurons). 

      The simple model and analysis presented in the paper for the summation of monocular responses to predict binocular responses will perform adequately for neurons that are tuned to zero disparity, so-called tuned excitatory [TE] neurons, but are necessarily compromised when applied to neurons that have other tuning profiles. Specifically, when neurons are stimulated binocularly with a non-preferred disparity, the binocular response may be lower than the monocular response [2, 3]. This more realistic view of binocular responses needs to be considered by the authors and integrated into their modelling.

      We agree and include the following text when discussing future work:

      (Ln 298) “In addition, in our experiments, binocular stimuli were presented with zero disparity, which best triggered the responses of neurons with zero-disparity tuning. A more realistic model of binocular combination also requires the consideration of neurons with other disparity-tuning profiles.”

      (4) The data in the paper show some features that have been reported before but are not captured by the model. Notably for neurons with extreme values of ocular dominance, the binocular response is typically less than the larger of the two monocular responses. This is apparent in the row of plots in Figure 2D from individual animals and in the pooled data in Figure 2E. Responses of this type are characteristic of tuned inhibitory [TI] neurons[2]. It is not immediately clear why this feature of the data does not appear in the summary and analysis in Figure 3.

      This difference is indeed captured by the model, which can be more easily appreciated in Fig. 4A where monocular and binocular model simulations are plotted in the same panel. In the text, we also wrote: (Ln 195) “It is apparent that binocular responses cannot be explained by the sum of monocular responses, as binocular responses are substantially lower than the summed monocular responses for both monocular and binocular neurons. Nor can binocular responses be explained by the responses to the preferred eye, as binocular responses are also lower than those to the preferred eye (the larger of the two monocular responses) for monocular neurons.”

      The paper text states that the responses were "first normalized by the median of the binocular responses". This will certainly get rid of this characteristic of the data, but this step needs better justification, or an amendment to the main analysis is needed.

      The relevant sentence has been rewritten as “Monocular and binocular data of each FOV/depth, as well as the pooled data, were first normalized by the respective median of the binocular responses of all neurons in the same FOV/depth.” This normalization sets the overall binocular response to approximately unity to facilitate comparisons among all FOVs/depths, but it does not affect the overall characteristics of the data.
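      Why a median normalization leaves the characteristics of the data intact can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' analysis code: the function name `normalize_by_binocular_median`, the dictionary layout, and the response values are all hypothetical. Because every response in an FOV/depth is divided by the same scalar, within-FOV ratios, such as each neuron's binocular-to-monocular ratio, are unchanged; only the overall scale shifts so that the median binocular response becomes 1. The pooled data would be handled the same way with its own median.

```python
from statistics import median


def normalize_by_binocular_median(fovs):
    """Scale all responses in each FOV/depth by that FOV's median binocular response.

    fovs: hypothetical layout mapping an FOV/depth label to a dict with keys
    'contra', 'ipsi', and 'bino', each a list of per-neuron responses in the
    same neuron order.
    """
    normalized = {}
    for label, responses in fovs.items():
        scale = median(responses["bino"])  # one scalar per FOV/depth
        normalized[label] = {
            condition: [r / scale for r in values]
            for condition, values in responses.items()
        }
    return normalized


if __name__ == "__main__":
    # Hypothetical responses of three neurons in a single FOV.
    data = {"FOV_A": {"contra": [2.0, 0.5, 1.0],
                      "ipsi": [0.2, 1.5, 1.0],
                      "bino": [0.8, 0.6, 1.2]}}
    out = normalize_by_binocular_median(data)
    print(median(out["FOV_A"]["bino"]))  # median binocular response is now 1
    # Binocular/contra ratio of neuron 0 is the same before and after scaling.
    print(out["FOV_A"]["bino"][0] / out["FOV_A"]["contra"][0], 0.8 / 2.0)
```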

      In the present form, the model and analysis do not appear to fit the data in Figure 2 as accurately as needed.

      Thank you for pointing out this problem; the fits for FOV C_270 and for the pooled data were especially inaccurate. The issue has been largely fixed by weighting each datum by its standard deviation (please see the updated Fig. 3).
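      The effect of weighting each datum by its standard deviation can be illustrated with a toy fit. This is a minimal sketch that assumes the weighting follows the common convention (used, for example, by the `sigma` argument of SciPy's `curve_fit`) of dividing each residual by that datum's SD, i.e. weights of 1/SD². The straight-line model and the data points are hypothetical stand-ins, not the authors' binocular-combination model or data; the point is only that a noisy datum pulls much less on a weighted fit than on an unweighted one.

```python
def weighted_line_fit(xs, ys, sds):
    """Fit y = a + b*x by minimizing sum(((y - a - b*x) / sd) ** 2).

    Dividing each residual by its SD is equivalent to weights w = 1 / sd**2,
    so noisier data points influence the fit less. Closed-form solution of
    the weighted normal equations.
    """
    w = [1.0 / s ** 2 for s in sds]
    s0 = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = s0 * sxx - sx ** 2
    a = (sxx * sy - sx * sxy) / delta  # intercept
    b = (s0 * sxy - sx * sy) / delta   # slope
    return a, b


if __name__ == "__main__":
    # Hypothetical data: three precise points on y = 1 + 2x and one
    # deviant point that also has a very large SD.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 20.0]
    sds = [0.1, 0.1, 0.1, 100.0]
    print("weighted:  ", weighted_line_fit(xs, ys, sds))        # close to (1, 2)
    print("unweighted:", weighted_line_fit(xs, ys, [1.0] * 4))  # pulled toward the deviant point
```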

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Zeng and Staley provide a valuable analysis of the molecular requirements for the export of a reporter mRNA that contains a lariat structure at its 5' end in the budding yeast S. cerevisiae. The authors provide evidence that this is regulated by the main mRNA export machinery (Yra1, Mex67, Nab2, Npl3, Tom1, and Mlp1). Of note, Mlp1 has been mainly implicated in the nuclear retention of unspliced pre-mRNA (i.e. quality control), and relatively little has been done to investigate its role in mRNA export in budding yeast.

      Strengths:

      There is relatively little information in the current literature about the nuclear export of splicing intermediates. This paper provides one of the first analyses of this process and dissects the molecular components that promote this form of RNA export. Overall, the strength of the data presented in the manuscript is solid. The paper is well written and the message is clear and of general interest to the mRNA community.

      We thank the reviewer for highlighting these strengths.

      Weaknesses:

      There are three problems with the paper, although these are not major and likely would not affect the final model as most aspects of the molecular details are confirmed by multiple complementary assays.

      (1) The brG reporter produces both unspliced pre-mRNA and a lariat-containing intermediate RNA. Based on the primer extension assay, the authors claim that only 33% of the final product is in pre-mRNA form and that this "is insufficient to account for the magnitude of the cytoplasmic signal from the brG reporter (83%)". Nevertheless, it is possible that primer extension is incomplete or that the lariat-containing RNA is inaccessible for smFISH. The authors could easily perform a dual smFISH experiment (similar to Adivarahan et al., Molecular Cell 2018) where exon 1 is labelled with probes of one color, and the region that overlaps the lariat-containing intermediate is labelled with probes of a second color. If the authors are correct, then one-third of the smFISH foci should have both labels and the rest would have only the second label. This would also confirm that the latter (i.e. the lariat-containing RNAs) are exported to the cytoplasm. Using this approach, the authors could then show that MLP1 depletion (or depletion of any of the other factors) affects one pool of RNAs (i.e. those that are lariat-containing) but not the other (i.e. pre-mRNA). Including these experiments would make the evidence for their model more convincing.

      We appreciate the reviewer’s comments and suggestions. Concerning the primer extension analysis, we are considering alternative assays to quantitate the pre-mRNA and lariat intermediate levels. Concerning the accessibility of the lariat intermediate in smRNA-FISH, in a dbr1∆ strain the only major species from the UAc reporter that is detected by primer extension is the lariat intermediate (Fig. S3), and this reporter is readily detected by smRNA-FISH, indicating that the lariat intermediate is accessible to smRNA-FISH. Concerning discriminating between pre-mRNA and lariat intermediate by smRNA-FISH, we agree with the reviewer that a dual smFISH experiment would directly distinguish between the signals of these species. The brG reporter we used in most smRNA-FISH experiments has a 5’ exon that is too short for smRNA-FISH probes, as is typical of most budding yeast 5’ exons. We have tried to replace the 5’ exon with a longer sequence (GFP) to allow for smRNA-FISH; however, this substitution inhibited splicing. Therefore, to distinguish signals from pre-mRNA versus lariat intermediate, we used additional reporters: the G1c and brC reporters, which accumulate pre-mRNA essentially exclusively (Fig. S2A-C), and the UAc reporter, which accumulates lariat intermediate exclusively in a dbr1∆ strain (Fig. S3). Whereas the mlp1 deletion did not change beta-galactosidase activities of the G1c and brC pre-mRNA-accumulating reporters (Fig. S2E), the mlp1 deletion in a dbr1∆ background did reduce the beta-galactosidase activities of the UAc lariat intermediate-accumulating reporter (Fig. 3D) and did increase smRNA-FISH signal of this reporter in the nucleus (Fig. 3E). These observations corroborate our interpretation, based on the brG reporter, that Mlp1p is required for efficient export of lariat intermediates but not pre-mRNAs.

      (2) In some cases, the number of smFISH foci appears to change drastically depending on the genetic background. This could either be due to the stochastic nature of mRNA expression between cells or reflect real differences between the genetic backgrounds that could alter the interpretation of the other observations.

      We thank the reviewer for raising this point. We will review our data to distinguish between these possibilities.

      (3) The authors state in the discussion that "the general mRNA export pathway transports discarded lariat intermediates into the cytoplasm". Although this appears to be the case for the reporters that are investigated in this paper, I don't think that the authors should make such a broad sweeping claim. It may be that some discarded lariat intermediates are exported to the cytoplasm while others are targeted for nuclear retention and/or decay.

      The reviewer’s point is well-taken. We will revise the wording accordingly.

      Reviewer #2 (Public Review):

      In this report, Zeng and Staley have used an elegant combination of RNA imaging approaches (single molecule FISH), RNA co-immunoprecipitations, and translation reporters to characterize the factors and pathways involved in the nuclear export of splicing intermediates in budding yeast. Their study notably involves the use of specific reporter genes, which lead to the accumulation of pre-mRNA and lariat species, in a battery of mutants impacting mRNA export and quality control.

      The authors convincingly demonstrate that mRNA species expressed from such reporters are exported to the cytoplasm in a manner depending on the canonical mRNA export machinery (Mex67 and its adaptors) and the nuclear pore complex (NPC) basket (Mlp1). Interestingly, they provide evidence that the export of splicing intermediates requires docking and subsequent undocking at the nuclear basket, a step possibly more critical than for regular mRNAs.

      We thank the reviewer for this overall positive assessment.

      However, their assays do not always allow us to define whether the impacted mRNA species correspond to lariats and/or pre-mRNAs. This is all the more critical since their findings apparently contradict previous reports that supported a role for the nuclear basket in pre-mRNA quality control. These earlier studies, which were similarly based on the use of dedicated yet distinct reporters, had found that the nuclear basket subunit Mlp1, together with different cofactors, prevents the export of unspliced mRNA species. It would be important to clarify experimentally and discuss the possible reasons for these discrepancies.

      It is true that we did not assess export of all reporters in all mutant strains by smFISH; however, we did validate the key conclusion that the export of lariat intermediates requires the nuclear basket gene MLP1: the export of both the brG reporter (mostly lariat intermediate) and the UAc reporter (exclusively lariat intermediate) showed a dependence on MLP1 (Fig. 3). Further, by beta-galactosidase activity, we tested in total five separate reporters – three that accumulated lariat intermediate and two that accumulated exclusively pre-mRNA; only the three reporters accumulating lariat intermediate showed a dependence of export on MLP1 (Fig. 4B,D; Fig S2D); the reporters accumulating pre-mRNA did not show a dependence on MLP1 (Fig. S2E), further validating our main conclusion. We are considering additional experiments to validate this key conclusion even further. Also, see response to comment 1 from reviewer 1.

      We agree that the main conclusion from this manuscript differs from earlier studies. A key difference is that prior studies monitored exclusively pre-mRNA. In our study, we monitored pre-mRNA and lariat intermediate species and in doing so revealed a role for MLP1 in the export of lariat intermediates. This study, our previous study, and the previous studies of others have all provided evidence for efficient export of pre-mRNA; all of these studies are in conflict with the studies purporting a general role for the nuclear basket in retaining immature mRNA. Still, these apparently conflicting past studies can be re-interpreted in the context of our model that the export of such species requires docking at the nuclear basket, followed by undocking. In a revised manuscript, we will discuss the possibility that pre-mRNAs apparently “retained” by the nuclear basket are stalled in export at the undocking stage.

      Reviewer #3 (Public Review):

      Summary:

      Zeng and Staley show that in yeast, intron-lariat intermediates that accumulate due to defects in pre-mRNA splicing are transported to the cytoplasm using the canonical mRNA export pathway. Moreover, they demonstrate that export requires the nuclear basket, a sub-structure of the nuclear pore complex previously implicated in the retention of immature mRNAs. These observations are important as they call into question a longstanding model that the main role of the nuclear basket is to ensure nuclear retention of immature or faulty mRNAs.

      Strengths:

      The authors elegantly combine genetic, biochemical, and single-molecule resolution microscopy approaches to identify the cellular pathway that mediates the cytoplasmic accumulation of lariat intermediates. Cytoplasmic accumulation of such splicing intermediates had been observed in various previous studies, but how these RNAs reach the cytoplasm had not yet been investigated. By using smFISH, the authors present compelling and, for the first time, direct evidence that these intermediates accumulate in the cytoplasm and that this requires the canonical mRNA export pathway, including the RNA export receptor Mex67 as well as various RNA-binding proteins including Yra1, Npl3 and Nab2. Moreover, they show that the export of lariat intermediates, but not mRNAs, requires the nuclear basket (Mlp1) and basket-associated proteins previously linked to mRNP rearrangements at the nuclear pore. This is a surprising and important observation with respect to a possible function of the nuclear basket in mRNA export and quality control, as it challenges a longstanding model that the role of the basket in mRNA export is primarily to act as a gatekeeper to ensure that immature mRNAs are not exported. As discussed by the authors, their finding suggests a role for the basket in promoting the export of certain types of RNAs rather than retention, a model also supported by more recent studies in mammalian cells. Moreover, their findings also corroborate a recent paper showing that in yeast, not all nuclear pores contain a basket (PMID: 36220102), an observation that also questioned the gatekeeper model of the basket, as it is difficult to imagine how the basket can serve as a gatekeeper if not all nuclear pores contain such a structure.

      We thank the reviewer for highlighting the importance and surprising nature of our findings.

      Weaknesses:

      One weakness of this study is that all their experiments rely on using synthetic splicing reporter containing a lacZ gene that produces a relatively long transcript compared to the average yeast mRNA.

      We are considering repeating some of our experiments to monitor export of RNAs with more average lengths.

      The rationale for using a reporter containing the brG (G branch point), which yields more stable lariat intermediates because they are inefficient substrates for the debranching enzyme Dbr1, could be described earlier in the manuscript; otherwise it only becomes clear towards the end, which is confusing.

      We thank the reviewer for this comment. We will revise the text to explain sooner the rationale for using the brG reporter to assess the export of lariat intermediates.

      Discussion of their observation in the context that, in yeast, not all pores contain a basket would be useful.

      Thanks for this suggestion. We will raise this point that a nuclear basket is not present on all nuclear pores and discuss the implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes the mechanism of protein disaggregation by the ClpL AAA+ protein of Listeria monocytogenes. Using several model substrate proteins, the authors first show that ClpL possesses a robust disaggregase activity that does not require the endogenous DnaK chaperone in vitro. In addition, they found that ClpL is more thermostable than the endogenous L. monocytogenes DnaK and has the capacity to unfold tightly folded protein domains. The mechanistic basis for the robust disaggregase activity of ClpL was also dissected in vitro and, in some cases, supported by in vivo data obtained in chaperone-deficient E. coli strains. The data presented show that the two AAA domains, the pore-2 site and the N-terminal domain (NTD) of ClpL are critical for its disaggregase activity. Remarkably, grafting the NTD of ClpL to ClpB converted ClpB into an autonomous disaggregase, highlighting the importance of such a domain in the DnaK-independent disaggregation of proteins. The role of the ClpL NTD was further dissected, identifying key residues and positions necessary for aggregate recognition and disaggregation. Finally, using sets of SEC and negative-staining EM experiments combined with conditional covalent linkages and disaggregation assays, the authors found that ClpL shows significant structural plasticity, forming dynamic hexameric and heptameric active single rings that can further form higher assembly states via their middle domains.

      Strengths:

      The manuscript is well-written and the experimental work is well executed. It contains a robust and complete set of in vitro data that push further our knowledge of such important disaggregases. It shows the importance of the atypical ClpL N-terminal domain in the disaggregation process as well as the structural malleability of such AAA+ proteins. More generally, this work expands our knowledge of heat resistance in bacterial pathogens.

      Weaknesses:

      There is no specific weakness in this work, although it would have helped to have a drawing model showing how ClpL performs protein disaggregation based on their new findings. The function of the higher assembly states of ClpL remains unresolved and will need further extensive research. Similarly, it will be interesting in the future to see whether the sole function of the plasmid-encoded ClpL is to cope with general protein aggregates under heat stress.

      We thank the reviewer for the positive evaluation. We agree with the reviewer that it will be important to test whether ClpL can bind to and process non-aggregated protein substrates. Our preliminary analysis suggests that the disaggregation activity of ClpL is most relevant in vivo, pointing to protein aggregates as its main target.

      We also agree that the role of dimers or tetramers of ClpL rings needs to be further explored. Our initial analysis suggests a function of ring dimers as a resting state. It will now be important to study the dynamics of ClpL assembly formation and test whether substrate presence shifts ClpL assemblies towards an active, single ring state.

      Reviewer #2 (Public Review):

      The manuscript by Bohl et al. is an interesting and carefully done study on the biochemical properties and mode of action of the potent autonomous AAA+ disaggregase ClpL from Listeria monocytogenes. ClpL is encoded on plasmids. It shows high thermal stability and provides the food-borne pathogen Listeria monocytogenes with a substantial increase in heat resistance. The authors show that ClpL interacts with aggregated proteins through the aromatic residues present in its N-terminal domain and subsequently unfolds proteins from aggregates, translocating polypeptide chains through the central pore of its oligomeric ring structure. The structure of ClpL oligomers was also investigated in the manuscript. The results suggest that the single-ring structure, rather than the dimers or trimers of rings also observed by EM, is the active disaggregase species.

      Presented experiments are conclusive and well-controlled. Several mutants were created to analyze the importance of a particular ClpL domain.

      The study's strength lies in the direct comparison of ClpL biochemical properties with autonomous ClpG disaggregase present in selected Gram-negative bacteria and well-studied E. coli system consisting of ClpB disaggregase and DnaK and its cochaperones. This puts the obtained results in a broader context.

      We thank the reviewer for the detailed comments. There are no specific weaknesses indicated in the public review.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript details the characterization of ClpL from L. monocytogenes as a potent and autonomous AAA+ disaggregase. The authors demonstrate that ClpL has potent and DnaK-independent disaggregase activity towards a variety of aggregated model substrates and that this disaggregase activity appears to be greater than that observed with the canonical DnaK/ClpB co-chaperone system. Furthermore, Lm ClpL appears to have greater thermostability than Lm DnaK, suggesting that ClpL-expressing cells may be able to withstand more severe heat stress conditions. Interestingly, Lm ClpL can provide thermotolerance to E. coli cells that lack ClpB or that express the mutant DnaK103. The authors further characterized the mechanisms by which ClpL interacts with protein aggregates, identifying that the N-terminal domain of ClpL is essential for disaggregase function. Lastly, by EM and mutagenesis analysis, the authors report that ClpL can exist in a variety of larger macromolecular complexes, including dimers or trimers of hexamers/heptamers, and they provide evidence that the N-terminal domains of ClpL prevent dimer ring formation, thus promoting an active and substrate-binding ClpL complex. Throughout this manuscript the authors compare Lm ClpL to ClpG, another potent and autonomous disaggregase found in Gram-negative bacteria that has been reported on previously, demonstrating that these two enzymes share homologous activity and qualities. Taken together, this report clearly establishes ClpL as a novel and autonomous disaggregase.

      Strengths:

      The work presented in this report represents a substantial body of novel work that will be of interest to the protein chaperone community. Furthermore, by showing that ClpL can provide in vivo thermotolerance to both E. coli and L. gasseri, the authors have broadened the significance of this work and provided novel insight into potential mechanisms of thermotolerance in food-borne pathogens.

      Weaknesses:

      The figures are clearly depicted and easy to understand, though some of the axis labels are misleading or confusing and may warrant revision. While I feel that the results and discussion as presented support the authors' hypothesis and overall goal of demonstrating that ClpL is a novel disaggregase, interpretation of the data is hindered because no statistical tests are provided anywhere in the manuscript. As a result, only qualitative analysis is possible, and many of the concluding statements involving pairwise comparisons need to be revisited, or quantitative data with statistics need to be provided. The addition of statistical analysis is critical and should not be difficult, nor do I anticipate that it will change the conclusions of this report.

      We thank the reviewer for the valid criticism. We addressed the major concern of the reviewer and added the requested statistical analysis to all relevant figures. The analysis confirms our conclusions. We also followed the advice of the reviewer and revised the axis labels to increase clarity.

      Reviewer #1 (Recommendations For The Authors):

      • It would really help to have a model showing how ClpL performs protein disaggregation based on their findings.

      We show that ClpL exerts a threading activity that is fueled by ATP hydrolysis in both AAA domains and executed by pore-located aromatic residues. The basic disaggregation mechanism of ClpL therefore does not differ from that of the ClpB and ClpG disaggregases. Similarly, the specificity of ClpL towards protein aggregates is based on simultaneous interactions of multiple N-terminal domains with the aggregate surface. We recently described a similar mode of aggregate recognition for ClpG [1]. We therefore prefer not to add a model to the manuscript. We are currently preparing a review that includes the characterization of the novel bacterial disaggregases and will present models there, as we consider a review article more appropriate for such illustrations.

      • AAA2 domain of ClpL in Fig 3E should be the same color as in Fig 1A.

      We used light grey instead of dark grey for the ClpL AAA2 domain in Fig 3E, to distinguish between ClpL and ClpB AAA domains. This kind of illustration allows for clearer separation of both AAA+ proteins and the fusion construct LN-ClpB*. We therefore prefer keeping the color code.

      • Partial suppression of the dnaK mutant could be added to the main manuscript figure.

      The main figure 3 is already very dense and we therefore prefer showing respective data as part of a supplementary figure.

      • It would have been interesting to know whether the robust autonomous disaggregation activity of ClpL is sufficient to rescue the growth of more severe E. coli chaperone mutants, such as dnaK tig. Did the authors test this?

      We tested whether expression of clpL can rescue growth of E. coli dnaK103 mutant cells at 40°C on LB plates. This experiment is different from the restoration of heat resistance in dnaK103 cells (Figure 3, figure supplement 2A), as continuous growth at elevated temperatures (40°C) is monitored instead of cell survival upon abrupt severe heat shock (49°C). We did not observe rescue of the temperature-sensitive growth phenotype (40°C) of dnaK103 cells upon clpL expression, though expression of clpG complemented the temperature-sensitive growth phenotype (see Author response image 1 below). This finding points to differences in chaperone activities of ClpL and ClpG. It also suggests that ClpL activity is largely restricted to heat-shock generated protein aggregates, enabling ClpL to complement the missing disaggregation function of DnaK but not other Hsp70 activities including folding and targeting of newly synthesized proteins. We believe that dissecting the molecular reasons for differences in ClpG and ClpL complementation activities should be part of an independent study and prefer showing the growth-complementation data only in the response letter.

      Author response image 1.

      Serial dilutions (10⁻¹ to 10⁻⁶) of E. coli dnaK103 mutant cells expressing E. coli dnaK, L. monocytogenes clpL or P. aeruginosa clpG were spotted on LB plates containing the indicated IPTG concentrations. Plates were incubated at 30°C or 40°C for 24 h. p: empty vector control.

      Reviewer #2 (Recommendations For The Authors):

      Based on the results presented in Fig. 2B, the authors conclude "that stand-alone disaggregases ClpL and ClpG but not the canonical KJE/ClpB disaggregase exhibit robust threading activities that allow for unfolding of tightly folded domains" (page 5 line 209). In this experiment, the threading power of the disaggregases was assessed by monitoring YFP fluorescence during the disaggregation of aggregates formed by a luciferase-YFP fusion protein. In my opinion, the outcome of this experiment depends not only on the threading power of the disaggregases but also on substrate recognition by the analyzed disaggregation systems and/or on their processivity. Recognition is mediated by the N-terminal domain in the case of ClpL and by the KJE chaperones in the case of the KJE/ClpB system. This is not discussed in the manuscript, and the result might be misinterpreted. The authors have created the LN-ClpB* construct (N-terminal domain of ClpL fused to derepressed ClpB) (Fig. 3 E and F). In my opinion, this construct should be used as an additional control in the experiment in Fig. 2 B. Because it possesses the same substrate recognition domain, a direct comparison of the threading power of the disaggregases might be possible.

      We performed the requested experiment (new Figure 3 - figure supplement 2D). We did not observe unfolding of YFP by LN-ClpB. Since ClpL and LN-ClpB do not differ in their aggregate-targeting mechanisms, this finding underlines the differences in threading power between ClpL and activated (derepressed) ClpB. It also suggests that the AAA threading motors and the aggregate-targeting NTD largely function independently.

      The presented results suggest that tetramers and dimers of rings might be a "storage form" of the disaggregase. It would be interesting to analyze the thermotolerance and/or phenotype of ClpL mutants that do not form tetramers and dimers (E352A). This variant has disaggregation activity similar to WT but does not form dimers and tetramers. If differences are observed in vivo (for example toxicity of the mutant), the "storage form" hypothesis would become more likely.

      When testing expression of clpL-MD mutants (E352A, F354A), which cannot form dimers and tetramers of ClpL rings, in E. coli ∆clpB cells, we observed reduced production levels compared to ClpL wildtype and speculated that reduced expression might be linked to cellular toxicity. We therefore compared the spotting efficiencies of E. coli ∆clpB cells expressing clpL, ∆N-clpL or the clpL-MD mutants at different temperatures. Expression of clpL at high levels abrogated colony formation at 42°C (new Figure 6 - figure supplement 3). ClpL toxicity depended on its NTD, as no effect was observed upon expression of ∆N-clpL. The ClpL-MD mutants (E352A, F354A) were expressed at much lower levels and, when produced at levels comparable to ClpL-WT, exhibited strongly increased toxicity (new Figure 6 – figure supplement 3). This suggests a protective role of ClpL ring dimers and tetramers in the cellular environment by downregulating ClpL activity. We envision that the formation of ClpL assemblies restricts the accessibility of the ClpL NTDs and reduces substrate interaction. The increased toxicity of ClpL-E352A and ClpL-F354A points to a physiological relevance of the dimers and tetramers of ClpL rings and agrees with their proposed function as storage forms. We added this potential role of ClpL ring assemblies to the discussion section. Due to the strongly reduced production levels of the ClpL MD mutants and their enhanced toxicity at elevated temperatures, we did not test their ability to restore thermotolerance in E. coli ∆clpB cells.

      Figure 6G and Figure 6 - figure supplement 2 - it is not clear how the preparation of the WT and WTox forms of ClpL differs.

      ClpL WT was purified under reducing conditions (+ 2 mM DTT), whereas WTox was purified in the absence of DTT, thus serving as a control for ClpL-T355C, which forms disulfide bonds upon purification without DTT. We have added this information to the figure legend and the materials and methods section.

      Page 5 line 250 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2A should be Figure 3 - Figure Supplement 2A.

      Page 5 line 251 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2B/C should be Figure 3 - Figure Supplement 2B/C.

      Page 7 line 315 - wrong figure citation. Instead of Figure 4F, it should be Figure 4G.

      Figure 1 - Figure Supplement 2E - At first glance, this figure does not correspond to the text and is confusing. It would be helpful to include bars for Lm ClpL activity in the figure. Alternatively, the y-axis label might be changed to "relative to Lm ClpL disaggregation activity" instead of "relative disaggregation activity". One has to read the figure legend carefully to find out that 1 corresponds to Lm ClpL activity.

      We have corrected all mistakes and changed the description of the y-axis (Figure 1 - figure Supplement 2E) as suggested.

      Reviewer #3 (Recommendations For The Authors):

      (1) While the authors make many experimental comparisons throughout their study, no statistical tests are presented with the results or figures, nor are they described in the methods. While the data as presented do appear to support the authors' conclusions, no meaningful conclusions can be drawn from pairwise comparisons without these statistical tests. Critically, please report these statistical tests. As a general suggestion, please include the statistics (p-values) in the results section when presenting the data, as well as in the figure legends, as this will allow the reader to better understand the authors' presentation and interpretation of the data.

      We have added statistical tests to all relevant figures. The analysis confirms our previous statements. We have further clarified our approach to the statistical analysis in the methods section. We report p-values in the results section; however, due to the number of comparisons, we did not add individual p-values to the figure legends but used standard star notation.

      (2) Some of the axis labels of the presented graphs are misleading or confusing. Many describe a relative (%) disaggregation rate, but it is not clear from the methods or figure legends what this rate is relative to. Is it relative to non-denatured substrates, to no-chaperone conditions, etc.? Is it possible to present the figures with the raw rates/activities (e.g. luciferase activity / time) instead of relative rates? I think that labeling these axes "disaggregation rate" is misleading, as none of these experiments measure the actual rate of disaggregation of the model substrates per se (e.g. by SEC-MALS or other biophysical measurements); instead, they infer the extent of disaggregation by measuring a property of the substrates, i.e. luciferase activity or fluorescence intensity over time. Thus, labeling the axes with what is actually being measured, and clarifying in the methods and results what is inferred from these measurements, will help solidify the authors' conclusions.

      Relative (%) disaggregation rate usually refers to the disaggregation activity of ClpL wildtype serving as reference. We clarified this point in the revised text and respective figure legends. We now also refer to the process measured (e.g. relative refolding activity of aggregated Luciferase instead of relative disaggregation activity) as suggested by the reviewer and added clarifications to text and materials and methods.

      Since we have many measurements for our most frequently used assays and a reasonable estimate of the general variance within these assays, we found it reasonable to show activity data relative to fixed controls. This reduces the impact of unspecific variance and thereby allows more accurate comparisons between independent repetitions. The reference is now indicated in the axis title.
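      As an illustration of what "relative to a fixed control" means here, the following is a minimal sketch that assumes each activity value is simply divided by the mean of the reference (ClpL wildtype) measurements, so that the reference becomes 1. The function name and the numbers are hypothetical and do not represent the authors' actual analysis or data.

```python
from statistics import mean


def relative_activity(sample_values, reference_values):
    """Express each measurement relative to the mean reference activity (reference = 1).

    Hypothetical helper: assumes plain division by the mean of the reference
    replicates, e.g. ClpL wildtype measured in the same set of experiments.
    """
    ref = mean(reference_values)
    if ref == 0:
        raise ValueError("reference activity must be non-zero")
    return [value / ref for value in sample_values]


# Hypothetical refolding rates in arbitrary units, not data from the manuscript
wildtype = [8.0, 12.0, 10.0]
mutant = [5.0, 2.5]
print(relative_activity(mutant, wildtype))
```

      Normalizing to a reference measured alongside the samples removes day-to-day differences in absolute signal, which is the rationale given above for plotting relative rather than raw activities.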

      (3) The figures are well presented, clutter-free, and graphically easy to understand. Figure legends contain sufficient information, apart from the aforementioned statistical information, but should include the exact number of independent replicates for each panel/experiment (e.g. n=4), not just "greater than 3". While the figures show each data point along with the mean and error, in some figures it is difficult to determine the number of replicate data points (e.g. Figures 2c, 2d, and 3a). Also, please state whether the error bars show the SD or the SEM.

      While we agree that this is valuable information, we fear that overloading the figure legends would reduce their readability. We therefore list the number of replicates for each experiment in a separate supplementary table (Table S2). The error bars show the SD and not the SEM, which we now also specify in the figure legends.

      (4) There are various places in the results where qualitative descriptors are used to describe comparisons, for example "hardly enhanced" (Figure 1) and "partially reduced" (Figure 6). While this is not necessarily wrong, such qualitative descriptions require further explanation. What is the definition of "hardly" or "partially"? My recommendation is to state the data quantitatively, such as "enhanced by x%" or "reduced by x", so that there is no misinterpretation. Examples can be found in Figures 6C-G. This would require a full statistical overview and presentation of these statistics in the results.

      We followed the reviewer's advice and no longer use the criticized terms (e.g. "hardly enhanced"). Instead, we provide the requested quantifications in the text.

      Questions for Figures:

      Figures 1B and 1C:

      (1) Is the disaggregase activity of ClpL towards heat-denatured luciferase and GFP ATP-dependent? While the authors later show that mutations within the Walker B motifs dramatically impair reactivation (disaggregation) of denatured luciferase, this does not rule out an ATP-independent effect of these mutations. Thus, the authors should test whether disaggregase activity is observed when wild-type ClpL is incubated with denatured substrates in the absence of ATP or in the presence of ADP only.

      We tested for ClpL disaggregation activity in the absence of nucleotide and in the presence of ADP only (new Figure 1 – figure supplement 2A). We did not observe any activity, demonstrating that ClpL activity depends on ATP binding and hydrolysis (see also Figure 3 – figure supplement 1D: ATPase-deficient ClpL-E197A/E530A lacks disaggregation activity).

      (2) The authors suggest that the reduction in disaggregase activity observed in samples combining Lm ClpL and KJE (Figure 1C, supp. 1C-E) could be due to competition for protein aggregate binding, as observed previously with ClpG. Did the authors test this directly by a pulldown assay or another interaction-based assay? While ClpL and ClpG appear to work in a similar manner, it would be good to confirm this. Also, clarification of how this competition operates would be useful. Does ClpL prevent aggregates from interacting with KJE, or vice versa?

      We probed for binding of ClpL to aggregated Malate Dehydrogenase in the presence of L. monocytogenes or E. coli Hsp70 (DnaK + the respective J-domain protein DnaJ) using a centrifugation-based assay. Here, we used the ATPase-deficient ClpL-E197A/E530A (ClpL-DWB) mutant, ensuring stable substrate interaction in the presence of ATP. We observe reduced binding of ClpL-DWB to protein aggregates in the presence of DnaK/DnaJ (new Figure 1 – figure supplement 2G). This finding indicates that both chaperones compete for binding to aggregated proteins and explains the inhibition of ClpL disaggregation activity in the presence of Hsp70.

      (3) Related to the above, while incubation of aggregated substrates with ClpL and KJE does appear to reduce disaggregase activity towards GFP (Figure 1c), α-glucosidase (Supp. 1C), and MDH (Supp. 1D), this does not appear to be the case for luciferase (Figure 1b, Supp. 1b). Furthermore, ClpL disaggregase activity towards luciferase is reduced when combined with E. coli KJE (Supp. 1e) but not with Lm KJE (Figure 1b). The authors provide no commentary or explanation for these observations. Furthermore, these results complicate the concluding statement that "combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity ... ".

      We suggest that the different degrees of inhibition of ClpL disaggregation activity by the KJE system reflect different binding affinities of KJE and ClpL for the respective aggregates. While we usually observe strong inhibition of ClpL activity in the presence of KJE, this is different for aggregated Luciferase. This points to specific structural features of Luciferase aggregates or to the presence of distinct binding sites on the aggregate surface that favour ClpL binding. We have added a respective comment to the revised manuscript.

      The former statement that “combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity” referred to aggregated GFP, MDH and α-Glucosidase for which a strong inhibition of ClpL activity was observed. We have specified this point.

      Figures 1D and 1E:

      (1) The authors conclude that the heat sensitivity of ΔClpL L. gasseri cells is because they do not express the canonical ClpB disaggregase. A good test to validate this would be to express KJE/ClpB in these Lg ΔClpL cells to see if heat-sensitivity could be fully or partially rescued.

      We agree that such an experiment would further strengthen the in vivo function of ClpL as an alternative disaggregase. However, such an approach would require co-expression of E. coli ClpB with the authentic E. coli DnaK chaperone system (KJE), as ClpB and DnaK cooperate in a species-specific manner [2-4]. This makes the experiment challenging, also because the individual components need to be expressed at the correct stoichiometry. Furthermore, the authentic L. gasseri KJE system, which likely competes with the E. coli KJE system for aggregate binding, would hamper E. coli KJE/ClpB disaggregation activity in L. gasseri. In view of these limitations, we would like to refrain from conducting such an experiment.

      (2) The rationale for investigating Lg ClpL and the disaggregase activity assays are compelling and support the hypothesis that ClpL contributes to thermotolerance in multiple gram-positive species. However, from Figure 1d, why was only Lg ClpL investigated? It appears that S. thermophilus also lacks the canonical ClpB disaggregase and shows ΔClpL heat sensitivity. Other Lactobacillus species that lack ClpB are also shown but were not tested for heat sensitivity. Why test and move forward only with L. gasseri? Lastly, L. mesenteroides is ClpB-negative but does not show ΔClpL heat sensitivity. Why?

      We wanted to document high, partner-independent disaggregation activity for another ClpL homolog. We chose L. gasseri because (i) this bacterial species lacks a ClpB homolog and (ii) a ∆clpL mutant exhibits reduced survival upon severe heat shock (thermotolerance phenotype), which is associated with defects in cellular protein disaggregation. The characterization of L. gasseri ClpL as a potent disaggregase in vitro represents a proof of concept and allows us to generalize our conclusion. We therefore did not further test S. thermophilus ClpL. L. mesenteroides encodes ClpL but not ClpB; however, to the best of our knowledge, a ∆clpL mutant has not yet been characterized in this species. As we wanted to link ClpL in vitro activity with an in vivo phenotype, we did not characterize L. mesenteroides ClpL.

      We agree with the reviewer that the characterization of additional ClpL homologs is meaningful and interesting, however, we strongly believe that such analysis should be part of an exhaustive and independent study.

      Figures 2A and 2B:

      (1) Figure 2B demonstrates that both ClpL and ClpG, but not the canonical KJE/ClpB, are able to unfold YFP during the luciferase disaggregation process, suggesting that ClpL and ClpG exhibit stronger threading activity. A technical question, can luciferase activity be measured alongside in the same assay sample? If so, would you expect to observe a concomitant increase in luciferase activity as YFP fluorescence decreases?

      KJE/ClpB can partially disaggregate and refold aggregated Luciferase-YFP without unfolding YFP during the disaggregation reaction [5]. YFP unfolding is therefore not linked to refolding of aggregated Luciferase-YFP. On the other hand, unfolding of YFP during disaggregation can hamper refolding of the fused Luciferase moiety, as observed for the AAA+ protein ClpC in the presence of its partner MecA [5]. These diverse effects make Luciferase-YFP refolding experiments difficult to interpret, as the degree of YFP unfolding does not necessarily correlate with the extent of Luciferase refolding. We therefore did not perform the suggested experiment.

      Figure 2C and 2D:

      (1) Thermal shift assays for ClpL, ClpG, and DnaK were performed with various nucleotides. Were these experiments also performed with samples in their nucleotide-free apo state? Also, while all these chaperones are ATPases, different nucleotides were used, but no explanation is provided. These ATPases should be compared when bound to the same nucleotide.

      We did not monitor the thermal stabilities of the chaperones without nucleotide, as this state is likely not relevant in vivo. We used ATPγS in the case of ClpL to keep the AAA+ protein in the ATP conformation. ATP would be rapidly converted to ADP due to the high intrinsic ATPase activity of ClpL. In the case of DnaK, ATPγS cannot be used as it does not induce the ATP conformation [6]. The low intrinsic ATPase activity of DnaK allows determining the thermal stability of its ATP conformation in the presence of ATP. This is supported by the reduced thermal stability determined for ADP-bound DnaK.

      (2) The authors suggest that incubation at 55°C will cause unfolding of Lm DnaK, but not ClpL, leaving ClpL-positive Lm cells with disaggregase activity at 55°C. While the thermal shift assays in Figures 2C and 2D support this, a direct test would be to heat-treat Lm DnaK and ClpL at 55°C and then assay disaggregase activity using aggregated luciferase or GFP as in Figure 1.

      We followed the suggestion of the reviewer and incubated Lm ClpL and DnaK at 55-58°C in the presence of ATP for 15 min prior to their use in disaggregation assays. We compared the activities of the pre-heated chaperones with controls incubated at 30°C for 15 min. Notably, we did not observe a loss of DnaK disaggregation activity, suggesting that thermal unfolding of DnaK at this temperature is reversible. We provide these data as Figure 2 - figure supplement 1 and added a respective statement to the revised manuscript.

      Figure 3B:

      (1) The authors state that the ATPase activity of ΔN-ClpL was "hardly affected", but the data indicate an approximately 35% reduction. As discussed above, no statistics are provided for this figure, but given the error bars it is highly likely that this reduction is significant. Please perform this statistical test and, if the reduction is significant, reflect this in the written results as well as in the figure. Lastly, if the reduction in ATPase activity is significant, why would this be so, and could it contribute to the reduced disaggregase activity towards luciferase and MDH observed in Figure 3A?

      We applied statistical tests as suggested by the reviewer, showing that the reduction in ATPase activity of ∆N-ClpL is statistically significant. N-terminal domains of Hsp100 proteins can modulate ATPase activity, as shown for the family member ClpB, where the NTD functions as an auxiliary regulatory element for fine-tuning ClpB activity [7]. We speculate that the impact of the ClpL-NTD on the assembly state (stabilization of ClpL ring dimers) might affect ClpL ATPase activity. We would like to point out that other ClpL mutants (e.g. NTD mutant ClpL-Y51A; MD mutant ClpL-F354A) have a similarly reduced ATPase activity, yet exhibit substantial disaggregation activity (approx. 2-fold reduced compared to ClpL wildtype). In contrast, ∆N-ClpL does not exhibit any disaggregation activity. This suggests that the loss of disaggregation activity is caused by a substrate binding defect and not by the partial reduction in ATPase activity. We added a comment on the reduced ATPase activity and discuss its potential causes in the discussion section.

      (2) I think the authors' conclusion that deletion of the ClpL NTD does not cause structural defects in ClpL is premature given the apparent reduction in ATPase activity. Did the authors perform any biophysical analysis of ΔN-ClpL to confirm this conclusion? Thermal shift assays, Native-PAGE, or size-exclusion chromatography to detect aggregates would all be good assays to demonstrate that wild-type and ΔN-ClpL have similar structural properties. Surprisingly, Figure 6 describes significant macromolecular changes associated with ΔN-ClpL, such that it preferentially forms a dimer of rings. Furthermore, in Supp. Figure 6D the authors report that ΔN-ClpL appears to have an increased Tm compared to WT- or ΔM-ClpL. The authors should reflect these observations, as deletion of the ClpL NTD does appear to cause structural changes, though perhaps only at the macromolecular scale, i.e. dimerization of the rings.

      We have characterized the oligomeric state of ∆N-ClpL by size exclusion chromatography (Figure 6 – figure supplement 1A) and negative staining electron microscopy (Figure 6C), both showing that it forms assemblies similar to ClpL wildtype. We did not observe an increased tendency of ∆N-ClpL to form aggregates, and the protein remained fully soluble after several freeze-thaw cycles. EM data reveal that ∆N-ClpL exclusively forms ring dimers, suggesting that the NTDs destabilize MD-MD interactions. The stabilized interaction between two ∆N-ClpL rings can explain the increased thermal stability (Figure 6 – figure supplement 1D). We speculate that the ClpL NTDs affect MD-MD interactions either through steric hindrance or by directly contacting the MDs. We have added a respective statement to the discussion section.

      Figure 3C and 3D:

      (1) Given the larger error in samples expressing ClpG (100) or ClpL (100), statistical analysis with p-values is required to draw conclusions from the comparison of these samples with the plasmid-only control. The effect of ΔN-ClpL vs. wild-type ClpL looks compelling, and the deletion does appear to attenuate ClpL-induced thermotolerance. This is nicely demonstrated in Figure 3D.

      We quantified the respective spot tests (new Figure 3E) and tested for statistical significance as suggested by the reviewer. We show that restoration of heat resistance is significant for the first 30 min. While we consistently observe rescue at later time points, significance is lost there due to larger deviations in the number of viable cells and thus in the degree of complementation.

      Figure 3F:

      (1) What is the role of the ClpB NTD? It appears to be dispensable for disaggregase activity, assuming that ClpB is co-incubated with KJE. A brief explanation of this domain in ClpB could be useful.

      The ClpB NTD is not required for disaggregation activity, as ClpB is recruited to protein aggregates by DnaK, which interacts with the ClpB MDs. Still, two functions have been described for the ClpB NTD. First, it can bind soluble unfolded substrates such as casein [8]. This substrate binding function can increase ClpB disaggregation activity towards some aggregated model substrates (e.g. Glucose-6-phosphate dehydrogenase) [9]. However, NTD deletion usually does not decrease ClpB disaggregation activity and can even lead to an increase [7, 10, 11]. An increased disaggregation activity of ∆N-ClpB correlates with an enhanced ATPase activity, which is explained by NTDs stabilizing a repressing conformation of the ClpB MDs, which function as main regulators of ClpB ATPase activity [7]. We added a short description on the role of the ClpB NTD to the respective results section.

      (2) The result of fusing the ClpL NTD to ClpB supports a role for this NTD in promoting autonomous disaggregase activity. What would you expect to observe if the LN-ClpB fusion protein were co-incubated with KJE? Would this further promote disaggregase activity, or potentially impair it through competition? This experiment could support the authors' hypothesis that ClpL and ClpB/KJE compete with each other for aggregated substrates, as suggested in Figure 1.

      We have performed the suggested experiment using aggregated MDH as model substrate. We did not observe an inhibition of LN-ClpB disaggregation activity in the presence of KJE. In contrast, ClpL disaggregation activity towards aggregated MDH is inhibited upon addition of KJE due to competition for aggregate binding (Figure 1 – figure supplement 2D/F). The disaggregation activity of LN-ClpB in the presence of KJE can be explained by functional cooperation between both chaperone systems, which involves interactions between aggregate-bound DnaK and the ClpB MDs of the LN-ClpB fusion construct. We prefer to show these data only in the response letter and not to include them in the manuscript, as these results distract from the main message of the LN-ClpB fusion construct: the ClpL NTD functions as an autonomous aggregate-targeting unit that can be transferred to other Hsp100 family members.

      Author response image 2.

      LN-ClpB cooperates with DnaK in protein disaggregation. Relative MDH disaggregation activities of the indicated disaggregation systems were determined. KJE: DnaK/DnaJ/GrpE. The disaggregation activity of Lm ClpL was set to 1. Statistical analysis: one-way ANOVA, Welch's test for post-hoc multiple comparisons. Significance levels: **p < 0.001. n.s.: not significant.

      Figures 4E and 4F:

      (1) While the various NTD mutations follow a similar trend in impairing ClpL-mediated disaggregation of luciferase and MDH, the magnitude of the effects appears to differ. For example, patch A and C mutations reduce ClpL disaggregase activity towards luciferase by ~60% and ~50%, respectively, but towards MDH by >90%. While these results suggest a critical role for residues in patches A and C of ClpL, these substrate-specific differences are not discussed. Why would these patch A/C ClpL mutations be expected to affect different substrates differently?

      We speculate that the aggregate structure and the presence or distribution of ClpL NTD binding sites differ between aggregated Luciferase and MDH. A difference between both aggregated model substrates was also observed when testing for an inhibitory effect of Lm KJE (and Ec KJE) on ClpL disaggregation activity (see comment above). We propose that the mutated NTD residues make specific contributions to aggregate recognition. The severity of the binding defects (and of the reduction in disaggregation activity) of these mutants will depend on specific features of the aggregated model substrates. We now point out that the disaggregation activities of ClpL NTD patch mutants can differ depending on the aggregated model substrate used, and refer to potential differences in aggregate structure.

      (2) The authors suggest that the loss of disaggregation activity of selected NTD mutants could be linked to reduced binding to aggregated luciferase. While this is likely given that these mutations do not appear to affect ATPase activity (Supp. 4), it could be possible that these mutants can still bind to aggregated luciferase and some other mechanism may impair disaggregation. A pull-down assay would help to prove whether reduced binding is observed in these NTD ClpL mutants. This also needs to be confirmed for Supp. Figure 4.2H.

      We have shown a strong correlation between loss of aggregate binding and loss of disaggregation activity for several NTD mutants (Fig. 4G, Figure 4 – figure supplement 2H). We decided to perform the aggregate binding assay only with mutants that show a full, but not a partial, disaggregation defect, because in our experience the centrifugation-based assay provides clear and reproducible results for loss-of-activity mutants but has limitations in revealing differences for partially affected mutants. This might be explained by the use of non-hydrolyzable ATPγS in these experiments, which strongly stabilizes substrate interactions and could thereby mask partial binding defects. We agree with the reviewer that some ClpL NTD mutants might have additional effects on disaggregation activity, e.g. by controlling substrate transfer to the processing pore site. We have added a corresponding comment to the revised manuscript.

      (3) Supp. Figure 4.2H has no description in the figure legend. The Y-axes states % aggregate bound to chaperone. How was this measured? See the above comments for Figures 4E and 4F.

      We apologize for this omission and have added the description to the figure legend. The percentage of aggregate-bound chaperone is determined by quantifying the chaperone present in the supernatant and pellet fractions after sample centrifugation. Background levels of chaperone in the pellet fraction in the absence of protein aggregates were subtracted. We added this information to the Materials and Methods section.
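      The background-corrected quantification described above can be illustrated with a minimal, hypothetical Python sketch. It is not the exact calculation used in the study: it assumes that band intensities (arbitrary units) are available for the pellet and supernatant fractions, and that background is expressed as the fraction of chaperone that sediments in a control sample without aggregates. The function and parameter names are illustrative only.

```python
def percent_aggregate_bound(pellet, supernatant, bg_pellet, bg_supernatant):
    """Hypothetical sketch: % of a chaperone co-sedimenting with protein aggregates.

    pellet, supernatant: chaperone band intensities (arbitrary units) in the
        pellet and supernatant fractions of a sample containing aggregates.
    bg_pellet, bg_supernatant: the same quantities for a control sample
        centrifuged without aggregates (aggregate-independent sedimentation).
    """
    for value in (pellet, supernatant, bg_pellet, bg_supernatant):
        if value < 0:
            raise ValueError("band intensities must be non-negative")
    if pellet + supernatant == 0 or bg_pellet + bg_supernatant == 0:
        raise ValueError("total chaperone signal must be positive")

    # Fraction of the chaperone found in the pellet with and without aggregates.
    pellet_fraction = pellet / (pellet + supernatant)
    background_fraction = bg_pellet / (bg_pellet + bg_supernatant)

    # Subtract background sedimentation; clamp at zero so noise cannot yield negative binding.
    return 100.0 * max(pellet_fraction - background_fraction, 0.0)


# Example: 60 % of the chaperone pellets with aggregates, 5 % without them.
print(round(percent_aggregate_bound(60, 40, 5, 95), 1))  # prints 55.0
```

      Normalizing each sample to its own total signal (pellet plus supernatant) in this sketch makes the result independent of differences in loading between lanes; other normalizations are equally possible.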

      Figure 6G:

      The authors observed reduced disaggregase activity and ATPase activity of mutant T355C under both oxidative and reducing conditions. While this observation under oxidative conditions supports the authors' hypothesis, under reducing conditions (+DTT) we would expect the enzyme to behave similarly to wild-type ClpL unless this mutation has other effects. Can the authors please comment on this and provide an explanation or hypothesis?

      The reviewer is correct: ClpL-T355C exhibits reduced disaggregation activity (Figure 6 – figure supplement 2B). We observe a similar reduction in disaggregation activity for the ClpL MD mutant F354A, pointing to an auxiliary function of the MD in protein disaggregation. We have added a corresponding comment to the Discussion section of the revised manuscript. How exactly ClpL MDs support protein disaggregation is currently unclear and will be the subject of future analysis in our lab. We believe that such analysis should be part of an independent study.

      Discussion:

      In the fourth feature, it is discussed that one disaggregase feature of ClpL is that it does not cooperate with the ClpP protease. While a reference is provided for the canonical ClpB, no data in this paper, nor a reference, is provided demonstrating that ClpL does not interact with ClpP. As discussed, it is highly unlikely that ClpL interacts with ClpP given that ClpL does not contain the IGL/F loops that mediate the interaction of ClpP with cochaperones, such as ClpX, but data or a reference is needed to make such a factual statement.

      The absence of the IGL/F loop makes an interaction between ClpL and ClpP highly unlikely. However, the reviewer is correct that direct evidence for a ClpP-independent function of ClpL, though very likely, is not provided. We have therefore rephrased the respective statement: “Fourth, novel disaggregases lack the specific IGL/F signature motif, which is essential for cooperation of other Hsp100 proteins with the peptidase ClpP. This feature is shared with the canonical ClpB disaggregase [12], suggesting that protein disaggregation is primarily linked to protein refolding.”

      References

      (1) Katikaridis P, Simon B, Jenne T, Moon S, Lee C, Hennig J, et al. Structural basis of aggregate binding by the AAA+ disaggregase ClpG. J Biol Chem. 2023:105336.

      (2) Glover JR, Lindquist S. Hsp104, Hsp70, and Hsp40: A novel chaperone system that rescues previously aggregated proteins. Cell. 1998;94:73-82.

      (3) Krzewska J, Langer T, Liberek K. Mitochondrial Hsp78, a member of the Clp/Hsp100 family in Saccharomyces cerevisiae, cooperates with Hsp70 in protein refolding. FEBS Lett. 2001;489:92-6.

      (4) Seyffer F, Kummer E, Oguchi Y, Winkler J, Kumar M, Zahn R, et al. Hsp70 proteins bind Hsp100 regulatory M domains to activate AAA+ disaggregase at aggregate surfaces. Nat Struct Mol Biol. 2012;19:1347-55.

      (5) Haslberger T, Zdanowicz A, Brand I, Kirstein J, Turgay K, Mogk A, et al. Protein disaggregation by the AAA+ chaperone ClpB involves partial threading of looped polypeptide segments. Nat Struct Mol Biol. 2008;15:641-50.

      (6) Theyssen H, Schuster H-P, Bukau B, Reinstein J. The second step of ATP binding to DnaK induces peptide release. J Mol Biol. 1996;263:657-70.

      (7) Iljina M, Mazal H, Goloubinoff P, Riven I, Haran G. Entropic Inhibition: How the Activity of a AAA+ Machine Is Modulated by Its Substrate-Binding Domain. ACS Chem Biol. 2021;16:775-85.

      (8) Rosenzweig R, Farber P, Velyvis A, Rennella E, Latham MP, Kay LE. ClpB N-terminal domain plays a regulatory role in protein disaggregation. Proc Natl Acad Sci U S A. 2015;112:E6872-81.

      (9) Barnett ME, Nagy M, Kedzierska S, Zolkiewski M. The amino-terminal domain of ClpB supports binding to strongly aggregated proteins. J Biol Chem. 2005;280:34940-5.

      (10) Beinker P, Schlee S, Groemping Y, Seidel R, Reinstein J. The N Terminus of ClpB from Thermus thermophilus Is Not Essential for the Chaperone Activity. J Biol Chem. 2002;277:47160-6.

      (11) Mogk A, Schlieker C, Strub C, Rist W, Weibezahn J, Bukau B. Roles of individual domains and conserved motifs of the AAA+ chaperone ClpB in oligomerization, ATP-hydrolysis and chaperone activity. J Biol Chem. 2003;278:15-24.

      (12) Weibezahn J, Tessarz P, Schlieker C, Zahn R, Maglica Z, Lee S, et al. Thermotolerance Requires Refolding of Aggregated Proteins by Substrate Translocation through the Central Pore of ClpB. Cell. 2004;119:653-65.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Strengths:

      The study raises an interesting point by excluding CERS1 inhibition as a therapeutic strategy for sarcopenia. Overall, the article is well written and clear.

      Weaknesses:

      Many of the experiments confirmed previously published data, which also showed a decline of CERS1 in aging, as well as the generation and characterization of a muscle-specific knockout mouse line. The mechanistic insights into how the increased amount of long ceramides (Cer C24) and the decrease in shorter ones (Cer C18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed.

      We thank the reviewer for the assessment and would like to point out that Cers1 had not previously been studied in the context of aging. Moreover, our unbiased pathway analyses in human skeletal muscle implicate CERS1 in myogenic differentiation for the first time, which we validate in cell culture systems. To improve mechanistic insights, as suggested by Reviewer #1, we performed additional experiments to study how Cers1-derived C18 and Cers2-derived C24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces C24:0/C24:1 ceramides and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain C24 ceramides might indeed drive the effect we see upon Cers1 inhibition, because we observe an accumulation of C24 ceramides upon Cers1 (C18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To test whether impaired muscle cell differentiation upon Cers1 inhibition depends on Cers2, we knocked down Cers1 alone or in combination with Cers2. Gene expression analyses and immunohistochemical validation and quantification show that the reduced muscle cell maturation caused by Cers1KD is rescued by simultaneous knockdown of Cers2. We propose that reducing Cers1 function during aging might increase sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers apoptosis due to its toxicity (PMID: 12531554). Channeling the accumulating sphingosine towards C24 ceramides may therefore avoid toxicity but, as we show in this manuscript, reduces the myogenic potential of muscle. If C24 production is also blocked by Cers2 inhibition, sphingosine is instead forced towards the production of other ceramides that are potentially less toxic or less detrimental to myogenesis. We added these new data to the revised manuscript as new Fig 5D-E and new Fig S5G-I.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wohlwend et al. investigates the implications of inhibiting ceramide synthase Cers1 on skeletal muscle function during aging. The authors propose a role for Cers1 in muscle myogenesis and aging sarcopenia. Both pharmacological and AAV-driven genetic inhibition of Cers1 in 18-month-old mice lead to reduced C18 ceramides in skeletal muscle, exacerbating age-dependent features such as muscle atrophy, fibrosis, and centrally nucleated fibers. Similarly, inhibition of the Cers1 orthologue in C. elegans reduces motility and causes alterations in muscle morphology.

      Strengths:

      The study is well-designed, carefully executed, and provides highly informative and novel findings that are relevant to the field.

      Weaknesses:

      The following points should be addressed to support the conclusions of the manuscript.

      (1) It would be essential to investigate whether P053 treatment of young mice induces age-dependent features besides muscle loss, such as muscle fibrosis or impaired regeneration. This would help determine whether the exacerbation of age-dependent features depends solely on Cers1 inhibition or is associated with other factors related to the age-dependent decline in cell function. Additionally, considering the reported role of Cers1 in whole-body adiposity, it is necessary to present data on body weight and fat mass in P053-treated aged mice.

      We thank the reviewer for suggesting that we study Cers1 inhibition in young mice. In fact, a previous study showed that muscle-specific Cers1 knockout in young mice impairs muscle function (PMID: 31692231). Similar to our observations, these authors report reduced muscle fiber size and muscle force. Therefore, we do not believe that the effects of Cers1 inhibition we observe in aged mice are specific to aging, although the phenotypic consequences are accentuated in aged mice. As requested by the reviewer, we provide the body weights and fat mass of the mice (Author response image 1A-B). The reduced fat mass upon P053 treatment is in line with previously reported reductions in fat mass upon Cers1 inhibition in young mice fed chow or high-fat diet (PMID: 30605666, PMID: 30131496), again suggesting that the effect of Cers1 inhibition might not be specific to aging.

      Author response image 1.

      (A-B) Body mass (A) and fat mass as % of body mass (B) measured by EchoMRI in 22mo C57BL/6J mice intraperitoneally injected with DMSO or P053 (n=7-12 per group). (C-D) Grip strength measurements in all limbs (C) or only the forelimbs (D) in 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble or Cers1-targeting shRNA (n=8 per group). (E-F) Pax7 gene expression in P053- or AAV9-treated mice (n=6-7 per group) (E), or in mouse C2C12 muscle progenitor cells treated with 25nM scramble or Cers1-targeting shRNA (n=8 per group) (F). (G) Proliferation as measured by luciferase intensity in mouse C2C12 muscle cells treated with 25nM scramble or Cers1-targeting shRNA (n=24 per group). Each column represents one biological replicate. (H) Overlaid FACS traces of Annexin-V (BB515, left) and propidium iodide (Cy5, right) of mouse C2C12 myotubes treated with 25nM scramble or Cers1-targeting shRNA (n=3 per group). Quantification right: early apoptosis (Annexin+/PI-), late apoptosis (Annexin+/PI+), necrosis (Annexin-/PI+), viability (Annexin-/PI-). (I) Normalized Cers2 gene expression in mouse C2C12 muscle cells treated with 25nM scramble or Cers1-targeting shRNA (n=6-7 per group). (J-K) Representative mitochondrial respiration traces of digitonin-permeabilized mouse C2C12 muscle cells treated with DMSO or P053 (J), with quantification of basal, ATP-linked and proton leak respiration as well as spare capacity and maximal capacity linked respiration (n=4 per group). (L) Reactive oxygen species production in mitochondria of mouse C2C12 muscle cells treated with DMSO or P053. (M) Enriched gene sets related to autophagy and mitophagy in muscles of 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble or Cers1-targeting shRNA (left), or intraperitoneally injected with DMSO or P053 (right). Color gradient indicates normalized effect size. Dot size indicates statistical significance (n=6-8 per group). (N) Representative confocal Proteostat® stainings with quantifications of DMSO- and P053-treated mouse muscle cells expressing APPSWE (top) and human primary myoblasts isolated from patients with inclusion body myositis (bottom). (O) Stillness duration during a 90-second interval in adult day 5 C. elegans treated with DMSO or 100 µM P053. (P) Lifespan of C. elegans treated with DMSO or P053 (n=144-147 per group; for method details see main manuscript page 10).

      (2) As grip and exercise performance tests evaluate muscle function across several muscles, it is not evident how intramuscular AAV-mediated Cers1 inhibition solely in the gastrocnemius muscle can have a systemic effect or impact different muscles. This point requires clarification.

      The grip strength measurements presented in the manuscript come from hindlimb grip strength, as stated in the Methods section. We additionally measured grip strength in all four limbs as well as in the forelimbs only (Author response image 1C-D). Forelimb strength did not change, whereas hindlimb grip strength was significantly different in AAV-Cers1KD compared to the scramble control AAV (Fig 3I), which is in line with the fact that we injected the AAV only into the hindlimbs. This is similar to our previous data, where we saw altered muscle function upon IM AAV delivery into the gastrocnemius (PMID: 34878822, PMID: 37118545). Given its size, the gastrocnemius likely makes the largest contribution to hindlimb grip strength, and possibly even to overall grip strength, as suggested by a trend towards reduced grip strength in all four limbs (Author response image 1C). We also suspect that the hindlimb muscles make the largest contribution to uphill running, as we also saw an effect on running performance. While we carefully injected a minimal amount of AAV into the gastrocnemius to avoid leakage, we cannot completely rule out that some AAV might have spread to other muscles. We added this information to the discussion of the manuscript as a potential limitation of the study.

      (3) To further substantiate the role of Cers1 in myogenesis, it would be crucial to investigate the consequences of Cers1 inhibition under conditions of muscle damage, such as cardiotoxin treatment or eccentric exercise.

      While it would be interesting to study Cers1 in the context of muscle regeneration, and possibly in mouse models of muscular dystrophy, we think such work would go beyond the scope of the current manuscript.

      (4) It would be informative to determine whether the muscle defects are primarily dependent on the reduction of C18-ceramides or the compensatory increase of C24-ceramides or C24-dihydroceramides.

      To improve mechanistic insights, as suggested by Reviewer #2, we performed additional experiments to study how Cers1-derived C18 and Cers2-derived C24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces C24:0/C24:1 ceramides and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain C24 ceramides might indeed drive the effect we see upon Cers1 inhibition, because we observe an accumulation of C24 ceramides upon Cers1 (C18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To test whether impaired muscle cell differentiation upon Cers1 inhibition depends on Cers2, we knocked down Cers1 alone or in combination with Cers2. Gene expression analyses and immunohistochemical validation and quantification show that the reduced muscle cell maturation caused by Cers1KD is rescued by simultaneous knockdown of Cers2. We added these data to the manuscript as new Fig 5D-E and new Fig S5G-I. These data, together with our previous results showing that Degs1 knockout reduces myogenesis (PMID: 37118545, Fig. 6s-x and Fig. 7), suggest that C24/dhC24 ceramides might contribute to the age-related impairments in myogenesis.

      (5) Previous studies from the research group (PMID 37118545) have shown that inhibiting the de novo sphingolipid pathway by blocking SPTLC1-3 with myriocin counteracts muscle loss and that C18-ceramides increase during aging. In light of the current findings, certain issues need clarification and discussion. For instance, how would myriocin treatment, which reduces Cers1 activity because of the upstream inhibition of the pathway, have a positive effect on muscle? Additionally, it is essential to explain the relationship between the reduction of Cers1 gene expression with aging (Fig. 1B) and the age-dependent increase in C18-ceramides (PMID 37118545).

      Blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway, which is overactive in aging, and therefore seems beneficial for muscle aging. While most enzymes in the ceramide pathway that we have studied so far (SPTLC1, CERS2) revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects. This is also visible in the direction of CERS1 expression compared to the other enzymes in one of our previously published studies (PMID: 37118545, Fig. 1e and Fig. 1f). In the current study, we show that Cers1 inhibition indeed exacerbates age-related defects in myogenesis and inflammation, as opposed to the inhibition of Sptlc1 or Cers2. As the reviewer points out, both C18- and C24-ceramides seem to accumulate upon muscle aging. We think this is due to an overall overactive ceramide biosynthesis pathway. Blocking C18-ceramides via Cers1 inhibition results in the accumulation of C24-ceramides and worsens muscle phenotypes (see reply to question #4). On the other hand, blocking C24-ceramides via Cers2 inhibition improves muscle differentiation. These observations, together with the finding that the Cers1-mediated inhibition of muscle differentiation depends on proper Cers2 function (new Fig 5D-E, new Fig S5G-I), point towards C24-ceramides as the main culprit of reduced muscle differentiation. Hence, at least a significant part of the benefits of blocking SPTLC1 might be related to reducing very long chain ceramides. We believe that the reduced Cers1 expression in aging skeletal muscle, observed by us and others (PMID: 31692231), might reflect a compensatory mechanism for an overall overactive ceramide flux in aged muscles. Reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers apoptosis due to its toxicity (PMID: 12531554). Channeling the accumulating sphingosine towards C24 ceramides may therefore avoid toxicity but, as we show in this manuscript, reduces the myogenic potential of muscle. If C24 production is also blocked by Cers2 inhibition (new Fig 5D-E, new Fig S5G-I), sphingosine is instead forced towards the production of other ceramides that are potentially less toxic or less detrimental to myogenesis. These data are now included in the revised manuscript (see page 7), and details were added to the discussion (see page 8).

      Addressing these points will strengthen the manuscript's conclusions and provide a more comprehensive understanding of the role of Cers1 in skeletal muscle function during aging.

      Reviewer #1 (Recommendations For The Authors):

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Even though many of the experiments only confirmed previously published data (ref 21, 11, 37, 38), which also showed a decline of CERS1 in aging as well as the generation and characterization of a muscle-specific knockout mouse line, the study raises an interesting point by excluding CERS1 inhibition as a therapeutic strategy for sarcopenia and opens new questions on how inhibition of SPTLC1 (upstream of CERS1) has beneficial effects in healthy aging (ref 15, published by the same authors).

      Overall, the article is well written and clear. However, there is a major weakness. The mechanistic insights into how the increased amount of long ceramides (C24) and the decrease in shorter ones (Cer C18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed. At the present stage, the manuscript is descriptive and confirmatory of the CERS1-mediated function in preserving muscle mass. The authors should consider the following points:

      Comments:

      (1) Muscle data

      (a) The effect of CERS1 inhibition on myotube formation must be better characterized. Which step of myogenesis is affected? Is stem cell renewal, MyoD-driven proliferation/differentiation, myoblast fusion, or increased cell death the major culprit of the small myotubes? Minor points: Figure S1C: show the C14:00 level at 200 h; text of Fig S2A and 1F: MRF4 and Myogenin are not early genes in myogenesis, please correct; Fig S2B and 2C: changes in transcript levels do not necessarily reflect changes in protein levels or myotube differentiation, and therefore the authors must test myotube formation and myosin expression.

      Cers1 inhibition seems to affect differentiation and myoblast fusion. To test the other suggested mechanisms, we performed additional experiments. Inhibiting Cers1 systemically with the pharmacological Cers1 inhibitor P053, or by intramuscular delivery of AAV expressing a short hairpin RNA (shRNA) against Cers1 in mice, did not affect Pax7 transcript levels (Author response image 1E). Moreover, shRNA targeting Cers1 did not affect Pax7 levels in mouse C2C12 muscle progenitor cells (Author response image 1F). To characterize the effect of Cers1 inhibition on muscle progenitor proliferation/renewal, we treated C2C12 muscle progenitors with scramble shRNA or shRNA targeting Cers1 and measured proliferation using CellTiter-Glo (Promega). Cers1KD had no significant effect on cell proliferation (Author response image 1G). Next, we assayed cell death in differentiating Cers1-deficient C2C12 myotubes by FACS analysis of Annexin V (left) and propidium iodide (right). We found no difference in early apoptosis, late apoptosis, necrosis, or muscle cell viability, suggesting that cell death does not explain the smaller myotubes (Author response image 1H). These findings support the notion that the inhibitory effect of Cers1 knockdown on muscle maturation is primarily based on effects on myogenesis rather than on apoptosis. Our data in the manuscript also suggest that Cers1 inhibition affects myoblast fusion, as shown by reduced myonucleation upon Cers1KD (Fig S3H right, Fig S5I).
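      The quadrant definitions used for the Annexin V/propidium iodide analysis in Author response image 1H (early apoptosis Annexin+/PI-, late apoptosis Annexin+/PI+, necrosis Annexin-/PI+, viable Annexin-/PI-) can be illustrated with a short, hypothetical Python sketch. The gate thresholds, function names and event values are assumptions for illustration only; gates are typically set in the flow cytometry software using unstained and single-stained controls, and this sketch is not the analysis pipeline used in the study.

```python
from collections import Counter

CATEGORIES = ("early apoptosis", "late apoptosis", "necrosis", "viable")


def classify_event(annexin, pi, annexin_gate, pi_gate):
    """Assign one cell (event) to a quadrant from its Annexin V and PI intensities."""
    annexin_pos = annexin > annexin_gate
    pi_pos = pi > pi_gate
    if annexin_pos and not pi_pos:
        return "early apoptosis"  # Annexin+/PI-
    if annexin_pos and pi_pos:
        return "late apoptosis"   # Annexin+/PI+
    if pi_pos:
        return "necrosis"         # Annexin-/PI+
    return "viable"               # Annexin-/PI-


def quadrant_percentages(events, annexin_gate, pi_gate):
    """Return the percentage of events in each quadrant (hypothetical gates)."""
    if not events:
        raise ValueError("no events to classify")
    counts = Counter(classify_event(a, p, annexin_gate, pi_gate) for a, p in events)
    # Counter returns 0 for quadrants that contain no events.
    return {c: 100.0 * counts[c] / len(events) for c in CATEGORIES}


# Four illustrative events, one per quadrant, with both gates at 100 (arbitrary units).
events = [(10, 10), (500, 10), (500, 500), (10, 500)]
print(quadrant_percentages(events, annexin_gate=100, pi_gate=100))
```

      Comparing these quadrant percentages between scramble and Cers1-shRNA samples corresponds to the comparison summarized in the quantification of Author response image 1H.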

      (b) The phenotype of CERS1 knockdown is milder than that of P053-treated mice (Fig S5D and Figure 3F, 3H are not significant), despite similar changes in Cer18:0, Cer24:0 and Cer24:1 concentrations in muscle. Why?

      Increases in very long chain ceramides were in fact larger upon P053 administration than upon AAV-mediated knockdown. For example, Cer24:0 levels increased by >50% upon P053 administration, compared to 20% upon AAV injection. Moreover, dhC24:1 increased 6.5-fold upon P053 treatment vs 2.5-fold upon AAV treatment. These differences might explain the slightly attenuated phenotypes in the AAV-treated mice and also underline the notion that very long chain ceramides might cause muscle deterioration. We believe that inhibiting the enzymatic activity of Cers1 (P053) is a more efficient strategy to reduce ceramide levels than degrading Cers1 transcripts. However, we cannot completely rule out multi-organ, systemic effects of P053 treatment beyond its direct effect on muscle. We added these details to the discussion of the revised manuscript (see page 8).

      (c) The authors discuss a possible compensation by the CERS2 isoform, but they never show CERS2 mRNA expression or protein levels after treatment. Is CERS2 expressed at higher levels when CERS1 is downregulated in skeletal muscle?

      We appreciate the reviewer's suggestion. We found no change in Cers2 mRNA levels upon Cers1 inhibition in mouse C2C12 myoblasts (Author response image 1I). We would like to point out that mRNA abundance might not be an optimal readout for enzymes, because it does not necessarily reflect enzymatic activity. We therefore think that metabolite levels are a better proxy of enzymatic activity. It should also be noted that “compensation” might not be an accurate description, as the sphingoid base substrate might simply be more available upon Cers1KD, providing more substrate for Cers2 to synthesize very long chain ceramides. This “re-routing” has been described previously in the literature and hypothesized to help avoid toxic (dh)sphingosine accumulation (PMID: 30131496). We therefore changed the wording in the revised manuscript to be more precise.

      (d) Force measurement of AAV CERS1 downregulated muscles could be a plus for the study (assay function of contractility)

      In the current study we measured grip strength in mice, which had previously been shown to be a good proxy of muscle strength and general health (PMID: 31631989). Indeed, our results of reduced muscle grip strength are in line with previous work that shows reduced contractility in muscles of Cers1 deficient mice (PMID: 31692231).

      (e) How are degradation pathways affected by the downregulation of CERS1? Is autophagy/mitophagy affected? How are mTOR signalling and protein synthesis affected? A recent paper showed that CerS1 silencing leads to a reduction in C18:0-Cer content, with a subsequent increase in the activity of the insulin pathway and an improvement in skeletal muscle glucose uptake. Could CERS1 downregulation increase mTOR signalling and decrease the autophagy pathway? Measuring autophagic flux in vivo using colchicine would be useful to test this hypothesis.

      Cers1 in skeletal muscle has indeed been linked to metabolic homeostasis (see PMID: 30605666). In line with that finding in young mice, we also find reduced fat mass upon P053 treatment in aged mice (Author response image 1A-B). We also examined mitochondrial bioenergetics upon blocking Cers1 with P053 using O2k oxygraphy (Author response image 1J-L). Results show that Cers1 inhibition in mouse muscle cells increases mitochondrial respiration, similar to what has been shown before (PMID: 30131496). However, we also found that reactive oxygen species production in mouse muscle cells is increased upon P053 treatment, suggesting the presence of dysfunctional mitochondria upon inhibiting Cers1 with P053. We next examined the mitophagy/autophagy degradation pathways suggested by the reviewer and did not find convincing evidence that Cers1 has a major impact on autophagy- or mitophagy-related gene sets in mice treated with shRNA against Cers1 or with the Cers1 pharmacological inhibitor P053 (Author response image 1M).

      We then assessed the effect of Cers1 inhibition on transcript levels related to mTORC1 signaling and protein synthesis, as suggested by the reviewer. Cers1 knockdown in differentiating mouse muscle cells showed only a weak trend towards reduced expression of mTORC1 and its downstream targets (new Fig S4A). In line with this, there was no notable difference in protein synthesis in differentiating, Cers1-deficient mouse C2C12 myoblasts as assessed by L-homopropargylglycine (HPG) amino acid labeling using confocal microscopy (new Fig S4B) or FACS analyses (new Fig S4C). However, Cers1KD increased transcripts related to the myostatin-Foxo1 axis as well as the ubiquitin proteasome system (e.g. atrogin-1, MuRF1) (new Fig S4D), suggesting that Cers1 inhibition increases protein degradation. We added these details to the revised manuscript on page 7. We recently implicated the ceramide pathway in regulating muscle protein homeostasis (PMID: 37196064). We therefore assessed the effect of Cers1 inhibition with the pharmacological inhibitor P053 on protein folding in muscle cells using the Proteostat dye, which intercalates into the cross-beta spine of quaternary protein structures typically found in misfolded and aggregated proteins. Interestingly, inhibiting Cers1 further increased misfolded proteins in C2C12 mouse myoblasts expressing the Swedish mutation in APP and in human myoblasts isolated from patients with inclusion body myositis (Author response image 1N). These findings suggest that Cers1 deficiency might upregulate protein degradation to compensate for the accumulation of misfolded and aggregating proteins, which might contribute to the impaired muscle function observed upon Cers1 knockdown. Further studies are needed to disentangle the underlying mechanisms.

      (f) The balance of ceramides has been found to play roles in mitophagy and fission, with an impact on cell fate and metabolism. Did the authors check how mitochondrial morphology, mitophagy, or mitochondrial dynamics (fission and fusion) are altered in CERS1 knockdown muscles? There is growing evidence that mitochondrial dysfunction contributes to the development of fibrosis and inflammation.

      Previously, CERS1 has been studied in the context of metabolism and mitochondria (for reference, please see PMID: 26739815, PMID: 29415895, PMID: 30605666, PMID: 30131496). In summary, these studies demonstrate that C18 ceramide levels are inversely related to insulin sensitivity in muscle and mitochondria, and that Cers1 inhibition improves insulin-stimulated suppression of hepatic glucose production and reduces high-fat-diet-induced adiposity. Moreover, improved mitochondrial respiration, increased citrate synthase activity and increased energy expenditure were reported upon Cers1 inhibition. Lack of Cers1 specifically in skeletal muscle was also reported to improve systemic glucose homeostasis. While these studies agree on the effect of Cers1 inhibition on fat loss, results on glucose homeostasis and insulin sensitivity differ depending on whether a pharmacological or a genetic approach was used to inhibit Cers1. The current manuscript focuses on the effect of CERS1 on muscle function and myogenesis because these were the pathways most strongly correlated with CERS1 in human skeletal muscle (Fig 1C) and because the impact of Cers1 on these pathways is poorly studied, particularly in the context of aging. For the effects of CERS1 on mitochondria and metabolism, we would therefore like to refer to the studies mentioned above.

      (2) C.elegans data:

      (a) The authors checked maternal RNAi protocol to knockdown lagr-1 and showed alteration of muscle morphology at day 5. They also give pharmacological exposure of P053 drug at L4 stage. Furthermore, the authors also used a transgenic ortholog lagr-1 to perform the experiments. All of them were consistent showing a reduced movement. It would be important to show rescue of the muscle phenotype by overexpressing CERS1 ortholog in knockdown transgenic animals.

      We used RNAi to knockdown the Cers1 orthologue, lagr-1, in C.elegans. Therefore, we do not have transgenic animals. Overexpressing lagr-1 in the RNAi treated animals would also not be possible as the RNA from the overexpression would just get degraded.

      (b) The authors showed data about distance of C.elegans. It would be interesting to specify if body bends, reversals and stillness are affected in RNAi and transgenic Knockdown worms.

      As suggested by the reviewer, we measured thrashing and stillness and found reduced thrashing (new Fig S5B) and a trend towards increased stillness (Author response image 1O) in P053-treated worms on day 5 of adulthood, the day on which we observed significant differences in muscle morphology and movement (Fig 4D-E, Fig S5A). These data are now included in the revised manuscript.

      (c) Is there an effect on lifespan extension by knocking down CERS1?

      We performed two independent lifespan experiments in C.elegans treated with the Cers1 inhibitor P053 and found reduced lifespan in both replicate experiments (for second replicate, see Author response image 1P). We added these data to the revised manuscript as new Fig 4H.

      How do the authors explain the beneficial effect of sptlc1 inhibition on healthy aging muscle? Discuss more during the article if there is no possible explanation at the moment.

      We believe that blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway that is overactive in aging, and therefore is more beneficial for muscle aging. Our current work suggests that at least a significant part of Sptlc1-KD benefits might stem from blocking very long chain ceramides. While SPTLC1 and CERS2 revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects, which is also visible in Fig 1e and Fig 1f of PMID: 37118545. In the current study, we show that Cers1 inhibition indeed exacerbates aging defects in myogenesis and inflammation as opposed to the inhibition of Sptlc1 or Cers2. The fact that the effect of Cers1 on inhibiting muscle differentiation is dependent on the clearance of Cers2-derived C24-ceramides suggests that reducing very long chain ceramides might be crucial for healthy muscle aging. We added details to the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reports a novel measurement for the chemotactic response to potassium by Escherichia coli. The authors convincingly demonstrate that these bacteria exhibit an attractant response to potassium and connect this to changes in intracellular pH level. However, some experimental results are incomplete, with additional controls/alternate measurements required to support the conclusions. The work will be of interest to those studying bacterial signalling and response to environmental cues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper shows that E. coli exhibits a chemotactic response to potassium by measuring both the motor response (using a bead assay) and the intracellular signaling response (CheY phosphorylation level via FRET) to step changes in potassium concentration. They find that an increase in potassium concentration induces a considerable attractant response, with an amplitude larger than that for aspartate, and that cells can quickly adapt (but possibly imperfectly). The authors propose that the mechanism for the potassium response is through modifying intracellular pH; they find both that potassium modifies pH and that other pH modifiers induce similar attractant responses. It is also shown, using Tar- and Tsr-only mutants, that these two chemoreceptors respond to potassium differently. Tsr has a standard attractant response, while Tar has a biphasic response (repellent-like then attractant-like). Finally, the authors use computer simulations to study the swimming response of cells to a periodic potassium signal secreted from a biofilm and find a phase delay that depends on the period of oscillation.

      Strengths:

      The finding that E. coli can sense and adapt to potassium signals and the connection to intracellular pH is quite interesting and this work should stimulate future experimental and theoretical studies regarding the microscopic mechanisms governing this response. The evidence (from both the bead assay and FRET) that potassium induces an attractant response is convincing, as is the proposed mechanism involving modification of intracellular pH.

      Weaknesses:

      The authors show that changes in pH impact fluorescent protein brightness and modify the FRET signal; this measurement explains the apparent imprecise adaptation they measured. However, this effect reduces confidence in the quantitative accuracy of the FRET measurements. For example, part of the potassium response curve (Fig. 4B) can be attributed to chemotactic response and part comes from the pH modifying the FRET signal. Measuring the full potassium response curve of the no-receptor mutants as a control would help quantify the true magnitude of the chemotactic response and the adaptation precision to potassium.

      Response: We thank the reviewer for the suggestion. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.
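      The correction described above amounts to a per-channel division by a concentration-dependent calibration factor obtained from the no-receptor strain. The sketch below illustrates only that arithmetic: the calibration arrays are hypothetical placeholders rather than the values in Fig. S4C, linear interpolation between calibrated KCl concentrations is an assumption, and the FRET readout is assumed to be the YFP/CFP intensity ratio.

```python
import numpy as np

# Hypothetical calibration from a no-receptor strain: post/pre-addition signal
# ratio for each channel at each KCl step (mM). Placeholder numbers only.
KCL_MM = np.array([0.0, 1.0, 10.0, 30.0])
CFP_RATIO = np.array([1.00, 0.99, 0.96, 0.93])
YFP_RATIO = np.array([1.00, 0.98, 0.93, 0.88])


def ph_correct(signal_after, kcl_mm, kcl_grid, ratio_grid):
    """Divide a post-addition channel signal by the calibration ratio,
    linearly interpolated at the applied KCl concentration.
    np.interp holds the end values outside the calibrated range."""
    ratio = np.interp(kcl_mm, kcl_grid, ratio_grid)
    return np.asarray(signal_after, dtype=float) / ratio


def corrected_yfp_cfp_ratio(cfp_after, yfp_after, kcl_mm):
    """Correct both channels separately, then form the YFP/CFP intensity ratio."""
    cfp = ph_correct(cfp_after, kcl_mm, KCL_MM, CFP_RATIO)
    yfp = ph_correct(yfp_after, kcl_mm, KCL_MM, YFP_RATIO)
    return yfp / cfp


# Hypothetical photon counts recorded after a 30 mM KCl step.
print(corrected_yfp_cfp_ratio([930.0, 920.0], [880.0, 890.0], 30.0))
```

      Correcting each channel before forming the ratio matters because, as noted above, CFP and YFP respond differently to pH, so a single correction factor applied to the ratio would not be equivalent in general.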

      The measured response may also be affected by adaptation. For other strong attractant stimuli, the response typically shows a low plateau before it recovers (adapts). However, in the case of potassium, the FRET signal does not have an obvious plateau following the stimulus. Do the authors have an explanation for that? One possibility is that the cells may have already partially adapted when the response reaches its minimum, which could indicate response and/or adaptation dynamics different from those of a regular chemoattractant. In any case, directly measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels would shed more light on the problem.

      Response: We appreciate the reviewer’s insightful questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).
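      The bound of less than 10% can be checked with a short calculation, assuming that adaptation proceeds as a first-order (exponential) process with the measured 80 s half-time and, as a worst case, that the full 10 s exchange time elapses before the response peaks. Under these assumptions roughly 8% of the adaptation would already be complete.

```python
def fraction_adapted(elapsed_s: float, half_time_s: float) -> float:
    """Fraction of a first-order (exponential) adaptation completed after elapsed_s."""
    return 1.0 - 0.5 ** (elapsed_s / half_time_s)


# Worst case: the whole 10 s medium exchange elapses before the response peaks,
# with the 80 s adaptation half-time measured for 30 mM KCl.
loss = fraction_adapted(10.0, 80.0)
print(f"{loss:.3f}")  # 0.083
```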

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (Sourjik & Berg, PNAS 99:123, 2002), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1 below, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.

      The relevant text was added at lines 413-424.

      Author response image 1.

      The response of the cheRcheB mutant (HCB1382-pVS88) to different concentrations of KCl. The blue solid line denotes the original signal, while the red dots represent the pH-corrected signal. The vertical purple (green) dashed lines indicate the moment of adding (removing) 0.01 mM, 0.1 mM, 0.3 mM, 1 mM, 3 mM, 10 mM and 30 mM KCl, in chronological order.

      There seems to be an inconsistency between the FRET and bead assay measurements, the CW bias shows over-adaptation, while the FRET measurement does not.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      Now we clarified it at line 315.

      The small hill coefficient of the potassium response curve and the biphasic response of the Tar-only strain, while both very interesting, require further explanation since these are quite different than responses to more conventional chemoattractants.

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5) and the biphasic response of the Tar-only strain (Fig. 5C). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the different responses of the Tar and Tsr receptors to potassium.
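      One way to see how two receptor populations could lower the apparent Hill coefficient is a toy model in which the overall response is a weighted sum of two n = 1 Hill curves with different half-maximal concentrations. This is a hypothetical illustration only: the weights and K values are arbitrary, and the model ignores the repellent-like response of the Tar-only strain at low potassium. The effective Hill coefficient is taken here as the slope of ln(R/(1 - R)) versus ln c at the half-maximal point.

```python
import math


def hill(c: float, k: float, n: float = 1.0) -> float:
    """Fractional response of a single Hill curve."""
    return c**n / (c**n + k**n)


def mixed_response(c: float, k1: float, k2: float, w: float = 0.5) -> float:
    """Weighted sum of two n = 1 Hill curves (hypothetical receptor populations)."""
    return w * hill(c, k1) + (1.0 - w) * hill(c, k2)


def effective_hill(k1: float, k2: float, w: float = 0.5) -> float:
    """Slope of ln(R / (1 - R)) versus ln(c) at the half-maximal point."""

    def logit_at(log_c: float) -> float:
        r = mixed_response(math.exp(log_c), k1, k2, w)
        return math.log(r / (1.0 - r))

    # Bisection on ln(c) to locate R = 0.5 (R increases monotonically with c).
    lo = math.log(min(k1, k2)) - 10.0
    hi = math.log(max(k1, k2)) + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixed_response(math.exp(mid), k1, k2, w) < 0.5:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)

    # Central finite difference for the log-logit slope.
    h = 1e-4
    return (logit_at(x0 + h) - logit_at(x0 - h)) / (2.0 * h)


print(round(effective_hill(1.0, 1.0), 3))    # 1.0
print(round(effective_hill(1.0, 100.0), 3))  # 0.331
```

      With equal K values the effective coefficient is 1; the further apart the two half-maximal concentrations are, the shallower the combined curve becomes.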

      The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. investigated the biophysical mechanism of potassium-mediated chemotactic behavior in E. coli. Previously, Humphries et al. reported that potassium waves from oscillating B. subtilis biofilms attract P. aeruginosa through the chemotactic behavior of motile P. aeruginosa cells. It was proposed that K+ waves alter the PMF of P. aeruginosa. However, the mechanism of this behaviour remained elusive. In this study, Zhang et al. demonstrated that motile E. coli cells accumulate in regions of high potassium levels. They found that this behavior likely results from the chemotaxis signalling pathway, mediated by an elevation of intracellular pH. Overall, a solid body of evidence is provided to support the claims. However, the impacts of pH on the fluorescent proteins need to be better evaluated. In its current form, the evidence is insufficient to say that the fluorescence intensity ratio results from FRET. It may well be an artefact of pH change. Nevertheless, this is an important piece of work. The text is well written, with a good balance of background information to help the reader follow the questions investigated in this research work.

      In my view, the effect of pH on the FRET between CheY-eYFP and CheZ-eCFP is not fully examined. The authors demonstrated in Fig. S3 that CFP intensity itself changes with KCl, likely due to pH. They showed that CFP itself is affected by pH. This result raises the question of whether the FRET data in Figs. 3-5 could result from intensity changes of the FPs rather than from FRET. The measured dynamics may have nothing to do with the interaction between CheY and CheZ. It should be noted that CFP and YFP have different sensitivities to pH. So, the measurement is likely confounded by the change in intracellular pH. Without further experiments to evaluate the effect of pH on CFP and YFP, the data using this FRET pair are inconclusive.

      Response: We thank the reviewer for pointing this out. We have now measured the full potassium response curve for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. We characterized the pH effects on CFP and YFP channels at different concentrations of KCl, and the relationship between the ratio of the signal post- to pre-KCl addition and the KCl concentration was established for both channels, as shown in Fig. S4C. The pH-corrected signal after KCl addition for strains with receptors was obtained by dividing the original signal after KCl addition by this ratio at the specific KCl concentration. This was done for both CFP and YFP channels. The pH-corrected responses for the Tar-only and Tsr-only strains are represented by red dots in Fig. 5BC. The recalculated response curve and adaptation curve for the wild-type strain are shown in Fig. S5. The same correction was applied to Fig. 3 as well. We also re-performed the simulations using the corrected dose-response curve and replotted Fig. 6, though the simulation results did not change much.

      We have now added a subsection “Revised FRET responses by correcting the pH effects on the brightness of eCFP and eYFP” at line 296 in “Results” to describe this.

      The data in Figure 1 are convincing. It would be helpful to include example videos. There is also ambiguity in the Methods section for this experiment. It states that 100 mM KCl was flowed into the source channel. However, it is not clear whether the 100 mM KCl was prepared in water or in the potassium-depleted motility buffer. If KCl was prepared in water, there would be a gradient of other chemicals in the buffer, which would confound the data.

      Response: We apologize for the ambiguity. The KCl solution used in this work was prepared in the potassium-depleted motility buffer. We have now clarified this at both lines 116 and 497. We now provided an example video, Movie S1, with the relevant text added at line 123.

      The authors show FRET data with both KCl and K2SO4 and conclude that the chemotactic response mainly results from potassium ions. However, this was only measured by FRET. It would be more convincing if the motility assay in Fig 1 were also performed with K2SO4.

      Response: We thank the reviewer for the suggestion. The aim of comparing the responses to KCl and K2SO4 was to determine the role of chloride ions in the response and to prove that the chemotactic response of E. coli to KCl comes primarily from its response to potassium ions. It is more sensitive to compare the responses to KCl and K2SO4 by using the FRET assay. In contrast, the microfluidic motility assay is less sensitive in revealing the difference in the chemotactic responses, making it difficult to determine the potential role of chloride ions.

      Methods:

      • Please clarify the promoters used for the constitutive expression of FliCsticky and LacI.

      Response: The promoters used for the constitutive expression of LacIq and FliCsticky were the Iq promoter and the native promoter of fliC, respectively (ref. 57).

      Now these have been clarified at line 471.

      • Fluorescence filters and imaging conditions (exposure time, light intensity) are missing.

      Response: Thank you for the suggestion. We have now added more descriptions at lines 535-546: The FRET setup was based on a Nikon Ti-E microscope equipped with a 40× 0.60 NA objective. The illumination light was provided by a 130-W mercury lamp, attenuated by a factor of 1024 with neutral density filters, and passed through an excitation bandpass filter (FF02-438/24-25, Semrock) and a dichroic mirror (FF458-Di02-25x36, Semrock). The epifluorescent emission was split into cyan and yellow channels by a second dichroic mirror (FF509-FDi01-25x36, Semrock). The signals in the two channels were then filtered by two emission bandpass filters (FF01-483/32-25 and FF01-542/32-25, Semrock) and collected by two photon-counting photomultipliers (H7421-40, Hamamatsu, Hamamatsu City, Japan), respectively. Signals from the two photomultipliers were recorded at a sampling rate of 1 Hz using a data-acquisition card installed in a computer (USB-1901(G)-1020, ADlink, New Taipei, Taiwan).

      • Please clarify if the temperature was controlled in motility assays.

      Response: All measurements in our work were performed at 23 ℃. It was clarified at line 496.

      • L513. It is not clear how theta was selected. Was theta set to be between 0 and pi? If not, P(theta) can be negative?

      Response: The θ was set to be between 0 and π. This has now been added at line 581.

      • Typo in L442 (and) and L519 (Koff)

      Response: Thank you. Corrected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) From the motor measurements the authors find that the CW bias over-adapts to a level larger than prestimulus, but this is not seen in the FRET measurements. What causes this inconsistency? Fig. 2D seems to rule out any change in CheY binding to the motor.

      Response: We thank the reviewer for pointing this out. We have now demonstrated that the imprecise adaptation shown in the FRET assay primarily resulted from the pH-induced intensity change of the fluorescent proteins. As shown in Fig. S5A&C, the FRET signal also shows over-adaptation, similar to the bead assay, when we recalculated the response by correcting the CFP and YFP channels.

      We now clarified it at line 315.

      (2) It would be useful to compare the response amplitude for potassium (Fig. 3C) to a large concentration of both MeAsp and serine. This is a fairer comparison since your work shows potassium acts on both Tar and Tsr. Alternatively, testing a much larger concentration (~10^6 micromolar) at which MeAsp also binds to Tsr would also be useful.

      Response: We thank the reviewer for pointing this out. We have now recalculated the response to potassium by correcting the pH-induced effects on the fluorescence intensity of CFP and YFP. The response to 30 mM KCl was 1.06 ± 0.10 times as large as that to 100 μM MeAsp. The aim of the comparison between the responses to potassium and MeAsp was to provide an idea of the magnitude of the chemotactic response to potassium. The stimulus of 100 μM MeAsp is already a saturating amount of attractant and reduces kinase activity to zero, so using a higher stimulus (adding serine or a larger concentration of MeAsp) is probably not needed. Moreover, a larger concentration (~10^6 micromolar) of MeAsp would also induce an osmotactic response.

      (3) The fitted Hill coefficient (~0.5) to the FRET response curve is quite small and the authors suggest this indicates negative cooperativity. Do they have a proposed mechanism for negative cooperativity? Have similar coefficients been measured for other responses?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.88 ± 0.14 (mean ± SD), which is close to that of the response to MeAsp (1.2) (ref. 46). We suspect that this Hill coefficient of slightly less than 1 results from the differing responses of Tar and Tsr receptors to potassium.

      (3a) The authors state a few times that the response to potassium is "very sensitive", but the low Hill coefficient indicates that the response is not very sensitive (at least compared to aspartate and serine responses).

      Response: We apologize for the confusion. We described the response to potassium as “very sensitive” due to the small value of K0.5. This has now been clarified at line 236.

      (3b) Since the measurements are performed in wild-type cells, the response amplitude following the addition of potassium may be biased if the cell has already partially adapted. This seems to be the case since the FRET time series does not plateau after the addition of the stimulus. The accuracy of the response curve and Hill coefficient would be more convincing if the experiment was repeated with a cheR cheB deficient mutant.

      Response: We thank the reviewer for raising these questions. To observe the low plateau before adaptation, a saturating amount of attractant should be added in a stepwise manner. According to the dose-response curve we measured for potassium, a saturating amount of potassium would be close to 100 mM. In fact, there is a small segment of the low plateau in the step response to 30 mM KCl (Fig. 4C or Fig. S5A). To observe more of this low plateau, we could have used a higher concentration of KCl. However, a stimulation higher than 30 mM KCl will induce substantial physiological changes in the cell, resulting in a significant decrease in fluorescence for both channels (Fig. S7). Therefore, the range of KCl concentration that can be reliably applied in FRET measurements is limited.

      The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, demonstrating a faster adaptation than 0.1 mM MeAsp, which induced a similar magnitude of response. Nevertheless, this is still significantly slower than the time required for medium exchange in the flow chamber, which takes less than 10 s to replace 99% of the medium. Thus, the effect on the measured response magnitude due to adaptation should be small (less than 10%).

      We thank the reviewer for the suggestion of measuring the response to potassium in mutants without adaptation enzymes (CheR, CheB) and with the receptors in different methylation levels. However, these mutants are typically less sensitive than the wild-type, exhibiting higher values of K0.5 (ref. 46), and thus require an even higher KCl concentration to see the low plateau. Consistent with this, we attempted to measure the response to potassium in a cheRcheB mutant (HCB1382-pVS88). As shown in Fig. R1, there is no response to up to 30 mM KCl, suggesting that the sensitive region of the mutant is beyond 30 mM KCl.

      The relevant text was added at lines 413-424.

      (4) The authors show that the measured imprecise adaptation can be (at least partially) attributed to pH impacting the FRET signal by changing eCFP and eYFP brightness.

      (4a) Comparing Fig. 5C and D, the chemosensing and pH response time scales look similar. Therefore, does the pH effect bias the measured response amplitude (just as it biases the adapted FRET level)?

      Response: We agree with the reviewer that the pH effect on CFP and YFP biases the measured response amplitude. We have now performed the measurement of dose-response curve to potassium for the no-receptor mutant (HCB1414-pVS88), as shown in Fig. S4. The pH effects on CFP and YFP were corrected. The dose-response curve and adaptation curve were recalculated and plotted in Fig. S5.

      (4b) It would help to measure a full response curve (at many concentrations) for the no-receptor strain as a control. This would help distinguish, as a function of concentration, how much response can be attributed to pH impacting the FRET signal versus the true chemotactic response.

      Response: We thank the reviewer for the suggestion. We have now performed the measurements for the no-receptor strain. The impact of pH on CFP and YFP has been corrected. The pH-corrected results, previously in Figs. 3-5, are now presented in Fig. 3, Fig. S5 and Fig. 5, respectively.

      (5) The biphasic response of Tar is strange and warrants further discussion. Do the authors have any proposed mechanisms that lead to this behavior? For the 10mM and 30mM KCl measurements there is a repellent response followed by an attractant response for both adding and removing the stimuli, why is this?

      Response: We thank the reviewer for pointing this out. The Tar-only strain exhibits a repellent response to stepwise addition of low concentrations of potassium less than 10 mM, and a biphasic response above (Fig. 5C). This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA, which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      (5a) The fact that Tar and Tsr are both attractant (after the initial repellent response in Tar) appears to be inconsistent with previous work on pH response (Ref 52, Yang and Sourjik Molecular Microbiology (2012) 86(6), 1482-1489). This study also didn't see any biphasic response.

      Response: We thank the reviewer for pointing this out. The Tar-only strain shows a repellent response to stepwise addition of low concentrations of potassium, specifically less than 10 mM. This is consistent with previous observations of the response of Tar to changes in intracellular pH (refs. 44,45) and also with the work of Yang and Sourjik (new ref. 53), although the work in ref. 53 dealt with the response to external pH change, and bacteria are known to maintain a relatively stable intracellular pH when external pH changes (Chen & Berg, Biophysical Journal (2000) 78:2280-2284). Interestingly, the Tar-only strain exhibits a biphasic response to high potassium concentrations of 10 mM and above. This biphasic response might result from additional pH-effects on the activity of intracellular enzymes such as CheRB and CheA (ref. 56), which may have a different timescale and response from the Tar receptor. We have now added the penultimate paragraph in “Discussion” to talk about the response of the Tar-only strain.

      (5b) The response of Tar to the removal of sodium benzoate (Fig. S2) seems to be triphasic, is there any explanation for this?

      Response: We thank the reviewer for pointing this out. We have now acknowledged in the legend of Fig. S2 that this response is interesting and warrants further exploration: “The response to the removal of sodium benzoate seems to be a superposition of an attractant and a repellent response, the reason for which deserves to be further explored.”

      (6) Fitting the MWC model leads to N=0.35<1. It is fine to use this as a phenomenological parameter, but can the authors comment on what might be causing such a small effective cluster size for potassium response?

      Response: We thank the reviewer for pointing this out. We have now recalculated the pH-corrected results for the dose-response curve (Fig. S5). The new Hill coefficient is 0.880.14 (meanSD), which is close to the response to MeAsp (1.2) (ref. 46). We now refit the MWC model to the pH-corrected dose-response curve, obtaining N of 0.85. We think the small N is due partly to the fact that we are fitting the curve with four parameters: N, Kon, Koff, and fm, while only three features of the sigmoid does-response curve are relevant (the vertical scale, the midpoint concentration, and the slope of the sigmoid). Future experiments may determine these parameters more accurately, but they should not significantly affect the simulation results as long as the wild-type dose-response curve is accurate.

      (7) The results of the modeling are closely related to Zhu et al. Phys. Rev. Lett. 108, 128101. Is the lag time for large T related to the adaptation time?

      Response: We thank the reviewer for pointing this out. We used a modeling framework similar to that of Zhu et al. The potassium response was also analogous to the chemotactic response to MeAsp. Thus, the results are closely related to those of Zhu et al. We have now cited Zhu et al. (Ref. 52) and noted this at line 366.

      The lag time for large T is related to the adaptation time. We have now simulated the chemotaxis to potassium for large T with different adaptation time by varying the methylation rate kR. The results are shown in Fig. S8. The simulated lag time decreases with the methylation rate kR, but levels off at high values of kR. Now this has been added at line 603.

      Minor issues:

      • Fig. 1C: should the axis label be y?

      Response: Yes, thank you. Now corrected.

      • Line 519: Koff given twice, the second should be Kon.

      Response: Thank you. Corrected.

      • When fitting the MWC model (Eq. 3 and Fig. 6B) did you fix a particular value for m?

      Response: m was treated as a fitting parameter, grouped in the parameter fm.

      Reviewer #2 (Recommendations For The Authors):

      Minor points: - I suggest explaining the acronyms when they first appear in the text (eg CMC, CW, CCW).

      Response: Thank you. Now they have been added.

      • L144. L242. "decrease" is ambiguous since membrane potential is negative. I understand the authors meant less negative (which is an increase). I suggest to avoid this expression.

      Response: Thank you for the suggestion. Now they have been replaced by “The absolute value of the transmembrane electrical potential will decrease”.

      • For Fig 1b - it says the shaded area is SEM in the text, but SD in the legend. Please clarify.

      Response: Thank you. The annotation in the legend has now been revised as SEM.

      • Fig 1C label of x axis should be "y" instead of "x" to be consistent with Fig 1A.

      Response: Thank you. It has now been revised.

      • In Figure 2, the number of independent experiments as well as the number of samples should be included.

      Response: Thank you. The response in Fig. 2C is the average of 83 motors from 5 samples for wild-type strain (JY26-pKAF131). The response in Fig. 2D is the average of 22 motors from 4 samples for the chemotaxis-defective strain (HCB901-pBES38). They have now been added to the legend.

      • Regarding the attractant or repelling action of potassium and sucrose, it would be important to have a movie showing the cells' behaviours.

      Response: We thank the reviewer for the suggestion. We have now provided Movie S1 to show the cells’ behavior in response to potassium. As shown in Fig. 3B, the chemotactic response to 60 mM sucrose is very small compared to the response to 30 mM KCl. This implies that a noticeable response to sucrose requires higher stimulus concentrations. However, Rosko et al. [Rosko, J., Martinez, V. A., Poon, W. C. K. & Pilizota, T. Proc. Natl Acad. Sci. USA 114, E7969-E7976 (2017).] have shown that high concentrations of sucrose lead to a significant reduction in the speed of the flagellar motor. Thus, in a motility assay for sucrose, the osmolarity-induced motility effect may overwhelm the minor repellent-like response.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The weaknesses are the brevity of the simulations, the concomitant lack of scope of the simulations, the lack of depth in the analysis, and the incomplete relation to other relevant work.

      A 1 µs simulation of CCh (Video 1, part 2) shows that m3 (ACHA) is stable throughout. The ΔG comparisons, in silico versus in vitro, indicate that 200 ns simulations are sufficient to identify LA versus HA conformational populations. Figure 6-table supplement 1 shows distances. New citations have been added.

      Reviewer #2 (Public Review):

      Weaknesses:

      After carrying out all-atom molecular dynamics, the authors revert to a model of binding using continuum Poisson-Boltzmann, surface area, and vibrational entropy. The motivations for, and limitations associated with, this approximate model for the thermodynamics of binding, rather than modern atomistic MD free energy methods (which would fully incorporate configurational sampling of the protein, ligand, and solvent), could be provided. Despite this, the authors report a correlation between their free energy estimates and those inferred from experiment. This did, however, reveal shortcomings for two of the agonists. The authors mention their trouble getting correlation to experiment for Ebt and Ebx and refer to errors of up to 130% in free energy. But this is far worse than a simple proportional error, because -24 vs -10 kcal/mol is a massive overestimation of free energy, as would be evident if the authors were to instead express results in terms of KD values (which would have an error exceeding a billion-fold). The MD analysis could be improved with better measures of convergence, as well as a more careful discussion of free energy maps as a function of identified principal components, as described below. Overall, however, the study has provided useful observations and interpretations of agonist binding that will help understand pentameric ligand-gated ion channel activation.
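      The reviewer's point about KD values can be made concrete with a short calculation. This sketch assumes 25 °C (298.15 K), the standard relation KD = exp(ΔG/RT) with a 1 M standard state, and illustrative function names. It contrasts the fractional error in ΔG with the fold error in KD that the same ΔG discrepancy implies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def kd_fold_error(dg_calc, dg_exp, temp_k=298.15):
    """Fold error in K_D implied by a free-energy error, using K_D = exp(dG/RT)."""
    return math.exp(abs(dg_calc - dg_exp) / (R_KCAL * temp_k))


def fractional_error(dg_calc, dg_exp):
    """Relative error in the free energy itself."""
    return abs(dg_calc - dg_exp) / abs(dg_exp)


fold = kd_fold_error(-24.0, -10.0)      # roughly 1.8e10, i.e. beyond a billion-fold
frac = fractional_error(-24.0, -10.0)   # 1.4, i.e. a 140% overestimation
```

      Because KD depends exponentially on ΔG, a modest-looking fractional error in ΔG can correspond to an enormous error in affinity.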

      The objective of the calculations was to identify structural populations, not to estimate binding free energies. We knew the actual LA and HA energies (for all 4 agonists) from real-world electrophysiology experiments. We conclude that the simple PBSA method worked as a tool for identification because the calculated efficiencies match those from experiments (Figure 4B, Figure 4-Source Data 1). We discuss the mismatches in absolute ΔG in the Results and Discussion. Methods for estimating experimental binding free energies are described in a cited companion paper in eLife. The ΔG ratio relates to agonist efficiency.

      Main points:

      Regarding the choice of model, some further justification of the reduced 2 subunit ECD-only model could be given. On page 5 the authors argue that, because binding free energies are independent of energy changes outside the binding pocket, they could remove the TMD and study only an ECD subunit dimer. While the assumption of distant interactions being small seems somewhat reasonable, provided conformational changes are limited and localised, how do we know the packing of TMD onto the ECD does not alter the ability of the alpha-delta interface to rearrange during weak or strong binding? They further write that "fluctuations observed at the base of the ECD were anticipated because the TMD that offers stability here was absent.". As the TMD-ECD interface is the "gating interface" that is reshaped by agonist binding, surely the TMD-ECD interface structure must affect binding. It seems a little dangerous to completely separate the agonist binding and gating infrastructure, based on some assumption of independence. Given the model was only the alpha and delta subunits and not the pentamer with TMD, I am surprised such a model was stable without some heavy restraints. The authors state that "as a further control we carried out MD simulation of a pentamer docked with ACh and found similar structural changes at the binding pocket compared to the dimer." Is this sufficient proof of the accuracy of the simplified model? How similar was the model itself with and without agonist in terms of overall RMSD and RMSD for the subunit interface and the agonist binding site, as well as the free energy of binding to each model to compare?

      The statement that distant interactions are small is not an "assumption" but a conclusion based on data. Mutant cycle analysis of 83 pairs shows (with a few exceptions) that non-additivity of free energy changes prevails only at separations <~15 Å (Fig. 3 in Gupta et al. 2017). Regardless, the adequacy of the dimers and convergence by 200 ns are supported by the match between calculated and experimental agonist efficiencies (Figure 4B) and by the 1 µs simulation (Video 1 part 2). A 200 ns apo simulation of the ECD dimer has now been added (Figure 2-figure supplement 2), and the dimer interface appears to be adequate (stable).
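      For readers unfamiliar with mutant cycles, here is a minimal sketch of the non-additivity (coupling) energy referred to above. It assumes the common convention ΔΔG = -RT ln(K_mut/K_wt) for an equilibrium constant K at 25 °C. Sign conventions differ between papers, and the function name and inputs are illustrative, not taken from Gupta et al. 2017.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def coupling_energy(k_wt, k_a, k_b, k_ab, temp_k=298.15):
    """Interaction free energy from a double-mutant cycle, in kcal/mol.

    ddG_int = ddG_AB - (ddG_A + ddG_B), with ddG_X = -RT ln(K_X / K_wt).
    Zero means the two mutations act independently (additive free energies);
    a nonzero value indicates energetic coupling between the two positions.
    """
    rt = R_KCAL * temp_k
    return -rt * math.log((k_ab * k_wt) / (k_a * k_b))


additive = coupling_energy(1.0, 10.0, 100.0, 1000.0)  # independent effects
coupled = coupling_energy(1.0, 10.0, 10.0, 10.0)      # double mutant less than additive
```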

      Although the authors repeatedly state that they have good convergence with their MD, I believe the analysis could be improved to convince us. On page 8 the authors write that the RMSD of the system converged in under 200 ns of MD. However, I note that the graph is of the entire ECD dimer, not a measure for the local binding site region. An additional RMSD of local binding site would be much more telling. You could have a structural isomerisation in the site and not even notice it in the existing graph. On page 9 the authors write that the RMSF in Figure S2 showed instability mainly in loops C and F around the pocket. Given this flexibility at the alpha-delta interface, this is why collecting those regions into one group for the calculation of RMSD convergence analysis would have been useful. They then state "the final MD configuration (with CCh) was well-aligned with the CCh-bound cryo-EM desensitized structure (7QL6)... further demonstrating that the simulation had converged." That may suggest a change occurred that is in common with the global minimum seen in cryo EM, which is good, but does not prove the MD has "converged". I would also rename Figure S3 accordingly.

      The description has now been changed to "aligns well with the desensitized structure (7QL6.pdb)". Not just the binding pocket but the whole ECD dimer aligns well with the apo structure (m1) and with the desensitized structure (m3).
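      The reviewer distinguishes a global RMSD from one restricted to the binding-site region. The NumPy sketch below illustrates that distinction with Kabsch superposition followed by RMSD over a chosen atom subset. The choice of atoms to fit on and atoms to measure (for example, Cα atoms of the whole ECD dimer versus those of loops C and F) is left to the analyst. The atom indices and coordinates below are invented for the demonstration.

```python
import numpy as np


def superpose(mobile, ref, fit_idx=None):
    """Kabsch superposition of mobile onto ref, fitting only on atoms fit_idx.

    mobile, ref: (n_atoms, 3) arrays with matching atom order.
    Returns a rotated and translated copy of the whole mobile structure.
    """
    idx = np.arange(len(ref)) if fit_idx is None else np.asarray(fit_idx)
    m_c = mobile[idx].mean(axis=0)
    r_c = ref[idx].mean(axis=0)
    h = (mobile[idx] - m_c).T @ (ref[idx] - r_c)   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (mobile - m_c) @ rot.T + r_c


def rmsd(a, b, idx=None):
    """RMSD over atoms idx (all atoms if None), without any further fitting."""
    sel = slice(None) if idx is None else np.asarray(idx)
    return float(np.sqrt(np.mean(np.sum((a[sel] - b[sel]) ** 2, axis=1))))


# Demo: a rigidly moved copy superposes back exactly; a local change in a
# hypothetical five-atom "pocket" stands out far more in a local RMSD.
rng = np.random.default_rng(1)
ref = rng.normal(scale=10.0, size=(50, 3))
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rz.T + np.array([5.0, -3.0, 2.0])
pocket = np.arange(5)
distorted = moved.copy()
distorted[pocket] += np.array([2.0, 0.0, 0.0])
fitted = superpose(distorted, ref)
global_rmsd = rmsd(fitted, ref)
pocket_rmsd = rmsd(fitted, ref, pocket)
```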

      The authors draw conclusions about the dominant states and pathways from their PCA component free energy projections that need clarification. It is important first to show data to demonstrate that the two PCA components chosen were dominant and accounted for most of the variance. Then when mapping free energy as a function of those two PCA components, to prove that those maps have sufficient convergence to be able to interpret them. Moreover, if the free energies themselves cannot be used to measure state stability (as seems to be the case), that the limitations are carefully explained. First, was PCA done on all MD trajectories combined to find a common PC1 & PC2, or were they done separately on each simulation? If so, how similar are they? The authors write "the first two principal components (PC-1 and PC-2) that capture the most pronounced Cα displacements". How much of the total variance did these two components capture? The authors write the changes mostly concern loop C and loop F, but which data proves this? e.g. A plot of PC1 and PC2 over residue number might help.

      The PCA analyses have been expanded. Figure 3-Source Data 1 shows the dominance of PC1 and PC2. Because the binding energy match was sufficient to identify affinity states, we did not explore additional PCs. Residue-wise PC1 and PC2 analysis and comparison with RMSF are in Figure 2-figure supplement 2. PC1 and PC2 both correlate with fluctuations in loops C and F. Overlap analysis of different runs is shown in Figure 3-figure supplement 1. Lower variance in a particular region of the PCA landscape indicates that the system frequently visits these states, suggesting stability (a preference for these conformations).
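      The following NumPy sketch performs PCA on a pre-aligned Cα trajectory using singular value decomposition. It gives a rough sense of the quantities discussed here: the variance captured by PC1 and PC2, residue-wise profiles of each component, and, as Reviewer #4 asks in point (2a), the projection of an experimental structure onto the same axes. It is a generic illustration on synthetic data, not the authors' pipeline. Alignment of all frames to a common reference is assumed to have been done beforehand.

```python
import numpy as np


def ca_pca(coords):
    """PCA of a C-alpha trajectory that has already been superposed.

    coords: (n_frames, n_atoms, 3) array.
    Returns (variance ratios, PC1/PC2 projections per frame,
    per-atom PC1/PC2 amplitudes, function projecting a new structure).
    """
    n_frames, n_atoms, _ = coords.shape
    x = coords.reshape(n_frames, 3 * n_atoms)
    mean = x.mean(axis=0)
    xc = x - mean
    # Squared singular values of the centered data are proportional to PC variances.
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    proj = xc @ vt[:2].T
    # Length of each atom's 3-vector in PC1 and PC2: a per-residue profile for C-alpha.
    per_atom = np.linalg.norm(vt[:2].reshape(2, n_atoms, 3), axis=2)

    def project(structure):
        """Project an aligned (n_atoms, 3) structure onto PC1 and PC2."""
        return (structure.reshape(-1) - mean) @ vt[:2].T

    return ratio, proj, per_atom, project


# Synthetic trajectory: one large collective motion plus small noise.
rng = np.random.default_rng(2)
base = rng.normal(size=(20, 3))
mode = rng.normal(size=(20, 3))
mode /= np.linalg.norm(mode)
amps = rng.normal(scale=5.0, size=200)
traj = base + amps[:, None, None] * mode + rng.normal(scale=0.1, size=(200, 20, 3))
ratio, proj, per_atom, project = ca_pca(traj)
print(f"PC1 + PC2 explain {100 * ratio[:2].sum():.1f}% of the variance")
```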

      The authors map the -kTln rho as a free energy for each simulation as a function of PC1 & PC2. It is important to reveal how well that PC1-2 space was sampled, and how those maps converged over time. The shapes of the maps and the relative depths of the wells look very different for each agonist. If the maps were sampled well and converged, the free energies themselves would tell us the stabilities of each state. Instead, the authors do not even mention this and instead talk about "variance" being the indicator of stability, stating that m3 is most stable in all cases. While I can believe 200ns could not converge a PC1-2 map and that meaningful delta G values might not be obtained from them, the issue of lack of sampling must be dealt with. On page 12 they write "Although the bottom of the well for 3 energy minima from PCA represent the most stable overall conformation of the protein, they do not convey direct information regarding agonist stability or orientation". The reasons why not must be explained; as they should do just that if the two order parameters PC1 and PC2 captured the slowest degrees of freedom for binding and sampling was sufficient. The authors write that "For all agonists and trajectories, m3 had the least variance (was most stable), again supporting convergence by 200 ns." Again the issue of actual free energy values in the maps needs to be dealt with. The probabilities expressed as -kTln rho in kcal/mol might suggest that m2 is the most stable. Instead, the authors base stability only on variance (I guess breadth of the well?), where m3 may be more localised in the chosen PC space, despite apparently having less preference during the MD (not the lowest free energy in the maps).
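      The sketch below shows the -kT ln rho construction discussed above. It also includes one simple convergence check of the kind the reviewer asks for: rebuilding the map from growing fractions of the trajectory on a fixed grid and tracking how much it still changes. The grid, the temperature (kT ≈ 0.593 kcal/mol, roughly 298 K) and the RMS-drift metric are illustrative assumptions, not the authors' settings.

```python
import numpy as np

KT_KCAL = 0.593  # kT in kcal/mol at about 298 K (assumed temperature)


def free_energy_map(pc1, pc2, edges, kt=KT_KCAL):
    """-kT ln(rho) on a fixed PC1/PC2 grid; unvisited bins are +inf."""
    rho, _, _ = np.histogram2d(pc1, pc2, bins=edges, density=True)
    with np.errstate(divide="ignore"):
        g = -kt * np.log(rho)
    return g - g[np.isfinite(g)].min()  # deepest well set to 0 kcal/mol


def map_drift(pc1, pc2, edges, fractions=(0.5, 0.75, 1.0)):
    """RMS change (kcal/mol) between maps built from growing trajectory prefixes.

    Only bins visited in both maps are compared; values that shrink toward
    zero are one (necessary, not sufficient) sign of convergence.
    """
    n = len(pc1)
    maps = [free_energy_map(pc1[: int(f * n)], pc2[: int(f * n)], edges) for f in fractions]
    drift = []
    for a, b in zip(maps, maps[1:]):
        both = np.isfinite(a) & np.isfinite(b)
        drift.append(float(np.sqrt(np.mean((a[both] - b[both]) ** 2))))
    return drift


# Synthetic PC projections drawn from a single Gaussian basin.
rng = np.random.default_rng(3)
pc1, pc2 = rng.normal(size=(2, 20000))
edges = [np.linspace(-4, 4, 41), np.linspace(-4, 4, 41)]
g = free_energy_map(pc1, pc2, edges)
drift = map_drift(pc1, pc2, edges)
```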

      The motivations and justifications for the use of approximate PBSA energetics instead of atomistic MD free energies should be dealt with in the manuscript, with limitations more clearly discussed. Rather than using modern all-atom MD free energy methods for relative or absolute binding free energies, the author selects clusters from their identified states and does Poisson-Boltzmann estimates (electrostatic, vdW, surface area, vibrational entropy). I do believe the following sentence does not begin to deal with the limitations of that method: "there are limitations with regard to MM-PBSA accurately predicting absolute binding free energies (Genheden & Ryde, 2015; Hou et al., 2011) that depends on the parameterization of the ligand (Oostenbrink et al., 2004)." What are the assumptions and limitations in taking continuum electrostatics (presumably with parameters for dielectric constants and their assignments to regions after discarding solvent), surface area (with its assumptions and limitations), and of course assuming vibration of a normal mode can capture entropy. On page 30, regarding their vibrational entropy estimate, they write that the "entropy term provides insights into the disorder within the system, as well as how this disorder changes during the binding process". It is important that the extent of disorder captured by the vibrational estimate be discussed, as it is not obvious that it has captured entropy involving multiple minima on the system's true 3N-dimensional energy surface, and especially the contribution from solvent disorder in bound Vs dissociated states.

      As discussed above, errors in the free energy estimates need to be more faithfully represented, as fractional errors are not meaningful. On page 21 the authors write "The match improved when free energy ratios rather than absolute values were compared." But a ratio of free energies is not a typical or expected measure of error in delta G. They also write "For ACh and CCh, there is good agreement between ΔGm1 and ΔGLA and between ΔGm3 and ΔGHA. For these agonists, in silico values overestimated experimental ones only by ~8% and ~25%. The agreement was not as good for the other 2 agonists, as calculated values overestimated experimental ones by ~45% (Ebt) and ~130% (Ebx). However, the fractional overestimation was approximately the same for ΔGLA and ΔGHA." See the above comment on how this may misrepresent the error. On page 21 they write, in relation to their large fractional errors, that they "do not know the origin of this factor but speculate that it could be caused by errors in ligand parameterization". However, the estimates from the PBSA approach are, by design, only approximate. Both errors in parameterisation (and their likely origin) and the approximate model used need discussion.

      Again, the goal of calculating binding free energies was to identify structural correspondence to LA and HA, not to obtain absolute binding free energy values. In addition to having the least variance (distribution) along the principal components, m3 also had the highest binding free energy. We associated m1 with LA and m3 with HA after comparing them with experimental values (efficiencies). This comparison not only validates our approach but also underscores the utility of PBSA in supplementing MD and PCA analyses with broader energetics perspectives.

      Reviewer #3 (Public Review):

      Weaknesses:

      Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for two other ligands were significantly different than the experiment. It is unclear to what extent the choice of method for the energy calculations influenced the results. See above.

      A control simulation, such as for an apo site, is lacking.

      Figure 2-figure supplement 2 shows the results of 200 ns MD simulations of the apo structure (n=2).

      Reviewer #4 (Public Review):

      Weaknesses:

      Timescales (200 ns) do not capture global rearrangements of the extracellular domain, let alone gating transitions of the channel pore, though this work may provide a launching point for more extended simulations. A more general concern is the reproducibility of the simulations, and how representative states are defined. It is not clear whether replicates were included in principal component analysis or subsequent binding energy calculations, nor how simulation intervals were associated with specific states.

      We are eventually interested in using MD to study the full isomerization, but these investigations are for the future and will likely involve full-length pentamers and longer timescales. However, in response to this query, we now raise this issue in the Discussion and offer speculations. As noted above, PCA has been compared between replicates (Figure 3-figure supplement 1).

      Structural analysis largely focuses on snapshots, with limited direct evidence of consistency across replicates or clusters. Figure legends and tables could be clarified.

      Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from the m1, m2 and m3 plateau regions of the trajectories. This has been incorporated into the legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study gives interesting insights into the possible dynamics of ligand binding in ACh receptors and establishes some prerequisites for necessary and urgent further work. The broad interest in this receptor class means this work will have some reach.

      Suggestions:

      (1) I found the citation of relevant literature to be rather limited. In the following paper, the agonist glutamate was shown to bind in two different orientations, and also to convert. These are much longer simulations than what is presented here (nearly 50 µs), which allowed a richer view of conformational changes and ligand binding dynamics in the AMPA Receptor. Albert Lau has published similar work on NMDA, delta, and kainate receptors, including some of it in eLife. Perhaps the authors could draw some helpful comparisons with this work.

      Yu A et al. (2018) Neurotransmitter Funneling Optimizes Glutamate Receptor Kinetics. Neuron

      Likewise, the comparison to a similar piece of work on glycine receptors (not cited, https://pubs.acs.org/doi/10.1021/bi500815f) could be instructive. Several similar computational techniques were used, and interactions observed (in the simulations) between the agonist and the receptor were tested in the context of wet experiments. In the absence of an equivalent process in this paper (no findings were tested using an orthogonal approach, only compared against known results, from perhaps a narrow spectrum of papers), we have to view the major findings of the paper (docking in cis that leads to a ligand somersault) with some hesitancy.

      The Gharpure 2019 paper is cited in the context of the delta subunit but this paper was about a3b4 neuronal nicotinic receptors. This could be tidied up. Also, the simulations from that paper could be used as an index of the stability of the HA state (if ligand orientation is being cited as transferrable, other observations could be too).

      New citations have been added. It is difficult to generalize from Yu A et al. and Yu R et al., because in neither study was the ligand orientation associated with LA versus HA binding energy.

      (2) "To start, we associated the agonist orientation in the hold end states as cis in AC-LA versus trans in AC-HA."

      I think this is a valid start, but one is left with the feeling that this is all we have and that the validity of the starting state is not tested. What was really shown here? Is the docking reliable? What evidence can the authors summon for the ligand orientation that they use as a starting structure?

      In addition to docking energies, the match between PBSA and electrophysiology ΔGs and the temporal sequence (m1-m2-m3) support the assignment.

      Given that these simulations cover a circumscribed part of the binding process, I think the limitations should be acknowledged. Indeed the authors do mention a number of remaining open questions.

      Paragraphs regarding 'catch' have been added to the Discussion.

      (3) Results around line 90. Hypothetical structures and states that were determined from Markov analyses are discussed as if they are well understood and identified. Plausible though these are, I think the text should underline at least the source of such information. In these simulations, a further intermediate has been identified.

      The model in Figure 1B was first published in 2012 and has been used and extended over the intervening years. In our lab, catch-and-hold is standard. We have published many papers (in top journals), plus reviews, regarding this scheme. We made presentations that are on YouTube. Here, at the end of the Introduction, we now cite a new review article (Biophysical Journal, 2024). I am not sure what more we can do to raise awareness regarding catch and hold.

      (4) The figures are dense and could be better organised. Figure 2 is key but has a muddled organization. The placement of the panel label (C) makes it look like the top row (0 ns) is part of (A). Panel B- what is shown in the oval inset (not labeled or in legend). Why not show more than one view, perhaps a sequence of time points? It is confusing to change the colour of the loops in (C). Please show the individual values in D.

      Figure 2 has been redone.

      (5) A lot is made of the aK145 salt bridge with aD200 and the distances - but I didn't see any measurements, or time course. This part is vague to the point of having no meaning ("bridge tightening").

      We present a Table of distance measurements in the SI (Figure 6-table supplement 1).

      Reviewer #2 (Recommendations For The Authors):

      All main comments have been given in the above review. There are a few other minor comments below.

      The 4 agonists examined were acetylcholine (ACh), carbamylcholine (CCh), epibatidine (Ebt), and epiboxidine (Ebx). Could the choices be motivated for the reader?

      New in Methods: the agonists are about the same size yet represent different efficiency classes (citation to companion eLife paper). One of our (unmet) objectives was to understand the structural correlates of agonist efficiency.

      The authors write that state structures generated in the MD simulation were identified by aligning free energy values with those from experiments. It would be good to explain to the reader, in the introduction, how LA and HA free energies were extracted from experiments, rather than relying on them to read older papers.

      In the Introduction, we say that to get G, just measure an equilibrium constant and take the log. We think it is excessive to explain in detail in this paper how to measure the equilibrium binding constants (several methods suffice). However, we have added in Methods our basic approach: measure KLA and L2 by using electrophysiology, and compute KHA from the thermodynamic cycle using L0. We think this paper is best understood in the context of its companion, also in eLife.

      In all equilibrium equations of the type A to B (e.g. on page 5), rather than using "=" signs it would be much better to use equilibrium reversible arrow symbols.

      This has been incorporated.

      Reviewer #3 (Recommendations For The Authors):

      (1) Although the match in simulated vs experimental energies for two ligands was very good, the calculated energies for Ebt and Ebx were significantly different than the experiment. Are there any alternative methods for calculating binding energies from the MD simulations that could be readily compared to?

      See above. We did not use more sophisticated energy calculations because we already knew the answers. Our objective was to identify states, not to calculate energies.

      (2) It would be nice to see control simulations of an apo site to ensure that the conformational changes during the MD are due to the ligands and not an artifact of the way the system is set up. I am primarily asking about this as the simulation of the isolated ECDs for the binding site interface seems like it may be unhappy without the neighboring domains that would normally surround it. On that note, was the protein constrained in any way during the MD?

      Apo simulation results are presented in Figure 2-figure supplement 2. The dimer interface seems to be adequate (stable).

      (3) Figure 4A-B: Should the colors for m1 and m3 be reversed?

      Colors have been changed and a bar chart has been added.

      Reviewer #4 (Recommendations For The Authors):

      (1) Although simulations are commendably run in triplicate, it is difficult in some places to discern their consistency.

      (1a) Table S1 provides important quantification of deviations in different replicates and with different agonists. Please confirm that the reported values are accurate. All values reported for the epibatidine system are identical to those reported for carbamylcholine, which seems statistically improbable. Similarly, runs 1 and 3 with epiboxidine seem identical to one another, and runs 1 and 2 with acetylcholine are nearly the same.

      Figure 2-Source Data 1 has been corrected.

      (1b) In reference to Figure S3, the authors comment that the simulated system (one replicate with carbamylcholine) converges within 0.5 Å RMSD of a desensitized experimental structure. This seems amazing; please specify over what atoms this deviation was calculated and with reference to what alignment. It would be interesting to know the reproducibility of this remarkable convergence in additional replicates or with other ligands; for example, Figure 5 indicates that loop C transitions to a lesser extent in the context of epibatidine than other agonists.

      The comparison was for the entire dimer ECD; 0.5 Å is the result. It may be worthwhile to pursue this remarkable convergence, but not in this paper. Here, we are concerned with identifying ACLA and ACHA. Similarity between ACHA and AD structures is for a different study.

      (1c) For principal-component and subsequent analyses, it appears that only one trajectory was considered for each system. Please clarify whether this is the case; if so, a rationale for the selection would be helpful, and some indication of how reproducible other replicates are expected to be.

      We have added new PCA results (Results, Figure 3-figure supplement 1) that show comparable principal components in other replicates.

      (2) Figure 3 shows free energy landscapes defined by principal components of fluctuation in Cα positions.

      (2a) Do experimental structures (e.g. PDB IDs 6UWZ, 7QL6) project onto any of these landscapes in informative ways?

      6UWZ.pdb matches the apo structure (7QKO.pdb) well and is comparable to m1, and 7QL6.pdb matches m3.

      (2b) Please indicate the meaning of colored regions in the righthand panels.

      The color key in the top-left panel also applies to the colored regions in the right-hand panels, which indicate the direction and magnitude of changes along PC1 and PC2.

      (2c) Please also check the legend; do the porcupine plots really "indicate the direction and magnitude of changes between PC1 and PC2," or rather between negative and positive values of each principal component?

      They indicate the direction and magnitude of changes along PC1 and PC2.

      (3) It would be helpful to clarify how trajectory segments were assigned to specific minima, particularly m2 and m3.

      (3a) Please verify the timeframes associated with the m2 minima, reported as "20-50 ns [with acetylcholine], 50-60 ns [with carbamylcholine], 60-100 ns [with epibatidine, and] 100-120 ns [with epiboxidine]." It seems improbable that these intervals would interleave so precisely in independent systems. Furthermore, the intervals associated with acetylcholine and epiboxidine do not appear to correspond to the m2 regions indicated in Figure S8.

      Times are given in Figure 4-Source Data 1 and Figure 3-figure supplement 2. The m2 classification is based on loop displacement as well as agonist orientation. For all agonists, the selection was strictly from PCA and cluster analysis.

      (3b) The text (and legend to Figure 3) indicate that 180+ ns of each trajectory was assigned to m3, which seems surprisingly consistent. However, Figure S5 indicates this minimum is more variable, appearing at 160 ns with acetylcholine but at 186 ns with carbamylcholine. Please clarify.

      See above: the selection was from PCA and cluster analysis. Times are in Figure 3-figure supplement 2 and also in Figure 4-Source Data 1 (they are not in the Fig. 3 legend).

      (3c) Figures 5, 6, S6, and S7 illustrate structural features of free-energy minima in each ligand system. Please clarify what is shown, e.g. a representative snapshot, centroid, or average structure from a particular prominent cluster associated with a given minimum.

      They are all representative snapshots (now in Methods). Snapshots and distance measurements (Figure 6-table supplement 1) were extracted from m1, m2 and m3 plateau regions of trajectories.

      (4) Figure S4 helpfully shows the behavior of a pentameric control system; however, some elements are unclear.

      (4a) The 2.5-6.5 Å jump in RMSD at ~40 ns seems abrupt; can it be clarified whether this corresponds to a transition to either m2 or m3 poses, or to another feature of e.g. alignment?

      Figure 2-figure supplement 4 left bottom is just the ligand. The jump is the flip, m1 to m2.

      (4b) It seems difficult to reconcile the apparently bimodal distribution of states with the proposed 3-state model. Into which RMSD peak would the m2 intermediate fall?

      The simulations extend only to 100 ns, over which we observed a complete flip of the agonist, as represented in the histograms. This confirmed that the dimer showed a similar pattern to the pentamer. In-depth analysis was done only on dimers.

      (4c) The top panel is labeled "Com" with a graphical legend indicating "ACh." Does this indicate the ligand or, as described in the text legend, "the pentamer" (i.e. the receptor)? For both panels, please verify whether they are calculated on the basis of center-of-mass, heavy atoms, Cα, etc.

      "Com" (for complex) has been changed to system (protein+ligand).

      (5) Minor concerns:

      (5a) In Figures 1 and S3, correct the PDB references (6UWX and 7QL7 are not nAChRs).

      They are now corrected.

      (5b) In Figure 4, do all panels represent mean ± standard deviation calculated across all cluster-frames reported in Table 1?

      Yes.

      Also check the graphical legend in panel A: presumably the red bars correspond to m1/LA, and the blue to m3/HA?

      Corrected

      (5c) In the legend to Figure S1, please clarify that panel B is reproduced from Indurthi & Auerbach 2023.

      This figure has been deleted.

      (5d) As indicated in Figure S2, it seems surprising that the RMSF is so apparently low at the periphery, where the subunits should contact neighbors in the extracellular domain; how might the authors account for this? Specify whether these results apply to all replicates of each system.

      The red coloring at the periphery in all four systems indicates the magnitude of fluctuation. As we focus on the orthosteric site, we highlighted the loops around the agonist binding pocket and rendered other regions 75% transparent. We now include apo simulations, and the dimer appears to be stable even without an agonist present.

      (5e) Within each minimum in Figure S5, three "prominent" clusters appear to be colored (by heteroatom) with carbons in cyan, pink, and yellow respectively. If this is correct, note these colors in the text legend.

      Colors have been added to the legend.

      (5f) In Figure S6, note in the legend that key receptor sidechains are shown as spheres, with the ligand as balls-and-sticks, and that ligand conformations in both low- and high-affinity complexes are shown in both receptor states for comparison.

      This is now added in the legend.

      (5g) The legend to Figure S6 also notes "The agonists are as in Fig S4," but that figure contains a single replicate of a different system; please check this reference.

      This has been updated to Figure 5.

      (5h) In Figure S8, the colors in the epibatidine system appear different from the others.

      The colors are the same for m1, m2 and m3 in all systems including epibatidine.

      (5i) In Table 1, does "n clusters" indicate the number of simulation frames included in the three prominent clusters chosen for MM-PBSA analysis? Perhaps "n frames" would be more clear.

      This was a good suggestion. It has now been changed to ‘n frames’.

      (5j) Pg 24-ln 453 presumably should read "...that separate it from m1 and m3..."

      This sentence is now changed in the discussion.

    1. Thus the worth of any object to be acquired by our action is always conditional. Beings the exis­tence of which rests not on our will but on nature, if they are beings with­out reason, still have only a relative worth, as means, and are therefore called things, whereas rational beings are called persons because their nature already marks them out as an end in itself, that is, as something that may not be used merely as a means, and hence so far limits all choice (and is an

      I think he is saying that we as humans are not to be used as a means to an end. To do so would be ethically wrong.

  3. small-tech.org
    1. Personal: Small Technology are everyday tools for everyday people. They are not tools for startups or enterprises.

       Easy to use: Personal technology are everyday things that people use to improve the quality of their lives. As such, in addition to being functional, secure, and reliable, they must be convenient, easy to use, and inclusive. If possible, we should aim to make them delightful. Related aspects: inclusive.

       Non-colonial: Small technology is made by humans for humans. They are not built by designers and developers for users. They are not built by Western companies for people in African countries. If our tools specifically target a certain demographic, we must ensure that our development teams reflect that demographic. If not, we must ensure people from a different demographic can take what we make and specialise it for their needs. Related aspects: share alike, non-commercial, interoperable.

       Private by default: A tool respects your privacy only if it is private by default. Privacy is not an option. You do not opt into it. Privacy is the right to choose what you keep to yourself and what you share with others. “Private” (i.e., for you alone) is the default state of small technologies. From there, you can always choose who else you want to share things with. Related aspects: zero knowledge, peer to peer.

       Zero knowledge: Zero-knowledge tools have no knowledge of your data. They may store your data, but the people who make or host the tools cannot access your data even if they wanted to. Examples of zero-knowledge designs are end-to-end encrypted systems where only you hold the secret key, and peer-to-peer systems where the data never touches the devices of the app maker or service provider (including combinations of end-to-end encrypted and peer-to-peer systems). Related aspects: private by default, peer to peer.

       Peer to peer: Peer-to-peer systems enable people to connect directly with one another without a person (or more likely a corporation or a government) in the middle. They are the opposite of client/server systems, which are centralised (the servers are the centres). On peer-to-peer systems, your data – and the algorithms used to analyze and make use of your data – stay in spaces that you own and control. You do not have to beg some corporation to not abuse your data because they don’t have it to begin with. Related aspects: zero knowledge, private by default.

       Share alike: Most people’s eyes cloud over when technology licenses are mentioned, but they’re crucial to protecting your freedom. Small Technology is licensed under Copyleft licenses. Copyleft licenses stipulate that if you benefit from technology that has been put into the commons, you must share back (“share alike”) any improvements, changes, or additions you make. If you think about it, it’s only fair: if you take from the commons, you should give back to the commons. That’s how we cultivate a healthy commons. Related aspects: interoperable, non-colonial, non-commercial.

       Interoperable: Interoperable systems can talk to one another using well-established protocols. They’re the opposite of silos. Interoperability ensures that different groups can take a technology and evolve it in ways that fit their needs while still staying compatible with other tools that implement the same protocols. Interoperability, coupled with share alike licensing, helps us to distribute power more equally as rich corporations cannot “embrace and extend” commons technology, thereby creating new silos. Interoperability also means we don’t have to resort to colonialism in design: we can design for ourselves and support other groups who design for themselves while allowing all of us to communicate with each other within the same global network. Related aspects: share alike, non-colonial.

       Non-commercial: The primary purpose for Small Technology is not to make a profit but to increase human welfare. As such, they are built by not-for-profit organisations. Eventually, we hope that small technologies will be recognised for their contribution to the common good and therefore supported from the commons (e.g., from our taxes). In the interim, some methods for monetising Small Technology include:
       • Charging for hosting and maintenance services
       • Sales on App Stores (for native apps)
       • Donations and patronage
       • Grants and awards
       Equity-based / Venture Capital investment is incompatible with Small Technology as the success criterion is the sale of the organisation (either to a larger organisation or to the public at large via an IPO). Small Technology is not about startups (temporary companies designed to either fail fast or grow exponentially and get sold); it’s about stayups (sustainable organisations that contribute to the common good). Related aspects: non-colonial, share alike, interoperable.

       Inclusive: Being inclusive in technology is ensuring people have equal rights and access to the tools we build and the communities who build them, with a particular focus on including people from traditionally marginalised groups. Accessibility is the degree to which technology is usable by as many people as possible, especially disabled people. Small Technology is inclusive and accessible. With inclusive design, we must be careful not to assume we know what’s best for others, despite us having differing needs. Doing so often results in colonial design, creating patronising and incorrect solutions.

      Small Technology Small Technology are everyday tools for everyday people designed to increase human welfare, not corporate profits.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary Maintenance of the histone H3 variant CENP-A at centromeres is necessary for proper kinetochore assembly and correct chromosome segregation. The Mis18 complex recruits the CENP-A chaperone HJURP to centromeres to facilitate CENP-A replenishment. Here the authors characterise the Mis18 complex using hybrid structural biology, and determine complex interface separation-of-function mutants.

      Major Comments

      The SAXS and EM data on the full-length Mis18 components must be included in the main Figures, either as an additional figure or by merging/rearranging the existing figures. The authors discuss these results in three whole paragraphs, which are a very important part of the paper.

      We thank the reviewer for this constructive suggestion. We have now included an additional figure (new Fig. 2, attached below), that highlights the fit of the integrative model against the SAXS and EM data.

      Could the authors also compare the theoretical SAXS scattering curves generated by their final model(s) with the experimental SAXS curves? This would provide some additional evidence for the overall shape of their complex model beyond the consistency with the Dmax/Rg.

      We acknowledge the importance of this suggestion. We have now compared the theoretical SAXS scattering curve of the Mis18a/b core complex (named Mis18a/b ΔN), which lacks the flexible elements (disordered regions and the helical region flexibly connected to the Yippee domains), with the experimental data. The theoretically calculated scattering curve of the model agrees well with the experimental SAXS curve (χ² = 1.36). These data are now included in new Fig. 2 (Fig. 2f) and are referenced on page 9 line 21.

      Minor Comments

      While the introduction is clearly written, an additional cartoon schematic, representing the system/question would be helpful to a non-specialist reader to interpret the context of the study.

      We have now included a cartoon in the revised Fig. 1 to support the introduction on centromere maintenance and the central role of the Mis18a/b/BP1 complex in this process. Please find the new Fig. 1 below.

      No doubt the authors had a reason for choosing their figure allocation, but I wonder if more material couldn't be brought from the supplementary into the main figures?

      As addressed in our response to one of the major comments, we have now moved key CLMS, SAXS and EM data from the supplemental figure into the main figure, new Fig. 2.

      Page 6 "Mis18-alpha possesses an additional alpha-helical domain" - please make it clear in addition to what (I assume it's in addition to Mis18-beta).

      Apologies for the lack of clarity. We have now rephrased this sentence (page 6 line 15) to make clear that the comparison is with Mis18b.

      Page 7 - Report the RMSD of the S. pombe vs. human Mis18-alpha Yippee structures?

      The S. pombe Mis18 Yippee structure superposes onto the human Mis18a Yippee domain with an RMSD of 0.92 Å, which is now mentioned on page 7 line 9.

      Page 7 - "We generated high-confidence structural models...." Is there a metric for the confidence as reported by RaptorX? Perhaps including the PAE plots in the supplementary for the AlphaFold-generated models would be useful?

      We thank the reviewer for the valid suggestion. We have now included the PAE plot corresponding to the AlphaFold model in Supplementary Fig. S1d and referenced it on page 7 line 18. RaptorX ranks models based on estimated error. We have now included this information in the new figure legend for Supplementary Fig. S1.

      Figure 1 - Perhaps label figure 1b as being experimentally determined, with the R values (as for Figure 1d), and 1c being a predicted model.

      We have included Rfree and Rwork values for the Mis18a Yippee homodimer structure and labelled the Mis18a/b Yippee heterodimer as a predicted model in Fig. 1c and 1d.

      Page 8 "This observation is consistent with the theoretically calculated pI of the Mis18alpha helix" This is a circular argument, of course this region has a low pI due to the amino acid composition. Please remove this statement.

      We have now removed this statement as suggested.

      Page 8 "...reveals tight hydrophobic interactions" these are presumably shown in Figure 1d rather than in the referenced 1e.

      We apologise for the oversight. We have now referred to the correct figure (Fig. 1f in the revised Fig. 1).

      Page 8 - The authors should briefly somewhere discuss why there is a difference between their results and those in Pan et al 2009. As I understand it, the Pan et al paper was based in part on modelling with CLMS data as restraints.

      We thank the reviewer for this suggestion. According to Pan et al., 2019, their model was generated using CCBuilder, and their CLMS data could not distinguish between two models in which the second Mis18a C-terminal helix adopts either a parallel or an anti-parallel orientation. We now briefly discuss this on page 8 line 22 as follows: "Although the Pan et al., 2019 model presented the 2nd Mis18a in a parallel orientation, they did not rule out the possibility of this assembling in an anti-parallel orientation within the Mis18a/b C-terminal helical assembly (Pan et al., 2019)."

      Figure 1 - The labelling of the residues for Mis18-alpha in Figure 1d is problematic, they are black on dark purple (might be my printer/screen/eyes) suggest amending.

      We have now rearranged the label positions to overcome this issue. For clarity, the labels that could not be moved appropriately are shown in white.

      Figure S3a - Do the authors have some data to show the mass of the cross-linked complex that was loaded onto grids is consistent with what is expected?

      Unfortunately, the amount of material that we recover after performing GraFix is not sufficient to determine the molecular weight of the crosslinked sample by techniques such as SEC-MALS. However, GraFix fractions were analysed by SDS-PAGE, and fractions that migrated at around the expected molecular weight were selected for EM analysis. We have now included the corresponding SDS-PAGE gel showing the migration of the crosslinked sample analysed by EM (Supplementary Fig. S3a).

      Figure S3b - scale bar

      Revised Fig. 2d now includes a scale bar.

      Figure S3c - Could the authors show or explain the differences between these different 3D reconstructions?

      The models mainly differ in the relative orientations of the bulkier structural features that are referred to as 'ear' and 'mouth' pieces of a telephone handset. This has been mentioned in the text, but we note that the figure is not referenced right next to this statement. We have now amended this (Page 9 line 19), and to make it clear, we have also highlighted the difference using an arrowhead in Fig. 2e and S3b. The different orientations are also stated in the corresponding figure legends.

      Page 9 - The use of "AFM" for "AlphaFoldMultimer" is a little confusing since AFM is the established acronym for Atomic Force Microscopy. Perhaps AF2M?

      We have now replaced AFM with AF2M on page 9 to avoid confusion.

      Figure S4a - Control missing for Mis18-alpha wild-type

      Apologies for the confusion; this control is shown in Fig. 4a. We have now stated this in the figure legend of Fig. S4a for clarity.

      Figure S4 d and e - The contrast between the bands and the background is very bad (at least in my copy).

      We have now adjusted the contrast of the blots in Fig. S4d and S4e in response to this comment.

      Page 13 "Our structural analysis suggests that two Mis18BP1 fragments.....". How did you arrive at this conclusion? Is this based on the AlphaFold/RaptorX model? What additional evidence do you have that the positioning of the Mis18BP1 is correct? Does the CLMS data support this?

      We confirm that this statement is based on the AlphaFold model. We have now explicitly highlighted this on page 14, line 5. As noted in the same paragraph (page 14, line 19), this model agrees with the contacts suggested by the cross-linking mass spectrometry data presented here.

      Figure 4a - Would the authors like to consider using a different colour for Mis18BP1? The contrast is not great, especially in the electrostatic surface inset.

      In response to this suggestion, the Mis18BP1 helix is now shown in grey in the inset of Fig. 5a.

      Reviewer #1 (Significance (Required)):

      General Assessment The paper is extremely clearly written. Likewise, the figures are beautifully presented and the data extremely clean and fully supportive of the authors' conclusions. Indeed, it is seldom that one sees this depth of structural approaches (X-ray, CLMS, EM, SAXS) in one paper, which is a huge strength of the manuscript. In addition, the translation of these data into very clean cell biological experiments makes the paper truly outstanding.

      Advance The authors provide the first model of the Mis18 complex, with extensive evidence to back up this model. The authors provide additional evidence as to how the deposition/renewal of CENP-A might be mediated by the Mis18 complex. The advance comes from both the level of clarity, detail, and scope achieved in this paper.

      Audience This will likely be of great interest to anyone with an interest in chromosome biology, plus be of interest to structural biologists as an outstanding example of hybrid structural biology.

      Expertise I am a biochemist with a background in structural biology with some familiarity with centromere biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The manuscript "Structural basis for Mis18 complex assembly: implications for centromere maintenance" by Thamkachy and colleagues describes a study that uses structural analysis to identify candidate residues in Mis18 complex components and tests their roles in CENP-A loading. For chromosomes to faithfully segregate during cell division, CENP-A levels must be maintained at the centromere. How CENP-A levels are maintained is therefore important to understand at the mechanistic level. The Mis18 complex has been found to be important, but how exactly the various Mis18 complex components interact and how they regulate new CENP-A loading is not fully understood. This study used X-ray crystallography, negative-stain EM, SEC analysis, and molecular modeling (RaptorX, AlphaFold2, and AlphaFold-Multimer) to identify the residues of Mis18a and Mis18b that are critical for the formation of the Mis18a/b hetero-hexamer and the residues that are important for Mis18a-Mis18BP1 interactions. A complex beta-sheet interface dictates the Mis18a and Mis18b interactions. Mutating the Mis18a residues that are important for the Mis18a/b interactions resulted in impaired pull-down of Mis18b and reduced centromeric levels of mutated Mis18a. The functional consequence of mutating residues that impair Mis18a/b interactions is reduced centromeric levels of Mis18a and, with it, impaired new CENP-A loading. Interestingly, mutated Mis18b did not impact centromeric Mis18a levels and only modestly impaired new CENP-A loading. These data were interpreted to mean that Mis18a is critical for new CENP-A loading, whereas Mis18b might be involved in fine-tuning how much new CENP-A is loaded. Overall, it is a very well described and well written study with exciting data.

      Major comments:

      • Overall, the structural data and the IF data support that Mis18a residues 103-105 are critical for centromeric localization and new CENP-A loading, whereas Mis18b residues L199 and I203 are critical for centromeric localization but, when mutated, only very modestly impair centromeric Mis18a localization and new CENP-A loading. In the discussion, the authors argue that the N-terminal helical region of Mis18a mediates HJURP binding. This latter point is postulated based on published work, but not tested in this work. This should be clarified as such.

      We thank the reviewer for this comment. Our very recent study aimed at understanding the licencing role of Plk1, independent of the work reported here, serendipitously has now validated this suggestion and demonstrates that a Plk1-mediated phosphorylation cascade activates the Mis18a/b complex via a conformational switch of the N-terminal helical region of Mis18a, which facilitates a robust HJURP-Mis18a/b interaction (Parashara et al. bioRxiv 2024). An independent study from the Musacchio lab (Conti et al. bioRxiv, 2024) also reports similar findings, mutually strengthening our independent conclusions. Overall, these studies highlight the importance of the critical structural insights into the Mis18 complex this study reports. We now explicitly discuss the validation of our original hypothesis by citing our recent work along with that of the Musacchio lab. The corresponding section of the last paragraph now reads as follows (page 17 line 10): "Previously published work identified amino acid sequence similarity between the N-terminal region of Mis18a and R1 and R2 repeats of the HJURP that mediates Mis18a/b interaction (Pan et al., 2019). Deletion of the Mis18a N-terminal region enhanced HJURP interaction with the Mis18 complex (Pan et al., 2019). Here, we show that the N-terminal helical region of Mis18a makes extensive contact with the C-terminal helices of Mis18a and Mis18b, which had previously been shown to mediate HJURP binding by Pan et al., 2019. Collectively these observations suggest that the N-terminal region of Mis18a might directly interfere with HJURP - Mis18 complex interaction. 
Two independent recent studies (Parashara et al., 2024, Conti et al., 2024) reveal that this is indeed the case and that a Plk1-mediated phosphorylation cascade involving several phosphorylation and binding events of the Mis18 complex subunits relieves the intramolecular interactions between the Mis18a N-terminal helical region and the HJURP binding surface of the Mis18a/b C-terminal helical bundle. This facilitates robust HJURP-Mis18a/b interaction in vitro and efficient HJURP centromere recruitment and CENP-A loading in cells. Overall, these studies also highlight the importance of the critical structural insights into the Mis18 complex we report here."

      • Overall, the authors clearly describe their data and methodology and use adequate statistical analyses. The structural data showing that the Mis18a/b complex is a hetero-hexamer are convincing, but validation in vivo is missing. As structural experiments are not performed under physiological conditions, it is important to establish the stoichiometry in vivo to further support the totality of the findings of the structural experiments and modeling. The data for the hierarchical assembly of Mis18a and Mis18b at the centromere and its importance in new CENP-A loading are convincing. An additional open question is whether "old" centromeric CENP-A or the HJURP:new CENP-A complex is needed to recruit Mis18a to the centromere and whether the identified residues have a role in Mis18a centromeric localization. These data would provide a solid, direct link between the Mis18 complex and new CENP-A loading.

      We agree that establishing the stoichiometry of Mis18 subunits of the Mis18 complex in vivo would be insightful. However, considering that the Mis18 complex assembles in a specific window of the cell cycle (late mitosis and early G1), we think characterising the stoichiometry in cells is technically extremely challenging. Nevertheless, consistent with our structural model, several lines of independent evidence (Pan et al., 2017 and Spiller et al., 2017) using different biophysical methods (analytical ultracentrifugation (Pan et al., 2017) and SEC-MALS (Spiller et al., 2017)) showed that recombinantly purified Mis18 complex (irrespective of the expression host, E. coli or insect cells) is a hetero-octamer made of a hetero-hexameric Mis18a/b complex (4 Mis18a and 2 Mis18b) bound to two copies of Mis18BP1. These observations suggested that hetero-hexamerisation of the Mis18a/b complex may be needed to bind and dimerise Mis18BP1 in cells. Previously published cellular studies support the in vivo requirement of the hetero-octameric Mis18 assembly: (i) perturbing the hetero-hexamerisation of the Mis18a/b complex (by introducing mutations at the Mis18a/b Yippee dimerisation interface, which, while not disrupting Mis18a/b complex formation, perturbed its hetero-hexamerisation and resulted in a hetero-trimeric Mis18a/b complex made of 2 Mis18a and 1 Mis18b) abolished Mis18BP1 binding in vitro and in cells, and consequently abolished CENP-A deposition (Spiller et al., 2017); and (ii) artificial dimerisation of Mis18BP1, by expressing Mis18BP1 as a GST-tagged protein, enhanced the centromere localisation of Mis18BP1, highlighting the requirement in cells for Mis18BP1 dimerisation mediated by the hexameric Mis18a/b assembly (Pan et al., 2017). 
While these studies highlighted the importance of maintaining the right stoichiometry (hetero-octamer of 4 Mis18a, 2 Mis18b and 2 Mis18BP1), the lack of structural information on how this essential biological assembly is established remained a major knowledge gap. Our work presented here fills this gap by showing that a segment of Mis18BP1 (aa 20-51) also binds at the Yippee dimerisation interface. To highlight this, we have included the following statements in the introduction on page 5 line 20: "Perturbing the Yippee domain-mediated hexameric assembly of Mis18a/b (that resulted in a Mis18a/b hetero-trimer, 2 Mis18a and 1 Mis18b) abolished its ability to bind Mis18BP1 in vitro and in cells (Spiller et al., 2017), emphasising the requirement of maintaining correct stoichiometry of Mis18a/b subunits. Consistent with this, artificial dimerisation of Mis18BP1, by expressing Mis18BP1 as a GST-tagged protein, enhanced the centromere localisation of Mis18BP1 (Pan et al., 2017)." and in the Results section on page 14 line 12: "Mis18BP1₂₀₋₅₁ contains two short β-strands that interact at the Mis18a/b Yippee interface, extending the six-stranded β-sheets of both Mis18a and Mis18b Yippee domains. This provides the structural rationale for why Yippee domain-mediated Mis18a/b hetero-hexamerisation is crucial for Mis18BP1 binding (Spiller et al., 2017)."

      Regarding the question "whether 'old' centromeric CENP-A or the HJURP:new CENP-A complex is needed to recruit Mis18a to the centromere and whether the identified residues have a role in Mis18a centromeric localization": According to the published literature, the Mis18 complex associates with centromeres through interaction with CCAN components CENP-C and CENP-I (Shono et al., 2015, Dambacher et al., 2012, Moree et al., 2011, Hoffmann et al., 2020). Considering that the CCAN assembles on CENP-A nucleosomes, and that HJURP:new CENP-A centromere recruitment depends on the Mis18 complex, it is reasonable to argue that the 'old' centromeric CENP-A contributes to the centromere localisation of the Mis18 complex. Amongst the components of the Mis18 complex, Mis18BP1 and Mis18b have previously been suggested to interact with CENP-C. Within the Mis18 complex, we (Spiller et al., 2017) and others (Pan et al., 2017) have shown that Mis18a can directly interact with Mis18BP1, but it does so more efficiently when Mis18a hetero-oligomerises with Mis18b via their Yippee domains. Here, our structural analysis mapped the interaction interfaces and showed that Mis18a residues E103, D104 and T105 contribute to Mis18BP1 binding, as mutating these residues abolishes centromere localisation of Mis18a (Fig. 5c and 5d). To accentuate our findings, we have now included the following paragraph in the discussion section (page 17 line 26): "One of the key outstanding questions in the field is how the Mis18 complex associates with the centromere. Previous studies identified CCAN subunits CENP-C and CENP-I as major players mediating the centromere localisation of the Mis18 complex mainly via Mis18BP1 (Shono et al., 2015, Dambacher et al., 2012, Moree et al., 2011), although the Mis18b subunit has also been suggested to interact with CENP-C (Stellfox et al., 2016). Within the Mis18 complex, we and others have shown that the Mis18a/b Yippee hetero-dimers can directly interact with Mis18BP1. 
Here our structural analysis allowed us to map the interaction interface mediating Mis18a/b-Mis18BP1 binding. Perturbing this interface on Mis18a completely abolished Mis18a centromere localisation and reduced Mis18BP1 centromere levels. These observations show that Mis18a associates with the centromere mainly via Mis18BP1, and assembly of the Mis18 complex itself is crucial for its efficient centromere association, as previously suggested. Future work aimed at characterising the intermolecular contact points between the subunits of the Mis18 complex, centromeric chromatin and CCAN components and understanding if the Mis18 complex undergoes any conformational and/or compositional variations upon centromere association and/or during CENP-A deposition process, will be crucial to delineate the mechanisms underpinning the centromere maintenance."

      Minor comments:

      • Ideally, the bar graphs should also show the individual data points so that readers can appreciate the spread of the data. These figures can be replicated in the Supplemental to avoid making the main figures look too busy.

      We thank the reviewer for this suggestion. Reviewer #3 made a similar comment and suggested we use SuperPlots, which allow visualisation of individual data points from independent experiments. We have now revised all bar graphs as SuperPlots to address both reviewers' suggestions.

      Reviewer #2 (Significance (Required)):

      • This study uses a broad range of structural techniques, including molecular modeling, which were subsequently validated by in vitro pull-down assays, co-IP, and IF. This combination of techniques is important because many structural techniques cannot be performed under physiological conditions. Validating the main structural findings by IF and co-IP is therefore critical.
      • This work greatly advances our structural understanding of how Mis18a, Mis18b, and Mis18BP1 form the Mis18 complex and how the critical residues, especially in Mis18a, help the Mis18 complex localize to the centromere and influence new CENP-A loading. This study also provides the first strong evidence for hierarchical assembly of the Mis18 complex.
      • How centromere identity is maintained is a critical question in chromosome biology and genome integrity. The Mis18 complex has been identified as an important complex in the process. Several structural and mutational studies (all adequately cited in this manuscript) have tried to address which residues guide the assembly and functional regions of the Mis18 complex. This work builds on and expands our understanding of how Mis18a in particular holds a pivotal role in both Mis18 complex formation and its impact on maintaining centromeric CENP-A levels.
      • This work will be of interest to the chromosome field in general and anyone studying the mechanism of cell division.
      • Chromatin, centromere, CENP-A, cell division. This reviewer has limited expertise in structural biology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Centromere identity is defined by CENP-A loading to specific sites on genomic DNA. CENP-A loading is known to rely on the Mis18 complex, and several regulators are known; yet how the Mis18 complex achieves this complex process has remained a puzzle. By elucidating the structural basis of Mis18 complex assembly using integrative structural approaches, the authors show that multiple homo- and heterodimeric interfaces of Mis18α, Mis18β and Mis18BP1 are involved in centromere maintenance. The authors show that Mis18α can associate with centromeres and function in CENP-A deposition independently of Mis18β, whereas Mis18β is required for maintaining a specific level of CENP-A occupancy at centromeres. Thus, using structure-guided and separation-of-function mutants, the study reveals how the Mis18 complex ensures centromere maintenance.

      Major comments:

      This is an excellent study on centromere inheritance, combining structural and cell biology techniques. The comments here primarily refer to the cell biology aspect of the work.

      Figures show that new CENP-A is deposited in Mis18β L199D/I203D mutants, but at a moderately reduced level. Based on this observation, the authors make a strong conclusion that Mis18β licenses the optimal levels of CENP-A at centromeres. Mis18α may be essential for both CENP-A incorporation and depositing a specific amount of CENP-A, as Mis18α and CENP-A levels are both reduced in Mis18β L199D/I203D mutants, which fail to form the triple helical assembly with Mis18α as shown in Figure 3B and 3C. The authors may want to qualify some of these claims as preliminary or speculative.

      We thank the reviewer for this suggestion. We agree that although the reduction in CENP-A levels upon replacing WT Mis18b with Mis18b L199D/I203D is more prominent than the reduction in centromere-localised Mis18a, one cannot completely rule out a contribution of reduced Mis18a levels to the CENP-A loading defect. This also raises the interesting possibility that Mis18b ensures the correct amount of CENP-A deposition by facilitating the optimal level of Mis18a at centromeres. We now explicitly discuss this in the discussion as follows (page 16 line 26): "Whilst proteins involved in CENP-A loading have been well established, the mechanism by which the correct levels of CENP-A are controlled is yet to be thoroughly explored and characterised. The data presented here suggest that Mis18b mainly contributes to the quantitative control of centromere maintenance, by ensuring the right amounts of CENP-A deposition at centromeres, and may be one of several proteins that control CENP-A levels. We also note that the Mis18b mutant, which cannot interact with Mis18a, moderately reduced Mis18a levels at centromeres, and hence, it is possible that Mis18b ensures the correct level of CENP-A deposition by facilitating optimal Mis18a centromere recruitment. Future studies will focus on dissecting the mechanisms underlying the Mis18b-mediated control of CENP-A loading amounts along with any other mechanisms involved."

      This work and others show that phosphorylation of Mis18BP1 by CDK1 can interfere with complex function (Spiller et al., 2017, Pan et al., 2017). Does the structure provide any insight into PLK1-mediated phosphorylation surfaces for activation of the complex? If yes, a brief discussion linking the opposing actions of CDK1 and PLK1 would strengthen the work.

      As described in our response to the first major comment of Reviewer 2, our very recent study aimed at understanding the licencing role of Plk1, independent of the work reported here, identified and evaluated the functional contribution of Plk1 phosphorylation of the subunits of the Mis18 complex (Parashara et al., bioRxiv 2024). Serendipitously, this recent work has now validated our hypothesis proposed based on the structural characterisation reported here and demonstrates that a Plk1-mediated phosphorylation cascade activates the Mis18a/b complex via a conformational switch of the N-terminal helical region of Mis18a, which facilitates a robust HJURP-Mis18a/b interaction (Parashara et al. bioRxiv 2024). An independent study from the Musacchio lab (Conti et al., bioRxiv 2024) also reports similar findings, mutually strengthening our independent conclusions. Overall, these studies highlight the importance of the critical structural insights into the Mis18 complex this study reports. We now explicitly discuss the validation of our original hypothesis by citing our recent work along with that of the Musacchio lab. The corresponding section of the last paragraph now reads as follows (page 17 line 10): "Previously published work identified amino acid sequence similarity between the N-terminal region of Mis18a and R1 and R2 repeats of the HJURP that mediates Mis18a/b interaction (Pan et al., 2019). Deletion of the Mis18a N-terminal region enhanced HJURP interaction with the Mis18 complex (Pan et al., 2019). Here, we show that the N-terminal helical region of Mis18a makes extensive contact with the C-terminal helices of Mis18a and Mis18b, which had previously been shown to mediate HJURP binding by Pan et al., 2019. Collectively these observations suggest that the N-terminal region of Mis18a might directly interfere with HJURP - Mis18 complex interaction. 
Two independent recent studies (Parashara et al., 2024, Conti et al., 2024) reveal that this is indeed the case: a Plk1-mediated phosphorylation cascade involving several phosphorylation and binding events of the Mis18 complex subunits relieves the intramolecular interactions between the Mis18a N-terminal helical region and the HJURP binding surface of the Mis18a/b C-terminal helical bundle. This facilitates robust HJURP-Mis18a/b interaction in vitro and efficient HJURP centromere recruitment and CENP-A loading in cells. Overall, these studies also highlight the importance of the critical structural insights into the Mis18 complex we report here."

      I am happy with the way cell biology data and the methods are presented so that they can be reproduced. The experiments are adequately replicated and the statistical analysis is adequate. It would help to include the number of cells or centromeres used to build the graphs.

      We have now included this information in figure legends of Fig. 3a, 3c, 4b, 4c, 5b, 5c and 5d.

      This is a strong interdisciplinary study using a variety of in vitro and in vivo techniques. Can the authors discuss whether they expect the chromatin-associated Mis18 complex to adopt a structure similar to that of the soluble one? In other words, are they able to comment on any key differences between chromatin-associated and non-chromatin-associated Mis18 complexes?

      We thank the reviewer for the suggestion. We agree that one cannot rule out the possibility of the Mis18 complex undergoing compositional and/or conformational variations during the process of CENP-A loading at centromeres. We now explicitly discuss this possibility in the last paragraph of the discussion section (page 18 line 10): "Future work aimed at characterising the intermolecular contact points between the subunits of the Mis18 complex, centromeric chromatin and CCAN components, and at understanding whether the Mis18 complex undergoes any conformational and/or compositional variations upon centromere association and/or during the CENP-A deposition process, will be crucial to delineate the mechanisms underpinning centromere maintenance."

      Minor comments:

      In cell biology experiments, fluorescence intensities could be presented as a superplot for added value across cells and repeats (instead of bar graphs). More on superplots: https://doi.org/10.1083/jcb.202001064.

      We thank the reviewer for this kind suggestion. We have now included graphs made as SuperPlots, as suggested.

      In general, ACA levels do not appear to change significantly between WT- and mutant-expressing cells, although new CENP-A loading is significantly reduced in the presence of a few mutants. Please comment on whether the ACA used here can recognise CENP-A. Would this mean that old CENP-A is retained normally?

      We thank the reviewer for this comment. While new CENP-A incorporated at centromeres is selectively labelled using the SNAP-tag, the ACA antibody used in these experiments can recognise CENP-A, CENP-B and CENP-C, with CENP-B being the primary target (Kallenberg, Clinical Rheumatology, 1990). We would also like to note that ACA has commonly been used to locate the centromere in CENP-A loading assays where new CENP-A levels are assessed via selective labelling (e.g. McKinley 2014).

      It is unclear whether any of the mutants acted in a dominant-negative fashion in the presence of endogenous Mis18 proteins. It would have been useful to test this, particularly in the context of the Mis18alpha mutants that seem to fully abolish new CENP-A recruitment.

      As Mis18 subunits oligomerise (homo and hetero), we reasoned that expressing these mutants in the presence of endogenous proteins might interfere with the endogenous proteins in a heterogeneous manner and make interpretation difficult. Hence, we did not test this. Instead, as described in the manuscript, we have tested these mutants in siRNA rescue experiments (Fig. 3, 4 and 5).

      In figure 3a, GFP panel (input lane, 1) is shown to mark a band corresponding to GFP. Is this expected? Please comment.

      Yes. As a control, an empty vector was transfected to express GFP alone along with Mis18a-mCherry. This showed that there was no non-specific interaction between the beads used for IP or Mis18a-mCherry and the GFP tag, and that any interaction seen was due to Mis18b. A similar control was used in S4b, where mCherry was expressed along with Mis18b-GFP. We have now clarified this in the corresponding legends of Fig. 4a and S4b.

      It would be useful to include scale bars for the cropped images presented as insets. Figure 4B should read YFP and not YPF.

      We apologise for this typographical error. We have now corrected this.

      The authors may want to explain whether the tag differences matter for their study (Case in point: His-SUMO-Mis18a191-233 WT and mutant His-MBP-Mis18b188-229 proteins).

      The MBP tag was chosen to perform amylose pull-down assays, whereas the SUMO tag was chosen to increase the protein size. This is crucial as the C-terminal fragments of Mis18a and Mis18b are less than 50 amino acids long and are not easy to visualise by the band intensity in the Coomassie-stained SDS PAGE gels.

      Reviewer #3 (Significance (Required)):

      This work elucidates the structural basis of Mis18 complex assembly and the intermolecular interfaces essential for Mis18 functions. This is a significant advance in the field, as it helps researchers better understand CENP-A deposition and the mechanism underpinning the maintenance of centromere identity. This is a broad area of research: those studying cell division, genome stability, centromere identity and epigenetics might all be interested in and influenced by these findings. The novelty and strength lie in combining structural and cell biology work. The strengths of the work are the structural details of the Mis18 complex. A minor weakness is that the link between Mis18 structure and centromere inheritance is limited to one immunostaining assay (I have mentioned this as a minor comment because addressing it may not be within the scope of this manuscript and is likely to require repeating the vast majority of the work with additional reagents, which may not directly add value to the current manuscript).

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g. line 98 "we assembled a chromosome-scale genome of ..." should read instead "we assembled a chromosome-scale genome sequence of ..."). Also, panel Figure 2E is missing.

      We will make the suggested change of adding “sequence”. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without further information. The example given does not help us ascertain what other changes may be necessary, since we cannot see any problem with the sentence “we assembled a chromosome-scale genome of”, as this phrase is widely used in many similar publications.

      As for panel E of figure 2, it is not missing. The panel is located on the right, just below “Target Cells”.

      The shortcomings of the manuscript are not limited to the writing style, and important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      • Several RNASeq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But all along the manuscript, gene expression changes are summarized into a single panel where it is not clear at all which tissue this comes from (whole embryo or a specific tissue ?), or whether it is a cumulative expression level computed across several tissues (and how it was computed) etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced; individual tissues derived from juvenile fish were used for the genome annotation, and whole larval fish for the developmental analysis. We will specify in the figures and text that the results shown are from whole larvae, and add more detail to the Materials and Methods section about which type of sample was analysed in which way.

      • The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and an in-depth description of workflows and parameter settings is highly recommended. These can be made available through data store services, and such documents can even benefit from DOIs. This provides others with more information to evaluate the resolution of this work. No doubt it is well done, but especially in the field of genome assembly and annotation, high resolution is VERY cost- and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time, and labor. The authors should provide others with the information needed to evaluate this.

      We will upload the code used to assemble and annotate this genome to a public repository or add it to the supplementary material.

      The genome assembly did not use a specific workflow (e.g., Nextflow), but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow, a detailed description of which, provided by Phase Genomics, can be found in the supplementary material. The annotation workflow has already been described in a previous publication, but an in-depth description can also be found in the Materials and Methods section, including the parameters used for specific steps. The RNA-seq mapping and analysis have also been described in the Materials and Methods section, including parameters and models for DESeq2.

      • Quantifications of T3 and T4 levels look fairly low and not so convincing. The work would clearly benefit from a discussion about why the signal is so low and what are the current technological limitations of these quantifications. This would really help (general) readers.

      We will add a comment on this in the manuscript as suggested. Basically, the T3/T4 levels are consistent with other published work in fish. In the present manuscript for grouper we have a peak level of 1.2 ng/g (1,200 pg/g) of T4 and 0.06 ng/g (60 pg/g) of T3. This is a higher level of T4 and comparable level of T3 to what was found in convict tang (Holzer et al. 2017; Figure 2) with 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.

      The differences could be due to the different structure of fish tissues and therefore different hormone extraction efficiency, different hormone measurement protocols, different fish physiology, or different fish size (e.g., weighing tiny grouper larvae is more difficult and less precise than weighing convict tang larvae). What is important is not the absolute level but the relative level, which shows the change across larval stages of a species measured with identical extraction and measurement protocols. This means that our data are internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      • Differential analysis highlights up to ~ 15,000 differentially expressed genes (DEG), out of a predicted 26k genes. This corresponds to more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEG. Having >50% DEG is well beyond the validity of the method. This should be addressed, or at least discussed.

      As the reviewer notes, there are a large number of differentially expressed genes due to the fact that this is coming from a larval developmental transcriptome going from one day old larva to fully metamorphosed juveniles at around day 60.

      While DESeq2 indeed works on the assumption that most genes are not differentially expressed, this affects normalisation but not hypothesis testing (Wald test, likelihood-ratio test (LRT) or ANOVA). Normalisation in DESeq2 is fairly robust to this assumption. According to the author of DESeq2, Michael Love, DESeq2 uses the median ratio for normalisation, and as long as the numbers of up- and down-regulated genes are relatively even, DESeq2 will be able to handle the data. As part of our general quality control for this project we consulted the MA plots, which do not show any overrepresented up or down expression patterns. Additionally, see Michael Love's comment on comparing different tissues, which also applies here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/): “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
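      To illustrate the normalisation argument above, a minimal sketch of the median-of-ratios size-factor calculation follows. It is written in Python with the standard library only; the function name and the toy counts are hypothetical, and this is not DESeq2's own code (DESeq2 is an R package in which `estimateSizeFactors` performs this step, with additional handling not reproduced here). It shows that when changes are balanced, the size factors are unaffected even though most genes change, whereas a shift affecting every gene is absorbed by the normalisation.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Size factors by the median-of-ratios method (illustrative sketch).

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        # log of the geometric mean across samples = mean of the logs
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # each sample's size factor is the median of its ratios to the reference
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Hypothetical toy data, two samples: 6 of 10 genes change 4-fold,
# 3 up and 3 down (balanced), and 4 genes are unchanged.
balanced = [[100, 400]] * 3 + [[100, 25]] * 3 + [[100, 100]] * 4
print(median_of_ratios_size_factors(balanced))  # [1.0, 1.0]

# Every gene doubles in sample 2: the global shift is absorbed into the
# size factors (ratio of about 2), i.e. treated as a library-size difference.
all_up = [[100, 200]] * 10
sf = median_of_ratios_size_factors(all_up)
print(sf[1] / sf[0])
```

      The first case mirrors the situation described above: 60% of the toy genes are "differentially expressed", yet the median falls on the unchanged genes, so no spurious scaling is introduced. The second case is the failure mode described in the quoted comment, which requires essentially all genes to move in the same direction.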

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial claims that are not supported by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments; nevertheless, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods: an early activation between D1 and D10, and a second one between D32 and the juvenile stage. These data are interesting as they call for further examination of 1) the possible interaction of corticoids and TH during metamorphosis, a question that is certainly not yet settled in teleost fishes, and 2) the existence of an early larval developmental step also involving TH and corticosteroids.

      Point 2) in particular is of interest and importance, since this early activation (unique, to our knowledge, among the teleost fish studied so far) raises many new questions and will certainly be scrutinised by other groups in the years to come, therefore ensuring a good citation impact of our study. We hope that the reviewer, while disagreeing with some of our statements, will recognise that our study will be stimulating at that level and that this is what scientific studies should do.

      That cortisol is involved in teleost metamorphosis has never been shown, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in the Senegalese sole (Solea senegalensis), the number of preoptic CRH neurons decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in metamorphosis (PMID: 25575457).

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: “being involved” is not the same as “cortisol alone does not induce”. In ref 20 the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones”, and in our view this indeed corresponds to ”being involved in metamorphosis”.

      We are not claiming that cortisol alone is involved in metamorphosis, as the reviewer suggests, but simply that cortisol may be involved together with TH in metamorphosis. We stand by this claim, as we indeed observed an activation of corticoid pathway genes around D32, which is sufficient to say that it is involved. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study, as it would require establishing a full grouper life cycle under laboratory conditions, which is beyond the scope of this manuscript.

      We also mentioned in the discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue.

      We wrote that “there is contrasting evidence of communication between these two pathways [TH and corticosteroids] in teleost fish, with some data suggesting a synergistic and others an antagonistic relationship. In terms of synergy, an increase in cortisol level concomitant with an increase in TH levels has been observed in flatfish (ref 19), golden sea bream (ref 100) and silver sea bream (ref 101). Cortisol was also shown to enhance in vitro the action of TH on fin ray resorption (a phenomenon occurring during flatfish metamorphosis) in flounder (ref 20). TH exposure increases MR and GR gene expression in zebrafish embryos (ref 55). It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner (ref 56). On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp decreases cortisol levels (ref 57), whereas cortisol exposure decreases TH levels in European eel (ref 58). Given this scattered evidence, the existence of a crosstalk active during teleost metamorphosis has never been formally demonstrated. The results we obtained in grouper clearly indicate that the HPI axis and cortisol synthesis are activated (i) during early development and (ii) during metamorphosis. This may suggest that in some respects cortisol synthesis can work in concert with TH, as has been shown in several different contexts in amphibians (ref 17).” In the revised manuscript, we will also add the interesting case of the Senegalese sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field.

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that the upregulation of HPA axis genes does not mean they are involved in regulating the HPT axis. The authors do not show that any CRH receptor is expressed in thyrotrophs or in any other HPT axis-relevant cells, or that changes in these genes correlate with changes in TSH expression. An in situ hybridisation experiment showing co-expression of HPA genes and TSH in thyrotrophs could be a good start. However, the best scenario would be conducting cortisol treatment experiments to see whether this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression observed for many genes were very intriguing to us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on the morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even more risky on such tiny individuals just after hatching, and we encountered high mortality rates. We must add that, because we cannot establish a full grouper life cycle under laboratory conditions, we performed these experiments in the context of a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments would be a full project in themselves, requiring a rearing system suitable for both larval survival and the economic constraints of drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are restricted to a short period each year. So even though we strongly agree with the necessity of conducting such experiments, we think that they are beyond the scope of the present paper and are something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, existing literature shows that there is a peak in TH during that time (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raised the question of the role of TH so early in the larval development, outside of the metamorphosis period.

      Regarding the respective levels of TSH and Tg, we first would like to add that their respective order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree however that the strong increase of Tg and TPO expression is later than expected. We will make this clear in the revised manuscript.

      It is very difficult to conclude anything from the TH and cortisol level measurements. The authors only measured up to D10, whereas they argue that metamorphosis occurs at D32. These measurements would be more helpful if they focused on the relevant developmental time. The data are irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering 1) that TH levels have already been investigated in groupers, coinciding with pigmentation changes and fin ray resorption, 2) that there is also evidence in numerous fish species that the increase in TH levels is concomitant with an increase in the expression of TH-related genes, and 3) that we observed in our data an increase in the expression of TH-related genes as well as pigmentation changes and fin ray resorption. Based on our experience in fish metamorphosis and the literature, we can say confidently that these observations indicate that metamorphosis occurs between D32 and the juvenile stage. To reinforce our point, we plan to add a figure to the revised manuscript that puts our data in the context of earlier studies done in grouper. This will clearly show that our inference is correct. Additionally, we would like to point out that, in our experience with several fish species, transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in larval development (at D3), which is clearly outside the metamorphosis period, we decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering whether gene activation was accompanied by a hormonal increase, the measurements we made for TH and cortisol between D1 and D10 are relevant. We will make sure to improve the clarity of the revised manuscript to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and the juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32; therefore, it is very hard for me to accept that this is the metamorphic stage. With the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down, and it should be made clear in the manuscript that D32 might be a putative metamorphic climax but that several aspects of the biology work against it. Moreover, at D10, the authors show the highest cortisol level and the lowest T4 and T3 levels. These observations are irreconcilable with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here.

      (1) We clearly observed an increase in TSHb (occurring between D18 and the juvenile stage) and an increase in tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase in TRb). All this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis starts at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and the juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al 1998), as they occur during the increase in TH levels, and they are also under the control of TH in grouper (De Jesus et al 1998). Based on this study, and also on studies conducted on many other teleost species showing that the increase in TH levels is always associated with an activation of TH pathway genes and with morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and the juvenile stage. We will improve the clarity of the manuscript to make sure that it is clear our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in development (between D1 and D10), with a surge of trhrs, tg and tpo at D3. As this activation was very unexpected, we decided to focus the analysis of TH levels between D1 and D10, and, very interestingly, we observed high levels of T4 at D3, indicating that THs are instrumental very early in the larval development of the Malabar grouper, which has never been shown before. We stated at line 195 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. However, we agree that we could have explained more clearly that this early activation was very intriguing to us and that we wanted to investigate hormonal levels around that period. We never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring, and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction.

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of corticoid pathway genes around metamorphosis (between D32 and the juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test this. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. We also say that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript can be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and the juvenile stage. We will correct this in the revised version.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages as it has already been measured during Epinephelus coioides metamorphosis and the morphological changes observed in this species around the TH peak corresponds to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998 General and Comparative Endocrinology, 112:10-16). We are planning to add a figure reconciling all these data together. However, the main focus of this manuscript is the novel observation of the existence of an early activation period observed at D3, and for which we needed TH levels to determine if they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during the oceanic larval dispersal. As we may have arrived at the explanation of this hypothesis too rapidly without setting up the context well enough, we will pay attention to improve that part too.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of the method of our paper and the quality of the results. We agree that there were several parts where our message is unclear, which we will address in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We will also make sure that our claims about TH/corticoids interaction during both periods remain hypothetical as we cannot yet, despite trials, sustain them with functional experiment.

    1. Author Response

      We provide here a provisional response to the Public Comments and main issues raised by the reviewers. We appreciate the opportunity to submit a revision and will give all of the reviewers’ comments careful consideration when modifying the manuscript.

      (1) BioRxiv version history.

      Reviewer 1 correctly noted that we have posted different versions of the paper on bioRxiv and that there were significant changes between the initial version and the one posted as part of the eLife preprint process. Here we provide a summary of that history.

      We initially posted a bioRxiv preprint in November, 2021 (Version 1) that included the results of two experiments. In Experiment 1, we compared conditions in which the stimulation frequency was at 2 kHz, 3.5 kHz, or 5.0 kHz. In Experiment 2, we replicated the 3.5 kHz condition of Experiment 1 and included two amplitude-modulated (AM) conditions, with a 3.5 kHz carrier signal modulated at 20 Hz or 140 Hz. Relative to the sham stimulation, non-modulated kTMP at 2 kHz and 3.5 kHz resulted in an increase in cortical excitability in Experiment 1. This effect was replicated in Experiment 2.

      In the original posting, we reported an additional boost in excitability in the 20 Hz AM condition above that of the non-modulated condition. However, on re-examining the results, we recognized that the 20 Hz AM condition included an outlier who was pulling the group mean upward. We should have caught this outlier in the initial submission, given that this individual's percent change was 3 standard deviations above the mean. Given the skew in the distribution, we also log-transformed the MEPs (which improves the normality and homoscedasticity of MEP distributions) and repeated the analysis. However, even after this transform, the participant's results remained well outside the distribution. We therefore removed this participant and repeated all analyses. In this new analysis, there was no longer a significant difference between the 20 Hz AM and non-modulated conditions in Experiment 2. Indeed, all three active stimulation conditions (non-modulated, AM 20 Hz, AM 140 Hz) produced a similar boost in cortical excitability compared to sham. Thus, the results of Experiment 2 are consistent with those of Experiment 1, showing the efficacy of kHz stimulation on cortical excitability in three new conditions. However, they provide no evidence of an additional boost from amplitude modulation.
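
      To make the exclusion criterion concrete, a minimal Python sketch of such a screen is shown below. It is an illustration under stated assumptions, not the analysis pipeline used in the study: it assumes the 3-SD rule is applied to each participant's percent change in MEP amplitude and is computed leave-one-out (each value is compared with the mean and SD of the other participants). The function names and example values are hypothetical.

```python
import math
import statistics


def percent_change(pre: float, post: float) -> float:
    """Percent change in mean MEP amplitude from pre- to post-kTMP blocks."""
    return 100.0 * (post - pre) / pre


def log_meps(amplitudes: list[float]) -> list[float]:
    """Natural-log transform of MEP amplitudes (all values must be > 0)."""
    return [math.log(a) for a in amplitudes]


def flag_outliers(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of values lying more than `threshold` SDs from the mean
    of the *other* values (leave-one-out z-score; a hypothetical criterion)."""
    flagged = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        mu = statistics.mean(others)
        sd = statistics.stdev(others)  # sample SD (n - 1 denominator)
        if sd > 0 and abs(v - mu) / sd > threshold:
            flagged.append(i)
    return flagged


# Hypothetical per-participant percent changes; the last one is extreme.
changes = [5.0, 8.0, 10.0, 12.0, 7.0, 9.0, 11.0, 6.0, 10.0, 60.0]
print(flag_outliers(changes))  # -> [9]
```

      The leave-one-out form is a deliberate design choice. In a small sample, a z-score computed against a mean and SD that include the extreme value itself is bounded: with the sample SD it cannot exceed (n - 1)/√n, which is about 2.85 for n = 10. A plain 3-SD rule could therefore never fire in such a sample.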

      We posted a second bioRxiv preprint in May, 2023 (Version 2) with the corrected results for Experiment 2, along with changes throughout the manuscript given the new analyses.

      Given the null results for the AM conditions, we decided to run a third experiment before submitting the work for publication. Here we used an alternative form of amplitude modulation (see Kasten et al., NeuroImage 2018). In brief, we again observed a boost in cortical excitability from non-modulated kTMP at 3.5 kHz, but no additional effect of amplitude modulation. This work is included in the third bioRxiv preprint (Version 3), the version that was submitted to and reviewed at eLife.

      (2) Statistical analysis.

      Reviewer 1 raised a concern with the statistical analyses performed on aggregate data across experiments. We recognize that this is atypical and was certainly not part of an a priori plan. Here we describe our goal with the analyses and the thought process that led us to combine the data across the experiments.

      Our overarching aim is to examine the effect of different kTMP waveforms (carrier frequency and amplitude-modulation frequency), matched at the same estimated cortical E-field (2 V/m), on corticospinal excitability. Our core comparison was between each active condition and a sham condition (E-field = 0.01 V/m). We included the non-modulated 3.5 kHz condition in Experiments 2 and 3 to provide a baseline against which we could assess whether amplitude modulation produced a measurable difference from non-modulated stimulation. Thus, this non-modulated condition, as well as the sham condition, was repeated in all three experiments. This gave us an opportunity to examine the effect of kTMP in a relatively large sample and to assess how well the effects replicate, and it shaped the strategy we have taken in reporting the results.

      As a first step, we present the data from the 3.5 kHz non-modulated and sham conditions (including individual participant data) for all three experiments in Figure 4. We used a linear mixed-effects model to test for an effect of Experiment (Exps 1, 2, 3) and observed no significant difference across experiments within either condition. Given this, we opted to pool the data for the sham and 3.5 kHz non-modulated conditions across the three experiments. Once the data were pooled, we examined the effects of the carrier frequency and amplitude-modulation frequency of the kTMP waveform.

      (3) Carry-over effects

      As suggested by Reviewer 1, we will examine in the revision if there is a carry-over effect across sessions (for the most part, 2-day intervals between sessions). For this, we will compare MEP amplitude in baseline blocks (pre-kTMP) across the four experimental sessions.

      Reviewer 1 also commented that mixing the single- and paired-pulse protocols might have affected the results. While our a priori focus was on the single-pulse results, we wanted to include multiple probes given the novelty of our stimulation method. Mixing single- and different paired-pulse protocols has been relatively common in the noninvasive brain stimulation literature (e.g., Nitsche 2005, Huang et al., 2005, López-Alonso 2014, Batsikadze et al., 2013), and we are unaware of any reports suggesting that mixed designs (single and paired) distort the picture compared to pure designs (single only).

      (4) Sensation and Blinding

      Reviewer 2 brought up concerns about the sham condition and the blinding of kTMP stimulation. We believe that kTMP is nearly ideal for blinding. The amplifier does emit an audible tone (at least for individuals with normal hearing) when set to an intensity that produces a 2 V/m E-field. For this reason, both the participants and the experimenter wore earplugs. Moreover, we played a 3.5 kHz tone in all conditions, including the sham condition, which effectively masked the amplifier sound. We measured each participant's subjective ratings of annoyance, pain, and muscle twitches after each kTMP session (active and sham). Using a linear mixed-effects model, we found no difference between active and sham stimulation for any of these ratings, suggesting that sensation was similar in the active and sham conditions (Fig 8). This matches our experience that kHz stimulation in the range used here produces no perceptible sensation from the coil. To blind the experimenters (and participants), we used a coding system in which the experimenter typed in a number that had been randomly paired with a stimulation condition; this pairing varied across participants in a manner unknown to the experimenter.
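
      The coding scheme described above can be sketched as follows. This is a minimal illustration, not the software used in the study. It assumes that each participant receives an independent random permutation mapping session codes 1 to k onto the k stimulation conditions, and that the key is held by someone other than the experimenter. The condition labels, seed, and function names are hypothetical.

```python
import random
from typing import Optional


def make_blinding_key(
    participants: list[str], conditions: list[str], seed: Optional[int] = None
) -> dict[str, dict[int, str]]:
    """Return {participant: {code: condition}} with a fresh random mapping per participant.

    The experimenter only ever sees the integer codes; the key is stored separately.
    """
    rng = random.Random(seed)
    key = {}
    for pid in participants:
        shuffled = rng.sample(conditions, k=len(conditions))  # random permutation
        key[pid] = {code: cond for code, cond in enumerate(shuffled, start=1)}
    return key


def condition_for(key: dict[str, dict[int, str]], participant: str, code: int) -> str:
    """Unblinding lookup, used only after data collection is complete."""
    return key[participant][code]


conditions = ["sham", "3.5 kHz", "AM 20 Hz", "AM 140 Hz"]  # hypothetical labels
key = make_blinding_key(["P01", "P02", "P03"], conditions, seed=7)
for pid, mapping in key.items():
    print(pid, sorted(mapping))  # experimenter-facing view: codes only
```

      Because the mapping is redrawn for every participant, a given code carries no information about the condition across participants, even if the experimenter remembers which code was typed in earlier sessions.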

      Reviewer 1 asked why we did not explicitly ask participants whether they thought they were in an active or sham condition. This would certainly be a useful question. However, we did not want to alert them to the presence of a sham condition, preferring simply to describe the study as one testing a new method of non-invasive brain stimulation. We therefore focused on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation in each experimental session.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2024-02352

      Corresponding author(s): Elise Belaidi

      1. General Statements

      We would like to thank the reviewers for their constructive suggestions and comments. We hope that the point-by-point answer and the revision plan proposed below will convince the reviewers and the editor.

      We thank Reviewer 1 for his/her comments. We would like to specify that the involvement of HIF-1 in IH-induced mitochondrial remodeling was first identified by RNA-seq analysis and then confirmed in a cell-based model as well as in wild-type and HIF-1α+/- heterozygous mice subjected to intermittent hypoxia (IH). In vivo, we first demonstrated that metformin reversed the IH-induced increase in myocardial infarct size through AMPKα2, and we proposed that metformin could modulate HIF-1 activity. We then validated this hypothesis in an in vitro model, demonstrating that metformin decreases HIF-1 activity by increasing HIF-1α phosphorylation. We acknowledge that we used several models; this is why we described all models, experimental designs and methodological details as thoroughly as possible in the Materials and Methods section. We hope that our point-by-point response to Reviewer 1 will increase the clarity of our work and that the new results provided will strengthen the evidence concerning the mechanisms by which metformin can inhibit and modulate the deleterious impact of HIF-1 on the IH-induced increase in myocardial infarct size.

      We thank Reviewer 2 for his/her conclusion highlighting that “our work opens new avenues for exploring the potential effects of metformin as a modulatory of HIF-1α activity in obstructive sleep apnea syndrome”. We hope that the clarifications and/or justifications provided will convince him/her.

      We thank Reviewer 3 for having underlined that “metformin induces HIF-1α phosphorylation, decreases its nuclear localization and subsequently HIF-1 transcriptional activity are very much interesting” and for having highlighted that “our study is convincing”. We hope that the justifications and corrections provided in the point-by-point answer will convince him/her.

      Altogether, as underlined by the three reviewers, our study is of strong interest for translational science in the fields of cardiovascular, respiratory and sleep medicine. We hope that the point-by-point answer and the proposed revision plan will allow the publication of our article in EMBO Molecular Medicine.

      2. Description of the planned revisions

      Please find below the revisions that we plan to make in response to the questions of Reviewer 1.


      Figure 1 - it would be helpful to list all of the DEGs (what genes are changed?). Including the expression of HIF-1α and PHD isoforms would be informative. If there is a robust HIF-1α signal, changes in the expression of HIF and PHD isoforms would be anticipated. Fig 1F - with regards to glycolysis and hypoxia pathway analysis, most of the DEGs are not canonical HIF-1α/hypoxia targets.

      Figure 1 aimed to better understand and manipulate the well-recognized involvement of HIF-1 in the response to our specific IH stimulus (Semenza, Physiology 2009; Belaidi, Pharmacol & Ther. 2016). The results of the RNA-seq analysis show that IH induces cardiac oxidative and metabolic stress, which are interrelated with HIF-1 activation. We did not claim that these genes are HIF-1 target genes. The RNA-seq analysis did not identify HIF-1α or PHD1-3 transcripts among the most dysregulated genes of the panel. If the manuscript is published, the bulk data and DEGs will be provided in an online file. We agree with the reviewer that the list of the 40 up- and down-regulated genes would be very informative and would increase the value of the paper. We therefore plan to add the names of the 40 up- and down-regulated genes to Figure 1B.

      Figure 5G-I, show cytoplasmic HIF1a as well as nuclear.

      Alternatively, why not use IHC for subcellular localization?

      We believe that Reviewer 1's comments concern Fig. 4G-I rather than 5G-I. In this figure, we showed that IH increases nuclear HIF-1α expression compared with the N condition and that this IH effect is abolished in mice treated with metformin, suggesting that, under IH, metformin affects HIF-1α nuclear content and, subsequently, its activity. Nuclear localization of HIF-1α is the most relevant indicator of its activation. We agree with the reviewer that IHC can also be used to assess the nuclear localization of HIF-1α. Indeed, we previously showed by IHC that IH increased HIF-1α nuclear localization, a result corroborated by Western blot (Moulin S, TACD, 2020). Western blot and IHC are both semi-quantitative techniques with different analysis processes. In this study, we chose Western blot because we had the necessary equipment for this technique and because IHC involves an analysis process (slice size, areas to analyze, colorimetry…) that is more complex than that of Western blot (densitometry alone).

      Although nuclear localization of HIF-1α is the most relevant indicator of its activation, it would be interesting to show that HIF-1α cytosolic content is modified neither by IH nor by metformin. This would also corroborate the RNA-seq results, which did not reveal HIF-1α or other members of the HIF family among the DEGs. It would also confirm that, in the context of IH, metformin plays a major role in regulating HIF-1 activity (and not its transcription).

      We therefore plan to perform a Western blot of HIF-1α on cytosolic extracts of hearts from mice exposed to N or IH and treated or not with metformin. These extracts are already available, and the Western blots could be performed and replicated within 3 weeks. We could also provide a Western blot showing the purity of our extraction protocol (nuclear vs cytosolic fractions).

      Figure 5F, it would be important to show the levels of expression of HIF1a in these experiments. Are there positive and negative controls that the authors could use for HIF-1a activity in this experiment?

      In our manuscript, we aimed to demonstrate that metformin decreases HIF-1 activity in a context of strong HIF-1α expression and/or stabilization that mimics what happens after chronic IH in mice (Belaidi E, Int J Cardiol 2016, Moulin S, Ther Adv Chronic Dis 2020) and in apneic patients (Moulin S, Can J Cardiol 2020). We therefore used transfection to overexpress HIF-1α, which is one of the best means of increasing HIF-1 activity. In Figure 1 below, co-transfection with the HA-HIF-1α-WT Addgene AmpR and 5 HRE GFP AmpR plasmids induced a decrease in H9c2 viability and an increase in GFP-positive cells that were not observed in H9c2 cells transfected with pcDNA 3.1 HA-C AmpR (negative control). This validates our in vitro model as a good positive control to mimic the consequences of IH. However, we agree with the reviewer that we could add a supplemental figure or panel demonstrating that our transfection increases HIF-1α expression. We will therefore perform a Western blot targeting HIF-1α in H9c2 cells transfected with the control plasmid (pcDNA 3.1 HA-C AmpR) or with the plasmid allowing HIF-1α overexpression (HA-HIF-1α-WT AmpR). This work would be performed within 2 months.

      Moreover, we have already improved the readability of Figure 5F to clarify the experimental conditions (table inserted under the graph); we have also completed the Materials and Methods section to specify the plasmids used (modifications are in red in the manuscript).

      The figure is available in the downloaded file.



      Figure 1: GFP fluorescence in H9c2 cells transfected with pcDNA 3.1 HA-C AmpR (control condition) or HA-HIF-1α-WT AmpR (positive control, overexpression of HIF-1α) and 5 HRE-GFP AmpR plasmids and treated with CoCl2 (1mM, 2h); magnification x100.

      This paragraph concerns only point 4 of the fifth question.

      In these experiments, as well as subsequent studies, it would be very informative to use a specific AMPK activator e.g. MK-8722, to compare with metformin. It is well known that metformin has a number of other targets in addition to AMPK.

      We agree with the reviewer that metformin has pleiotropic effects. Interestingly, we demonstrated that the reduction in infarct size is not related to the systemic metabolic effects of metformin, since metformin failed to improve IH-induced insulin resistance while it improved the response to insulin in normoxic mice (supplemental Figure S3B). This demonstrates that, in our model, the cardioprotective effects of metformin are independent of a potential systemic effect. We then demonstrated that metformin protects the heart against ischemia-reperfusion through AMPKα2 activation, using AMPKα2 KO mice exposed to IH, in which metformin failed to decrease infarct size (Fig.4N). MK-8722 is not widely used in in vivo models. Moreover, a recent study indicates that chronic treatment with MK-8722 (14 days and 1 month in mice and rats, respectively) induces cardiac hypertrophy characterized by an increase in heart weight (Myers R, Science 2017). In vivo experiments with MK-8722 would therefore be less clinically relevant than experiments with metformin, which is already used in the clinic. However, to strengthen the mechanistic investigation of the role of AMPKα2 activation in inhibiting HIF-1 activity, we propose to repeat the in vitro experiments of Figure 5 with a specific, direct allosteric small-molecule AMPK activator such as 991.

      We plan to:

      - Expose H9c2 cells to CoCl2 and treat them with 991 to measure HIF-1α phosphorylation.
      - Transfect H9c2 cells with our HA-HIF-1α-WT AmpR and 5 HRE GFP AmpR plasmids and treat them or not with 991 to measure HIF-1 activity (GFP fluorescence).

      These experiments would be performed within 2 months.

      Please find below the revision that we plan to make in response to the second question of Reviewer 2.

      2) The WB images were cut and pasted. Please add the original images

      We acknowledge the reviewer's comment and will address it by submitting a supplementary file containing the uncropped immunoblot images. Since this file already exists, our plan is to standardize it by providing, for each slide (immunoblot), all relevant information pertaining to our experiments, including groups, molecular weight markers, cutting, membrane stripping, and other pertinent details.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please find below the answers, or the revisions that have already been incorporated, in response to the comments of Reviewer 1. Please note that we provide new results and new figures for the reviewer's consideration, but we are ready to insert them as figures or supplemental figures in a new revised manuscript if the reviewers and the editor think that this would strengthen our message.

      Was mitochondrial content in the hearts after IH experiment measured e.g. mtDNA measurements? IH results in mitochondrial dysfunction/reduced mitochondrial content. It would have been good to show mitochondrial dysfunction by doing basic functional experiments (e.g. TMRM/MitoROS imaging etc.) by isolating cardiomyocytes from the N and IH experiments.

      We thank the reviewer for these questions about mitochondrial function and content. The impact of IH on cardiac mitochondrial function has already been demonstrated (Moulin S, Antioxidants 2022, Wei Q, Am J Physiol 2012). Indeed, we previously showed that mitochondria isolated from the hearts of mice exposed to IH had decreased maximal respiration in complexes I and II, which was not observed in HIF-1α+/- mice (Moulin S, Antioxidants 2022), indicating that HIF-1 is responsible for IH-induced mitochondrial dysfunction.

      Figure 4 shows that metformin abolished IH-induced mitochondrial remodeling, similar to what we observed in HIF-1α+/- mice (Figure 2). This indicates that metformin treatment and partial deletion of the gene encoding HIF-1α have the same effect on the response to IH. We then demonstrated that metformin can control HIF-1 activation, and we concluded that metformin could be cardioprotective by inhibiting HIF-1 activation and the subsequent mitochondrial stress and remodeling. In this study, we focused on the effects of metformin on HIF-1 and did not aim to directly test the effect of metformin on mitochondrial function. Indeed, metformin exhibits biphasic effects on the bioenergetics of cardiac tissue depending on the modality of administration (e.g., single injection, timing of administration during an ischemia-reperfusion procedure), the dose administered and the tissue studied (e.g., hiPSC-CMs, isolated mitochondria…) (Emelyanova N, Transl Res 2021). However, we collected some data that we would like to submit for the reviewer's consideration. Using oximetry, we measured maximal respiration in complexes 1 and 2 in mitochondria isolated from the hearts of mice exposed to N or IH and treated or not with metformin during the exposure. While we observed that IH decreases maximal respiration in complexes 1 and 2, we did not find any effect of metformin on the IH-induced alteration of mitochondrial respiration (Figure 2A, B). Using spectrofluorometry with TMRM, we measured the mitochondrial membrane potential and did not find any modification of membrane potential in IH or metformin-treated mice (Figure 2C). Because we previously did not observe any impact of IH on the mtDNA/gDNA ratio (Figure 2D), we did not test metformin on this parameter.

      To conclude, we think that these results are not directly within the scope of our work, but if the reviewer thinks that they deserve to be discussed, we could add them as a supplementary figure.

      Please see the figure in the downloaded file.

      Figure 2: Mice were exposed to 21 days of Normoxia (N) or Intermittent Hypoxia (IH) (1-min cycle of FiO2 5%-21%) and treated with vehicle (Vh, CmCNa 0.01%, 0,1ml.10g-1) or Metformin (Met, 300mg.kg-1.d-1). (A, B) Mitochondrial function measured by oximetry with sequential addition of substrate (state 2), ADP (200mM, state 3, maximal respiration) and oligomycin (12.5 mM, state 4); quantification of O2 consumption for NADH-linked mitochondrial respiration (complex I-glutamate-malate, GM, 20mM) (A), and for FADH2-linked mitochondrial respiration (complex II-succinate, S, 5mM, in the presence of complex I inhibition by rotenone, 6.25mM) (B) (n=8). (C) Mitochondrial membrane potential measured by spectrofluorometry after addition of tetramethylrhodamine methyl ester (TMRM, 0.2mM) in the presence of GM (basal condition), under maximal respiration (ADP) and under uncoupling conditions (carbonyl cyanide 4-(trifluoromethoxy)phenylhydrazone, FCCP, 3mM); fluorescence intensity is expressed relative to fluorescence at baseline (before GM) (n=8). (D) Mitochondrial content assessed by the expression of mitochondrial DNA (mtDNA, COX1) relative to genomic DNA (gDNA, ApoB) measured by PCR (n=6); *p

      Fig 2A-D - CoCl2 is not a good model to mimic hypoxia due its effect on disrupting iron homeostasis in cells, which can mean that some of the effects are due to changes in iron levels and not HIF stabilisation.

      The ability of CoCl2 to chelate iron is the main property of CoCl2 that we used to stabilize HIF-1α. Prolyl-4-hydroxylases require Fe2+ to hydroxylate HIF-1α and induce its degradation. Intermittent hypoxia (IH) is characterized by very rapid changes in PO2. This stimulus was designed to reproduce sleep apnea syndrome and its associated disorders (e.g., insulin resistance, hypertension, increased myocardial infarct size). This model was first developed and validated in rodents (Dematteis M, ILARJ 2008, Belaidi E, Eur Resp Rev 2022, Harki, Eur Resp J 2022). Compelling evidence indicates that the involvement of HIF-1 in the deleterious consequences of IH is related to the repetitive phases of oxygenation and especially to IH-induced oxidative stress (Semenza Physiology 2009, Belaidi Pharmacol. & Ther 2016). To gain further mechanistic insight into HIF-1, we next attempted to optimize in vitro models. A device was developed by Minoves et al. (Minoves M, Am J Physiol, 2017) to expose endothelial and cancer cells to IH. However, as illustrated below, this device does not reproduce hypoxia-reoxygenation cycles that are rapid and efficient enough to induce cardiac cell death (Figure 3A). In contrast, CoCl2 decreased H9c2 viability by 60% (Figure 3B), which was associated with sustained stabilization of HIF-1α (Figure 3C,D). We therefore chose this in vitro model, as it replicates the cardiac cell death and the HIF-1α overexpression or stabilization that we observe in our in vivo model and in apneic patients (Moulin S, Can J Cardiol 2020).

      Please see the figure in the downloaded file.

      Figure 3: (A-B) Cell viability measured by 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl-2H-tetrazolium bromide (MTT) assay in H9c2 cells exposed to 6 hours (h) of repetitive cycles of intermittent hypoxia (2 minutes (min) PO2 16% - 2 min PO2 2%) (n=3) (A) or treated with CoCl2 (1mM, 2h) (n=6) (B). (C-D) Quantification of total HIF-1α expression relative to tubulin (C) and representative Western blot image (D) (n=2-3); *p ****p

      Fig. 2K, change in BNIP3 expression is modest, but change in Parkin is very dramatic. BNIP3 is a HIF-1α target but Parkin is not, so it is plausible that mitophagy could be occurring through a HIF-1α independent mechanism.

      Fig. 2K is a representative panel of 2-3 independent experiments. The quantifications reported in Figures 2I and 2J demonstrated a significant decrease in BNIP3 and Parkin expression in HIF-1α heterozygous mice exposed to intermittent hypoxia (IH) compared with HIF-1α+/+ mice exposed to IH. While we acknowledge that only BNIP3 is a direct target of HIF-1, the role of HIF-1 in IH-induced auto/mitophagy is demonstrated by our experiments in HIF-1α heterozygous mice. This shows an important role for HIF-1 without excluding a contribution of HIF-1α-independent mechanisms.

      What are we meant to be looking at in Fig 2L?

      Figure 2L aims to illustrate mitochondrial remodeling under IH. Stars indicate mitochondria with an abnormal fate under IH conditions, and arrows point to autophagosomal membranes and their formation. We have magnified this figure for clarity (please see new Figure 2L).

      Figure 3B-C, the reduction in pT172 on AMPK is modest. It would be good to include pACC as a downstream target for AMPK.

      As recommended by the reviewer, we have added to the manuscript the quantification of the 79Ser-P-ACC/ACC Western blot as well as a representative Western blot image (see new Figure 3). We have also modified the figure legend. 79Ser-P-ACC is an important target of AMPK; however, in our experimental conditions, its phosphorylation is not associated with the decrease in AMPK phosphorylation. This could be explained in several ways. First, metformin was administered every day and hearts were harvested 24 hours after the last administration. Most studies demonstrating a modification of AMPK and ACC phosphorylation were performed in vitro or shortly (less than 1 hour) after a single dose of metformin. In the context of myocardial ischemia-reperfusion, Yin et al. showed an increase in P-AMPK/AMPK directly after metformin treatment without showing any data on P-ACC/ACC (Yin M; Am J Physiol 2011); similar data were published in models of chronic cardiac disease (Soraya H, Eur J Pharmacol, Gundewar S, Circ Res 2009). Second, in line with the previous explanation, the lack of effect of metformin on P-AMPK and/or P-ACC in rodent models could be explained by its rapid distribution (Sheleme T Clin Pharmacokinetics of Metformin 2021) and its short half-life, around 3.7 hours in mice (Junien N, Arch Int Pharmacodyn Ther 1979).
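
      As a rough illustration of this second point, assume simple first-order (mono-exponential) elimination, which ignores tissue distribution and multi-compartment kinetics. With a half-life of about 3.7 h, the fraction of a dose still present 24 h after administration would be:

```latex
\[
\frac{C(24\,\mathrm{h})}{C_0} = \left(\tfrac{1}{2}\right)^{t/t_{1/2}}
= 2^{-24/3.7} \approx 2^{-6.49} \approx 0.011
\]
```

      In other words, under this simplifying assumption, only about 1% of the post-dose level would remain at the time of harvest.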

      To conclude, since we performed all our analyses 24 h after the last treatment and exposure to hypoxia, we argue that the slight but significant decrease in AMPK phosphorylation observed in our study reflects a robust impact of chronic IH. However, it would be valuable to confirm this result by measuring AMPK activity through its phosphorylation capacity (Cool B, Cell Metab 2006, Ducommun S, Am J Physiol 2014). We have already sent hearts from mice exposed to normoxia or intermittent hypoxia to Luc Bertrand's lab (IREC, Belgium), where this assay is routinely performed.

      Fig. E-G, show data for mice treated with vehicle.

      In Figure 4I-J, we demonstrated that metformin significantly decreases infarct size under IH conditions only, which supports our main hypothesis regarding the specific beneficial effect of this drug in the context of chronic IH. To show that the cardioprotective effect of metformin is related to AMPKα2 activation, we first showed that 79Ser-P-ACC/ACC, one of the main downstream targets of AMPKα2, was increased (Fig. E-G). We did not consider it necessary to show these data for normoxic mice, since metformin does not exhibit cardioprotective effects in normoxic conditions. However, as shown in Figure 4 below, metformin also increases 79Ser-P-ACC/ACC in normoxic mice, validating the treatment. Thus, in normoxic conditions, AMPKα2 activation does not exert any cardioprotective effect. We agree that this reinforces our result regarding the specificity of AMPKα2 activation by metformin under chronic IH conditions. We could add this figure to the supplemental results.

      Please see the figure in the downloaded file.

      Figure 4: AMPK activation in normoxic mice treated with vehicle (Vh, CmCNa 0.01%, 0,1ml.10g-1) or metformin (Met, 300mg.kg-1.d-1): 172Thr-P-AMPK/AMPK (A) and 79Ser-P-ACC/ACC (B) ratios and representative Western blot image (C) (n=3-6); *p

      Fig. 3K - what cre is used for the a2 KO mice?

      As stated in the Materials and Methods section, the AMPKα2 KO mice are not inducible knockout mice. These constitutive AMPKα2 knockout mice were kindly provided by Benoit Viollet, who generated them (Viollet B, JCI, 2003).

      Include normoxia data for the a2 KO mice studies.

      The reviewer's question concerning the cardioprotective effects of metformin is interesting but is not aligned with the objectives of the study. Indeed, we did not treat normoxic AMPKα2 KO mice with metformin for several reasons. First, the objective of the study was to find a cardioprotective strategy against the IH-induced increase in infarct size. Second, Fig. 3I shows that metformin significantly reduced infarct size under IH only; this suggests that AMPKα2 activation is specifically involved in counteracting the IH-induced increase in infarct size but not in reducing infarct size in normoxic mice. Moreover, the beneficial impact of metformin in standard models of myocardial ischemia-reperfusion is controversial and has been extensively discussed (Foretz et al. Cell Metab. 2014). Overall, the use of AMPKα2 KO mice was justified in the context of IH only. We validated the involvement of AMPKα2 in the cardioprotective effect of metformin specifically under IH conditions.

      Figure 5G, what is the rationale for switching to CoCl2 in the mice to prove metformin reduced HIF-1α expression? Why not use reduced O2 tension in mice.

      We respectfully disagree with the reviewer: mice were exposed to N or IH and treated or not with metformin to demonstrate that this drug abolished the IH-induced increase in HIF-1α nuclear expression (Figure 4H, I). The same model was used in Figure 3 to demonstrate the impact of metformin on infarct size. Fig. 5G was designed to demonstrate the potential link between AMPKα2 and HIF-1α phosphorylation under basal AMPKα2 content or in the absence of AMPKα2 (AMPKα2-/- mice). In the presence of AMPKα2, HIF-1α phosphorylation increased when HIF-1α stabilization was enhanced by CoCl2; this was not observed in AMPKα2-/- mice, highlighting that AMPKα2 plays an important role in HIF-1α phosphorylation.


      Please find below the answer to the first question asked by Reviewer 2.

      1) Why did the authors choose the IH protocol illustrated in Fig. S1A?

      The choice of the hypoxic stimulus was based on the literature and mainly on our recognized expertise in preclinical studies aiming at a better understanding of obstructive sleep apnea syndrome (OSA), a chronic pathology associated with several comorbidities such as diabetes and hypertension. We are aware that the hypoxic stimulus used in this study is very severe, with a nadir arterial oxygen saturation (SaO2) of around 60%. However, this experimental design is required to induce detrimental cardiovascular effects in the absence of any confounding factors (e.g., obesity) or genetic susceptibility to complications (e.g., genetic susceptibility to hypertension) (Dematteis M, ILARJ 2008). In the context of myocardial infarction specifically, exposing rodents to 14 to 21 days of IH at a FiO2 of 5% and then subjecting them to a myocardial ischemia-reperfusion protocol reproduces the increase in infarct size in rats (Belaidi E, J. Am. Coll. Cardiol., 2009; Bourdier G, Am J Physiol, 2016) and in mice (Belaidi E, Int J Cardiol 2016; Moulin S, Can J Cardiol 2020), similar to what has been observed in apneic patients (Buchner S, EHJ 2014). Moreover, we recently conducted a meta-analysis of 23 preclinical studies investigating the impact of the IH pattern (duration, FiO2, number of cycles, etc.) on infarct size and cardiomyocyte death (Belaidi E, Eur. Resp. J 2022). We showed that IH significantly increases infarct size when it is applied for several days (especially 14 to 21 days) at a FiO2 of around 5%, whereas IH decreases infarct size when it is applied for a single day at a FiO2 of 10%. This meta-analysis confirmed that a chronic and severe stimulus is needed to reproduce the increase in infarct size observed in apneic patients, who are exposed to decreases in SaO2 every day over several days. If the reviewers and the editor consider that this point should be addressed in the Discussion section, we will be happy to include it.


      Please find below the answers and the revisions that have already been incorporated in response to the comments of Reviewer 3. They appear in red in the new manuscript, except for the modifications made in the “References” section, which appear in black.

      1 The authors used rat H9c2 cardiac cells in their in vitro experiments, although they used a mouse model in their in vivo experiments. Using mouse P19.CL6 cardiac cells instead of rat H9c2 cells may be much clearer. Why the authors did not use P19.CL6 cells should be explained.

      We thank the reviewer for his/her suggestion. The P19CL6 cell line was isolated from pluripotent P19 embryonal carcinoma (EC) cells after long-term culture under conditions promoting mesodermal differentiation (Habara-Ohkubo A, Cell Struc Funct 1996). These cells are therefore mostly used to study cardiac muscle differentiation; indeed, they are recognized for avoiding the large variations in differentiation rates that have been extensively reported (Mueller I, J Biomed Biotechnol 2010). To our knowledge, no non-beating ventricular mouse cell line is available. In this study, we used H9c2 cells, which are extensively used and recognized as a gold-standard cellular model to study cardiomyocyte biology, including the mechanisms involved in cardiac ischemia-reperfusion injury (Paillard M, Circulation, 2013; Zhang G Circulation 2021), cardiac hypertrophy (Zhang N, Cell Death Diff 2020; Hu H, Cardiovasc Res 2020), inter-organelle calcium exchanges (Moulin S, Antioxidants, 2022; Paillard M, Circulation, 2013) and drug testing (Beshay NM, J Pharm and Tox Methods, 2007). Recently, H9c2 and P19.CL6 cells were exposed to intermittent hypoxia (70 cycles of FiO2 1% (5 min) / FiO2 21% (10 min)) to “mimic OSA” and to investigate the transcription levels of a set of genes. The authors showed similarities and differences in mRNA expression between the two cell lines that could indeed be attributed to differences in cell origin (Takasawa S, IJMS 2022). However, that study included no experiments assessing the state of the cardiac cells (apoptosis, viability, metabolism, remodeling), which calls into question the pathophysiological transposability of the model. Moreover, the number of studies conducted on H9c2 cells (7,000 PubMed references vs. 100 for P19.CL6) to understand the mechanisms involved in acute and chronic cardiac pathologies makes us confident that our choice is relevant.

      2 The authors described, "The protocol was approved by the French minister (APAFIS#23725-2020012111137561.v2)." in Animals (page 16) without the approval date; the authors should clearly show the approval date together with their approval numbers.

      We have added the approval date to the Materials and Methods section. The protocol was approved on February 20, 2020.

      3 In Figure 2 L, scale bar(s) should be added because figures are magnified and/or reduced by printer.

      We agree with the reviewer that the scale bars were not visible; we have made them more prominent.

      4 In Figure 3J and 3N, scale bar(s) should be added.

      Scale bars have now been added to Figures 3J and 3N. All pictures were acquired at the maximal zoom of a camera placed at the same distance from the slices. Analyses were then performed, slice by slice, in ImageJ at the same zoom (x5 to obtain an image at 100%). In this context, the scale was calibrated from a photograph of a slice taken next to a ruler.

      5 In introduction, "HIF-1" should be changed to "hypoxia-inducible factor 1 (HIF-1)".

      Thank you, we replaced HIF-1 by Hypoxia Inducible Factor-1 (HIF-1) in the introduction section.

      6 In Results, "Angpt1, Txnip, Nmrk2, Nuak1 or Pfkfb1" should be changed to "Angiopoietin 1 (Angpt1), Thioredoxin-interacting protein (Txnip), Nicotinamide riboside kinase 2 (Nmrk2), NUAK family SNF1-like kinase 1 (Nuak1) or 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 1 (Pfkfb1)".

      We made this modification and kept the abbreviations in italics since they are gene names.

      7 In Chronic intermittent hypoxia, "FiO2" should be changed to "FiO₂".

      Thank you, we changed it in the materials and methods section.

      8 In Western-blot, "Bio-Rad, California, USA" should be changed to "Bio-Rad, Hercules, CA".

      Thank you, we have made this change.

      9 In Western-blot, which "tubulin" (α-tubulin or β-tubulin) was used should be clarified.

      We agree with the reviewer that this point should be specified; α-tubulin was detected, and we have added the “α”.

      As detailed below, we have made all the modifications requested by the reviewer in the References section.

      10 In Ref. 8, "Antioxidants (Basel) 11 (2022)" should be changed to "Antioxidants (Basel) 11, 2326 (2022)".

      Done

      11 In Ref.10, "Pharmacol Ther (2016)" should be changed to "Pharmacol Ther 168, 1-11 (2016)".

      Done

      12 In Ref.13, "Antioxidants (Basel) 11 (2022)" should be changed to "Antioxidants (Basel) 11, 1462 (2022)".

      Done

      13 In Ref. 21, "Diabetes (2017)" should be changed to "Diabetes 66, 2942-2951 (2017)".

      Done

      14 In Ref. 28, "Adv Biol (Weinh), e2300292 (2023)" should be changed to "Adv Biol (Weinh), 8, 2300292 (2023)".

      Done

      15 In Ref. 37, "J Am Heart Assoc 6 (2017)" should be changed to "J Am Heart Assoc 6, e006680 (2017)".

      Done

      16 In Ref. 41, "Eur Respir Rev 32 (2023)" should be changed to "Eur Respir Rev 32, 230083 (2023)".

      Done

      17 In Ref. 46, "Int J Mol Sci 22 (2020)" should be changed to "Int J Mol Sci 22, 268 (2021)".

      Done

      18 In Ref. 47, "Int J Mol Sci 21 (2020)" should be changed to "Int J Mol Sci 21, 2428 (2020)".

      Done

      4. Description of analyses that authors prefer not to carry out

      Please find below our answer to the comment from Reviewer 3 that we cannot address because it is beyond the scope of the study.


      Figure 5, multiple phosphorylation sites have been identified on HIF1a. What is the nature of the Thr/SerP-HIF1a antibody? It would be far more preferable (essential?) to identify the site(s) within HIF1a that are phosphorylated by AMPK.

      The antibody was obtained from Cell Signaling Technology (ref. 9631). This Phospho-(Ser/Thr) Phe antibody detects phosphorylated serine or threonine residues in the context of tyrosine, tryptophan or phenylalanine.

      Identifying the phosphorylation sites would require a time-consuming phosphoproteomic analysis and subsequent functional validation in vivo and in vitro (site-directed mutagenesis, knock-in mice, etc.), which is beyond the scope of our paper.

    1. The document provides a fascinating overview of the technologies available to monitor personal health and sports performance, showing how the sector is rapidly evolving towards increasingly intelligent and personalized solutions. By exploring a wide range of devices and mobile applications designed to record and analyze physiological and biomechanical parameters, it highlights how the field is moving from simple measurement of standard vital signs to sophisticated predictive algorithms based on population data and artificial intelligence.

      Personally, I believe it is crucial to highlight the importance of independent validation of these technologies. It is worrying that many of them have not yet been adequately tested, which raises questions about their effectiveness and usefulness for the average consumer. In a world where we are increasingly surrounded by data and devices, it is essential that the technologies we choose to use are reliable and based on solid scientific evidence.

      Analyzing specific devices such as the Zephyr Sensor, E4 Wristband, and Reign Active Recovery Band offers a detailed look at their capabilities and potential benefits. Furthermore, the reference to the Mio SLICE™ bracelet, with its "Personal Activity Score" calculated via an algorithm, is particularly interesting. The clinical study that demonstrated a reduction in the risk of cardiovascular disease for those with a score ≥100 is tangible evidence of the benefits these technologies can offer when tested properly.

      Finally, the discussion of sleep and cardiorespiratory monitoring technologies, such as the OURA ring and Fitbit Charge 2™, highlights current challenges and limitations. Personally, I think sleep tracking is one of the most interesting aspects of these technologies, as rest is critical to overall health and physical performance. However, it is important to be aware of the limitations in the precision and interpretation of the collected data, as these may affect our confidence in the results provided.

      Overall, the document offers a comprehensive overview of an ever-expanding sector. As technologies for monitoring health and sports performance continue to evolve, it is essential to stay informed about their potential and limitations. Innovation is exciting, but careful research and development are needed to ensure that these technologies are truly useful and reliable for improving our lives and performance.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Public Reviewer Comments

      We again thank the reviewers for the time and effort they clearly put into reviewing our manuscript. We have revised our manuscript to take into account the majority of their suggestions, primary among them being refinements of our model and classification approach, detailed sensitivity analysis of our model, and several new simulations. Their very constructive feedback has resulted in what we feel is a much-improved paper. In what follows, we respond to each of their points.

      Reviewer #1:

      COMMENT: The reviewer suggested that our control policy classification thresholds should be increased, especially if the behavioral labels are to be subsequently used to guide analyses of neural data which “is messy enough, but having trials being incorrectly labeled will make it even messier when trying to quantify differences in neural processing between strategies.”

      REPLY: We appreciate the observation and agree with the suggestion. In the revised manuscript, we simplified the model (as another reviewer suggested), which allowed for better training of the classifier. This enabled an increase in the threshold to 95% to have more confidence in the identified control strategies. Figures 7 and 8 were regenerated based on the new threshold.
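A minimal sketch of a thresholded labeling rule of this kind, assuming the classifier returns one posterior probability per control objective. The function name `label_trial`, the two-class posterior format, and the choice to leave sub-threshold trials unlabeled (`None`) are illustrative assumptions, not details taken from the manuscript.

```python
from typing import Optional, Sequence

# Hypothetical label order; assumes the (unspecified) classifier outputs one
# posterior probability per control objective, in this order.
LABELS = ("Position", "Velocity")


def label_trial(posteriors: Sequence[float], threshold: float = 0.95) -> Optional[str]:
    """Return the winning control objective only if its posterior reaches the threshold.

    Trials whose best posterior falls below the threshold are left unlabeled
    (None) instead of being forced into one of the two classes.
    """
    if len(posteriors) != len(LABELS):
        raise ValueError("expected one posterior per label")
    best = max(range(len(LABELS)), key=lambda i: posteriors[i])
    return LABELS[best] if posteriors[best] >= threshold else None


if __name__ == "__main__":
    trials = [(0.97, 0.03), (0.60, 0.40), (0.02, 0.98)]
    print([label_trial(p) for p in trials])  # ['Position', None, 'Velocity']
```

Raising the threshold trades coverage for confidence: fewer trials receive a label, but those that do are less likely to be mislabeled before being used to sort neural data.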

      COMMENT: The reviewer asked if we could discuss what one might expect to observe neurally under the different control policies, and also suggested that an extension of this work could be to explore perturbation trials, which might further distinguish between the two control policies.

      REPLY: It is indeed interesting to speculate what neural activity could underlie these different behavioral signatures. As this task is novel to the field, it is difficult to predict what we might observe once we examine neural activity through the lens of these control regimes. We hope this will be the topic of future studies, and one aspect worthy of investigation is how neural activity prior to the start of the movement may reflect two different control objectives. Previous work has shown that motor cortex is highly active and specific as monkeys prepare for a cued movement and that this preparatory activity can take place without an imposed delay period (Ames et al., 2014; Cisek & Kalaska, 2005; Dekleva et al., 2018; Elsayed et al., 2016; Kaufman et al., 2014; Lara et al., 2018; Perich et al., 2018; Vyas et al., 2018; Zimnik & Churchland, 2021). It seems possible that the control strategies we observed correspond to different preparatory activity in the motor cortex. We added these speculations to the discussion.

      The reviewer’s suggestion to introduce perturbations to probe sensory processing is very good and was also suggested by another reviewer. We therefore conducted additional simulations in which we introduced perturbations (Supplementary Material; Figure S10). Indeed, in these model simulations the two control objectives separated more. However, testing these predictions via experiments must await future work.

      COMMENT: “It seems like a mix of lambda values are presented in Figure 5 and beyond. There needs to be some sort of analysis to verify that all strategies were equally used across lambda levels. Otherwise, apparent differences between control strategies may simply reflect changes in the difficulty of the task. It would also be useful to know if there were any trends across time?”

      REPLY: We appreciate and agree with the reviewer’s suggestion. We have added a complementary analysis of control objectives with respect to task difficulty, presented in the Supplementary Material (Figures S7 and S8). We demonstrate that, overall, the control objectives remain generally consistent throughout trials and difficulty levels. Therefore, it can be concluded that the difference in behavior associated with different control objectives does not depend on the trial sequence or difficulty of the task. A statement to this effect was added to the main text.

      COMMENT: “Figure 2 highlights key features of performance as a function of task difficulty. …However, there is a curious difference in hand/cursor Gain for Monkey J. Any insight as to the basis for this difference?”

      REPLY: The apparently different behavior of Monkey J in the hand/cursor RMS ratio could be due to subject-to-subject variability. Given that we have data from only two monkey subjects, we examined inter-individual variations between human subjects in the Supplementary Material by presenting individual hand/cursor gain data for all individual human subjects (Figure S1). As can be seen, there was indeed variability, with some subjects not exhibiting the same clear trend with task difficulty. However, on average, the RMS ratio shows a slight decrease as trials grow more difficult, as was earlier shown in Figure 2. We added a sentence about the possibility of inter-individual variations to address the difference in behavior of monkey J with reference to the supplementary material.

      Reviewer #2:

      (Reviewer #2's original review is with the first version of the Reviewed Preprint. Below is the authors' summary of those comments.)

      COMMENT: The reviewer commends the care and effort taken to characterize control policies that may be used to perform the CST, via dual human and monkey experiments and model simulations, noting the importance of doing so as a precursor to future neural recordings or BMI experiments. But the reviewer also wondered if it is all that surprising that different subjects might choose different strategies: “... it makes sense that different subjects might choose to favor different objectives, and also that they can do so when instructed. But has this taught us something about motor control or simply that there is a natural ambiguity built into the task?”

      REPLY: The redundancy in the task that allowed different solutions to achieve the task was deliberate, and the motivation for choosing this task for this study. We therefore did not regard the resulting subject-to-subject variability as a finding of our study. Rather, redundancy and inter-individual variability are features ubiquitous in all everyday actions and we explicitly wanted to examine behavior that is closer to such behavior. As commended by the reviewers, CST is a rich task that extends our research beyond the conventional highly-constrained reaching task. The goal of our study was to develop a computational account to identify and classify such differences to better leverage future neural analyses of such more complex behaviors. This choice of task has now been better motivated in the Introduction of the revised manuscript.

      COMMENT: The reviewer asks about our premise that subjects may use different control objectives in different trials, and whether instead a single policy may be a more parsimonious account for the different behavioral patterns in the data, given noise and instability in the system. In support of this view, the reviewer implemented a simple fixed controller and shared their own simulations to demonstrate its ability to generate different behavioral patterns simply by changing the gain of the controller. The reviewer concludes that our data “are potentially compatible with any of these interpretations, depending on which control-style model one prefers.”

      REPLY: We first address the reviewer’s concern that a simple “fixed” controller can account for the two types of behavioral patterns observed in Experiment 2 (instructed groups) by a small change in the control gain. We note that our controller is also fixed in terms of the plant, the actuator, and the sensory feedback loop; the only change we explore is in the relative weights of position vs. velocity in the Q matrix. This determines whether it is deviations in position or in velocity that predominate in the cost function. This, in turn, generates changes in the gain vector L in our model, since the optimal solution (i.e. the gains L that minimize the cost function) depends on the Q matrix as well as the dynamics of the plant (specifically, the lambda value). Hence, one could interpret the differences arising from changes in the control objective (the Q matrix) as changes in the gains of our “fixed” controller.
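To illustrate how changing only the weights in Q changes the optimal gain vector L, here is a stdlib-only sketch. It assumes a simplified, Euler-discretized, noise-free and delay-free version of the task in which cursor velocity is λ(x + p) (consistent with Reviewer #2's reading of equation (1)) and hand velocity is the control input; the step size, λ, effort weight r and all function names are hypothetical, and this is not the manuscript's model, which also includes sensory delay, noise and state estimation. Penalizing x² yields different gains on cursor and hand position, whereas penalizing (λ(x + p))² yields equal gains on x and p, so that controller only drives x + p (cursor velocity) to zero and leaves cursor position drift uncorrected.

```python
import cmath
from typing import List

Matrix = List[List[float]]

# Assumed, simplified plant (NOT the manuscript's full model: no delay,
# noise or state estimator). State s = [x, p] (cursor, hand position);
# cursor velocity is lam * (x + p); the control u is hand velocity.


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def plant(lam: float, dt: float):
    """Euler discretization: x' = x + dt*lam*(x+p), p' = p + dt*u."""
    A = [[1.0 + dt * lam, dt * lam], [0.0, 1.0]]
    B = [[0.0], [dt]]
    return A, B


def lqr_gain(A: Matrix, B: Matrix, Q: Matrix, r: float,
             tol: float = 1e-10, max_iter: int = 200_000) -> List[float]:
    """Discrete infinite-horizon LQR gain L (u = -L s) via Riccati iteration."""
    At, Bt = transpose(A), transpose(B)
    P = [row[:] for row in Q]
    for _ in range(max_iter):
        BtPA = matmul(matmul(Bt, P), A)[0]            # row vector B'PA
        s = r + matmul(matmul(Bt, P), B)[0][0]        # scalar r + B'PB
        AtPA = matmul(matmul(At, P), A)
        # P <- Q + A'PA - A'PB (r + B'PB)^-1 B'PA  (P stays symmetric)
        P_next = [[Q[i][j] + AtPA[i][j] - BtPA[i] * BtPA[j] / s for j in range(2)]
                  for i in range(2)]
        diff = max(abs(P_next[i][j] - P[i][j]) for i in range(2) for j in range(2))
        P = P_next
        if diff < tol * (1.0 + max(abs(v) for row in P for v in row)):
            break
    BtPA = matmul(matmul(Bt, P), A)[0]
    s = r + matmul(matmul(Bt, P), B)[0][0]
    return [v / s for v in BtPA]


def spectral_radius(A: Matrix, B: Matrix, L: List[float]) -> float:
    """Largest |eigenvalue| of the 2x2 closed-loop matrix A - B L."""
    M = [[A[i][j] - B[i][0] * L[j] for j in range(2)] for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


if __name__ == "__main__":
    lam, dt, r = 3.0, 0.01, 1e-3                                # illustrative values only
    A, B = plant(lam, dt)
    q_position = [[1.0, 0.0], [0.0, 0.0]]                       # penalize x^2
    q_velocity = [[lam ** 2, lam ** 2], [lam ** 2, lam ** 2]]   # penalize (lam*(x+p))^2
    for name, Q in (("Position", q_position), ("Velocity", q_velocity)):
        L = lqr_gain(A, B, Q, r)
        print(name, [round(v, 2) for v in L], round(spectral_radius(A, B, L), 4))
```

Under these assumptions the position-weighted controller has a closed-loop spectral radius below 1, while the velocity-weighted one retains a unit eigenvalue along the direction x + p = 0, which is one way to see why position errors persist under velocity-type objectives.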

      More importantly, while the noise and instability in the system may indeed occasionally result in distinct behavioral patterns (and we have observed such cases in our simulations as well), these factors are far from giving an alternative account for the structural differences in the behavior that we attribute to the control objective. To substantiate this point, we performed additional simulations that are provided in the Supplementary Material (Figures S4–S6). These simulations show that neither a change in noise nor in the relative cost of effort can account for the two distinct types of behavior. These differences are more consistently attributed to a change in the control objective.

      In addition, our approach provides a normative account of the control gains needed to simulate the observed data, as well as the control objectives that underlie those gains. As such, the two control policies in our model (Position and Velocity Control) resulted in control gains that captured the differences in the experimental groups (Experiment 2), both at the single trial and aggregate levels and across different task difficulties. Figure S9 in the Supplementary Material shows how the control gains differ between Position and Velocity Control in our model across different difficulty levels.

      We agree with the reviewer’s overall point that there are no doubt many models that can exhibit the variability observed in our experimental data, our simulations, or the reviewer’s simulations. Our study aimed to explore in detail not only the model’s ability to generate the variable behavior observed in experimental data, but also to match experimental results in terms of performance levels, gains, lags and correlations across a wide range of lambda values, wherein the only changes in the model were the lambda value and the control objective. Without the details of the reviewer’s model, we are unable to perform a detailed analysis of that model. Even so, we are not claiming that our model is the ‘ground truth,’ only that it is certainly a reasonable model, adopted from the literature, that provides an intuitive and normative explanation of the performance of humans and monkeys over a range of metrics, system dynamics, and experimental conditions.

      Finally, we understand the reviewer’s concern regarding whether the trial-by-trial identification of control strategy in Figure 8 suggests that (uninstructed) subjects constantly switch control objectives between Position and Velocity. Although it is not unreasonable to imagine that individuals would intuitively try different strategies between ‘keeping the cursor still’ and ‘keeping the cursor at the center’ across trials, we agree that it is generally difficult to determine such trial-to-trial changes, especially when the behavior lies somewhere in between the two control objectives. In such cases, as we originally discussed in the manuscript, an alternative explanation could be a mixed control objective that generates behavior at the intersection of Position and Velocity Control, i.e., between the two slopes in Figure 8. We believe, however, that our modeling approach is still helpful in cases where performance is predominantly based on Position or Velocity Control. After all, the motivation for this study was to parse neural data into two classes associated with each control objective to potentially better identify structure underlying these behaviors.

      We clarified these points in the main text by adding further explanation in the Discussion section.

      COMMENT: The reviewer suggested additional experiments, such as perturbation trials, that might be useful to further explore the separability of control objectives. They also suggested that we temper our conclusion that our approach can reliably discriminate amongst different control policies on individual trials. Finally, the reviewer suggested that we modify our Introduction and/or Discussion to note past human/monkey research as well as investigations of minimization of velocity-error versus position-error in the smooth pursuit system.

      REPLY: We have expanded our simulations to investigate the effects of perturbation on the separability of different control objectives (Figure S10 in Supplementary Materials). We demonstrated that introducing perturbations more clearly differentiated between Position and Velocity Control. These results provide a good basis for further experimental verifications of the control objectives, but we defer these for future work.

      We also appreciate the additional past work that bridges human and monkey research that the reviewer highlights, including the related discussions in the eye movement literature on position versus velocity control. We have modified our Introduction and Discussion accordingly.

      Reviewer #3:

      COMMENT: The reviewer asked whether the observed differences in behavior might be due to some other factors besides the control policy, such as motor noise or effort cost, and suggested that we more systematically ruled out that possibility.

      REPLY: We appreciate and have heeded the reviewer’s suggestion. The revised manuscript now includes additional simulations in which the control objective was fixed to either Position or Velocity Control, while other parameters were systematically varied. Specifically, we examined the influence of the relative effort cost, the sensory delay, and motor noise on performance. The results of these sensitivity analyses are presented in the Supplementary Material, Figures S4–S6. In brief, we found that changing the relative effort cost, delay, or noise level mainly affected the success rate (as expected), but did not affect the behavioral features originally associated with the control objectives. We include a statement about this result in the main text with reference to the details provided in the Supplementary Material.

      COMMENT: The reviewer questioned our choice of classification features (RMS position and velocity) and wondered if other features might yield better class separation, such as the hand/cursor gain. In a similar vein, reviewer 2 suggested in their recommendations that we examine the width of the autocorrelation function as a potentially better feature.

      REPLY: We note first that our choice of cursor velocity and position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. However, we also explored other features as suggested. We found that they, too, exhibited overlap between the two different control objectives, and did not provide any significant improvement in classification performance (Figures S2 and S3; Supplementary Materials). Of course, that is not to say that a more exhaustive examination of features may not find ones that yield better classification performance than those we investigated, but that is beyond the scope of our study. We refer to this consideration of alternative metrics in the discussion.
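A minimal sketch of how per-trial features of this kind might be computed from sampled traces. It assumes uniformly sampled cursor and hand positions and a forward-difference velocity; the function names and the differentiation scheme are illustrative assumptions, not the manuscript's preprocessing. The hand/cursor gain is taken as RMS hand position divided by RMS cursor position, following the "hand/cursor RMS ratio" wording in the reply to Reviewer #1.

```python
import math
from typing import Sequence, Tuple


def rms(values: Sequence[float]) -> float:
    """Root-mean-square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def trial_features(cursor: Sequence[float], hand: Sequence[float],
                   dt: float) -> Tuple[float, float, float]:
    """Return (RMS cursor position, RMS cursor velocity, hand/cursor RMS ratio).

    Velocity is a forward finite difference of the sampled cursor position.
    """
    if len(cursor) != len(hand) or len(cursor) < 2:
        raise ValueError("need equal-length traces with at least two samples")
    velocity = [(b - a) / dt for a, b in zip(cursor, cursor[1:])]
    rms_cursor = rms(cursor)
    if rms_cursor == 0.0:
        raise ValueError("cursor trace is identically zero; gain undefined")
    return rms_cursor, rms(velocity), rms(hand) / rms_cursor


if __name__ == "__main__":
    # Hypothetical toy trace sampled every 0.5 s.
    print(trial_features([0.0, 1.0, 0.0, -1.0], [0.0, -2.0, 0.0, 2.0], 0.5))
```

In a phase-space view, the first two values place each trial as a point in the RMS position versus RMS velocity plane; the third is one of the alternative features the reviewers proposed.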

      COMMENT: The reviewer notes that “It seems that the classification problem cannot be solved perfectly, at least on a single-trial level.” To address this point, the reviewer suggested that we conduct additional simulations under the two different control objectives, and quantify the misclassifications.

      REPLY: We appreciate the reviewer’s suggestion, and have conducted the additional simulations as suggested, the results of which are included in the revised manuscript.

      COMMENT: “The problem of inferring the control objective is framed as a dichotomy between position control and velocity control. In reality, however, it may be a continuum of possible objectives, based on the relative cost for position and velocity. How would the problem differ if the cost function is framed as estimating a parameter, rather than as a classification problem?”

      REPLY: A blended control strategy, formulated as a cost function that is a weighted combination of position and velocity costs, is indeed a possibility that we briefly discussed in the original manuscript. This possibility arises particularly for individuals whose performance metrics lie somewhere between the purely Position or purely Velocity Control. While our model allows for a weighted cost function, which we will explore in future work, we felt in this initial study that it was important to first identify the behavioral features unique to each control objective.
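As a sketch of such a blended objective, with notation assumed here for illustration (cursor position x_k, cursor velocity v_k, control input u_k, effort weight r and a blending weight w; the manuscript's own Q-matrix weights may be scaled differently), the cost could be written as below. Setting w = 1 recovers pure Position Control and w = 0 pure Velocity Control; the reviewer's parameter-estimation framing would then amount to inferring w for each trial rather than assigning a binary label.

```latex
J(w) = \sum_{k} \left[ w\, x_k^{2} + (1 - w)\, v_k^{2} + r\, u_k^{2} \right],
\qquad w \in [0, 1]
```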

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      None beyond those stated above.

      Reviewer #2 (Recommendations For The Authors):

      COMMENT: Line 166 states "According to equation (1), this behavior was equivalent to reducing the sum (𝑝 + 𝑥) when 𝜆 increased, so as to prevent rapid changes in cursor velocity". This doesn't seem right. In equation 1, velocity (not acceleration) depends on p+x. So a large p+x doesn't create a "rapid change in cursor velocity", but rather a rapid change in cursor position.

      REPLY: The reviewer is correct and we have corrected this misworded sentence; thank you for catching that.

      COMMENT: The reviewer points out the potential confusion readers may have, given our unclear use of ‘control strategy’ vs. ‘control policy’ vs. ‘control objective’. The reviewer suggests that “It would be helpful if this could be spelled out early and explicitly. 'Control strategy' seems perilously close to 'control policy', and it would be good to avoid that confusion. The authors might prefer to use the term 'cost function', which is really what is meant. Or they might prefer 'control objective', a term that they introduce as synonymous with 'control strategy'.”

      REPLY: We thank the reviewer for noting this ambiguity. We have clarified the language in the Introduction to explicitly note that by strategy, we mean the objective or cost function that subjects attempt to optimize. We then use ‘control objective’ consistently and removed the term ‘policy’ from the paper to avoid confusion. We also now use Position Control and Velocity Control as the labels for our two control objectives.

      COMMENT: The reviewer notes that in Figure 2B and the accompanying text in the manuscript, we need to be clearer about what is being correlated; namely, cursor and hand position.

      REPLY: Thank you for pointing out this lack of clarity, which we have corrected as suggested.

      COMMENT: The reviewer questions our attribution of decreasing lag with task difficulty as a consequence of subjects becoming more attentive/responsive when the task is harder, and points out that our model doesn’t include this possible influence yet the model reproduces the change in lag. The reviewer suggests that a more likely cause is due to phase lead in velocity compared to position, with velocity likely increasing with task difficulty, resulting in a phase advance in the response.

      REPLY: Our attribution of the decrease in lag with task difficulty being due to attention/motivation was a recapitulation of this point made in the paper by Quick et al. [2018]. But as noted by the reviewer, this potential influence on lag is not included in our model. Accordingly, the change in lag is more likely a reflection of the phase response of the closed loop system, which does change with task difficulty since the optimal gains depend upon the plant dynamics (i.e., the value of lambda). We have, therefore, deleted the text in question.
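The phase-lead argument raised by the reviewer can be made concrete for a single sinusoidal component (an idealization; A, ω, a and b are illustrative symbols, not quantities from the manuscript). Velocity leads position by a quarter cycle, so a response driven by a mix of position and velocity is phase-advanced relative to position alone, and the advance φ grows with the relative weight b on velocity.

```latex
x(t) = A\sin(\omega t)
\;\Rightarrow\;
\dot{x}(t) = A\omega\cos(\omega t) = A\omega\sin\!\left(\omega t + \tfrac{\pi}{2}\right),
\qquad
a\,x(t) + b\,\dot{x}(t) = A\sqrt{a^{2} + b^{2}\omega^{2}}\,\sin(\omega t + \varphi),
\quad \varphi = \arctan\!\left(\frac{b\,\omega}{a}\right),\; a > 0
```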

      COMMENT: “The Methods tell us rather a lot about the dynamics of the actual system, and the cost functions are also well defined. However, how they got from the cost function to the controller is not described. I was also a bit confused about the controller itself. Is the 50 ms delay assumed when deriving the controller or only when simulating it (the text seems to imply the latter, which might make sense given that it is hard to derive optimal controllers with a hard delay)? How similar (or dissimilar) are the controllers for the two objectives? Is the control policy (the matrix that multiplies state to get u) quite different, or only subtly?”

      REPLY: Thanks for pointing this out. For brevity, we had omitted the details and referred readers to the original paper (Todorov, 2005). However, we have now revised the manuscript to include all the details in the Methods section. Hence, the entire section on the model is new. This also necessitated updating all data figures (Figures 3, 4, 5, 6, 7, 8), as they contain modeling results.

      COMMENT: “Along similar lines, I had some minor to moderate confusions regarding the OFC model as described in the main text. Fig 3 shows a model with a state estimator, but it isn't explained how this works. …Here it isn't clear whether there is sensory noise, or a delay. The methods say a delay was included in the simulation (but perhaps not when deriving the controller?). Noise appears to have been added to u, but I'm guessing not to x or x'? The figure legend indicates that sensory feedback contains only some state variables, and that state estimation is used to estimate the rest. Presumably this uses a Kalman filter? Does it also use efference copy, as would be typical? My apologies if this was stated somewhere and I missed it. Either way, it would be good to add a bit more detail to the figure and/or figure legend.”

      REPLY: As the lack of detail evidently led to some confusion, we now more clearly spell out the details of the model in the Methods, including the state estimation procedure.
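The reviewer asked whether the estimator is a Kalman filter that uses efference copy. As a generic illustration only (the estimator actually used is the one described in the revised Methods), one predict/update step of a standard discrete-time Kalman filter is sketched below; the motor command u enters the prediction step, which is what "efference copy" means here. Function and argument names are hypothetical.

```python
import numpy as np

def kalman_step(x_hat, P, u, y, A, B, H, W, V):
    """One predict/update step of a discrete-time Kalman filter.

    x_hat, P : current state estimate and its covariance
    u        : motor command sent to the plant (used as efference copy)
    y        : noisy sensory observation, y = H x + noise
    W, V     : process-noise and sensory-noise covariances
    """
    # Predict: propagate the estimate forward using the efference copy of u
    x_pred = A @ x_hat + B @ u
    P_pred = A @ P @ A.T + W
    # Update: correct the prediction with the sensory innovation y - H x_pred
    S = H @ P_pred @ H.T + V
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```

With a partial observation matrix (for example, position only), repeated steps still reduce the uncertainty of unobserved states such as velocity, provided the system is observable.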

      COMMENT: The reviewer wondered why we chose to plot mean velocity vs. mean position as in Figure 5, noting that, “ignoring scale, all scatter plots would be identical if the vertical axis were final position (because mean velocity determines final position). So what this plot is really examining is the correlation between final position and average position. Under position control, the autocorrelation of position is short, and thus final position tends to have little to do with average position. Under velocity control, the autocorrelation of position is long, and thus final position tends to agree with average position. Given this, why not just analyze this in terms of the autocorrelation of position? This is expected to be much broader under velocity control (where they are not corrected) than under position control (where they are, and thus disappear or reverse quickly). To me, thinking of the result in terms of autocorrelation is more natural.”

      REPLY: The reviewer is correct that the scatter plots in Fig. 5 would be the same (to within a scale factor of the vertical axis) had we plotted final position vs. mean position instead of mean velocity vs. mean position as we did. Our preference for mean velocity vs. mean position stems from a dynamical systems perspective, where position-velocity phase-space analysis is common. We now mention these perspectives in the revised manuscript for the benefit of the reader.
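Assuming every trial starts from the same initial cursor position p(0) (an assumption; the start position is not stated here), the reviewer's observation follows from the definition of mean velocity over a trial of duration T (6 s in this task):

$$\bar{v} = \frac{1}{T}\int_0^T \dot{p}(t)\,dt = \frac{p(T) - p(0)}{T}$$

Mean velocity is therefore an affine function of final position, so plotting one instead of the other only rescales the vertical axis (and shifts it if p(0) is nonzero).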

      As suggested, we also investigated the width of the (temporal) autocorrelation function (acf) of cursor position for 200 simulated position control trials and 200 simulated velocity control trials, at four different lambda values (50 simulated trials per lambda). Figs. S2A and B (Supplementary Materials) show example trials and histograms of the acf width, respectively. As the reviewer surmised, velocity control trials tend to have wider acfs than position control trials. However, as with the metrics we chose to analyze, the two distributions overlap, and acf width offers no visible benefit for classification.
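A minimal sketch of one way to measure acf width, assuming width is defined as the first lag at which the normalized autocorrelation falls below 0.5 (the threshold and the biased estimator are assumptions; the definition used for Fig. S2 is not given here). White noise and a random walk stand in for quickly corrected (position-control-like) and uncorrected (velocity-control-like) cursor trajectories.

```python
import numpy as np

def acf_width(x, dt, threshold=0.5):
    """First lag (in seconds) at which the normalized autocorrelation of x
    drops below `threshold`; returns the full duration if it never does."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    full = np.correlate(x, x, mode="full")[n - 1:]   # biased acf, lags 0..n-1
    acf = full / full[0]
    below = np.nonzero(acf < threshold)[0]
    return below[0] * dt if below.size else n * dt

rng = np.random.default_rng(0)
dt = 0.01
noise = rng.standard_normal(1000)   # decorrelates immediately
walk = np.cumsum(noise)             # errors persist, like uncorrected drift
```

For these toy signals, `acf_width(noise, dt)` is a single sample, while `acf_width(walk, dt)` spans many samples.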

      COMMENT: “I think equation ten is incorrect, but would be correct if the identity matrix were added? Also, why is the last term of B set to 1/(Tau*M). What is M? Is it mass (which above was lowercase m)? If so, mass should also be included in A (it would be needed in two places in the last column). Or if we assume m = 1, then just ignore mass everywhere, including here and equation 5. Or perhaps I'm confused, and M is something else?”

      REPLY: Thanks for pointing this out. The matrix A shown in the paper is for the continuous-time representation of the model. However, as the reviewer correctly noted, the discrete-time implementation of the model requires a modification (adding the identity matrix), which was included in our simulations. We have now clarified this in the Methods section of the revised manuscript. Also, as correctly pointed out, M is the mass of the hand; depending on whether hand acceleration (d^2 p/dt^2) or hand force (F) is taken as a state variable, M can be included in the A matrix. In our case, the A matrix is modified according to the state vector, and the B matrix is modified similarly. This is now clarified in the Methods section of the manuscript.
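To illustrate the point about M, the sketch below writes a hypothetical point-mass plant with a first-order muscle filter in two equivalent state parameterizations. With force as a state, the mass appears in A; with acceleration (a = F/m) as a state, it moves into B, whose last entry becomes 1/(tau*m), as in the reviewer's reading of B. Both are then discretized with forward Euler, which is where the identity matrix enters (A_d = I + dt*A). This is a sketch of the general idea, not the authors' equations.

```python
import numpy as np

def point_mass_force_state(m, tau):
    """State [p, v, F]: dp/dt = v, dv/dt = F/m, dF/dt = (u - F)/tau."""
    A = np.array([[0, 1, 0], [0, 0, 1 / m], [0, 0, -1 / tau]], dtype=float)
    B = np.array([[0], [0], [1 / tau]], dtype=float)
    return A, B

def point_mass_accel_state(m, tau):
    """State [p, v, a] with a = F/m: da/dt = (u/m - a)/tau, so B ends in 1/(tau*m)."""
    A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, -1 / tau]], dtype=float)
    B = np.array([[0], [0], [1 / (tau * m)]], dtype=float)
    return A, B

def euler_discretize(A, B, dt):
    """Forward-Euler discretization: x[t+1] = (I + dt*A) x[t] + dt*B u[t]."""
    return np.eye(A.shape[0]) + dt * A, dt * B
```

Driving both discretized systems with the same command gives identical position and velocity trajectories, and the third states differ only by the factor m.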

      Reviewer #3 (Recommendations For The Authors):

      COMMENT: “Equations 4-8 are written in continuous time, but Equation 9 is written in discrete time. Then Equation 10 is in discrete time. This needs to be tidied up. … I would suggest being more detailed and systematic, perhaps formulating the control problem in continuous time and then converting to discrete time.”

      REPLY: Thank you for this helpful suggestion. The model section in the Methods has been expanded to provide further details of the equation of motion, the discretization process, the control law calculation and the state estimation process.

      COMMENT: “It seems slightly odd for the observation to include only position and velocity of the cursor. Presumably participants can also observe the state of their own hand through proprioception (even if it were occluded). How would it affect the model predictions if the other states were observable?”

      REPLY: Thanks for pointing this out. We initially included only cursor position and velocity because we considered them the most prominent state feedback, and the system is observable in that case. Nevertheless, we have revised the manuscript and repeated all simulations using a full observation matrix (i.e., all states observed). Our findings and conclusions remain unchanged. With the changes in the modeling, the figures were also updated (Figs. 3, 4, 5, 6, 7, 8).
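Whether a given observation matrix H makes the system observable can be checked with the standard Kalman rank test: the stacked matrix [H; HA; ...; HA^(n-1)] must have full rank n. The sketch applies it to a hypothetical three-state plant (position, velocity, force); it is not the authors' model.

```python
import numpy as np

def observability_matrix(A, H):
    """Stack H, HA, ..., HA^(n-1) for an n-state system."""
    n = A.shape[0]
    blocks = [H]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_observable(A, H):
    """Kalman rank test: observable iff the observability matrix has rank n."""
    return np.linalg.matrix_rank(observability_matrix(A, H)) == A.shape[0]
```

Observing cursor position and velocity is already enough for such a plant, because the force can be inferred from how velocity changes; observing only the force is not.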

      COMMENT: “It seems unnecessary to include the acceleration of the cursor in the formulation of the model. …the acceleration is not even part of the observed state according to line 668… I think the model could therefore be simplified by omitting cursor acceleration from the state vector.”

      REPLY: We agree. We have simplified the model and generated new simulations and figures. Our results and conclusions were unchanged by this modification. With the changes in the modeling, the figures were also updated (Figs. 3, 4, 5, 6, 7, 8).

      COMMENT: “In the cost function, it's not clear why any states other than position and velocity of the cursor need to have non-zero values. …The choice to have the cost coefficient for these other states be 1 is completely arbitrary… If the point is that the contribution of these other costs should be negligible, then why not just set them to 0?”

      REPLY: We agree, and have made this change in the Methods section. Our findings and conclusions were unaffected.

      COMMENT: “It seems that the cost matrices were specified after transforming to discrete-time. It is possible however (and perhaps recommended) to formulate in continuous time and convert to discrete time. This can be done cleanly and quite straightforwardly using matrix exponentials. Depending on the discretization timestep, this can also naturally lead to non-zero costs for other states in the discrete-time formulation even if they were zero under continuous time. … A similar comment applies to discretization of the noise.”

      REPLY: Thanks for the suggestion. We have expanded on the discretization process in our Methods section, which uses a common approximation of the matrix exponential.
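As the reviewer notes, continuous-time matrices can be discretized exactly with a matrix exponential. A minimal sketch, assuming a zero-order hold on the command: the augmented exponential exp([[A, B], [0, 0]] dt) yields A_d and B_d together, and forward Euler (A_d = I + A dt) is its first-order truncation, with an error that shrinks roughly as dt^2. The small Taylor-series `expm_taylor` helper (hypothetical) only keeps the sketch numpy-only; `scipy.linalg.expm` is the usual choice. The plant used in the example is hypothetical.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series.
    Adequate for the small, well-scaled matrices used here only."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    Ms = M / 2**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ Ms / k
        E = E + term
    for _ in range(s):          # undo the scaling: exp(M) = exp(M / 2^s)^(2^s)
        E = E @ E
    return E

def zoh_discretize(A, B, dt):
    """Exact zero-order-hold discretization:
    exp([[A, B], [0, 0]] * dt) = [[A_d, B_d], [0, I]]."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm_taylor(M * dt)
    return E[:n, :n], E[:n, n:]

def euler_discretize(A, B, dt):
    """First-order approximation: exp(A dt) ≈ I + A dt, B_d ≈ B dt."""
    return np.eye(A.shape[0]) + dt * A, dt * B
```

Halving dt cuts the gap between the Euler and exact A_d by roughly a factor of four, which is why Euler is usually adequate at millisecond-scale steps.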

      COMMENT: “Most of the parameters of the model seem to be chosen arbitrarily. I think this is okay as the point is to illustrate that the kinds of behaviors observed are within the scope of the model. However, it would be helpful to provide some rationale as to how the parameters were chosen. e.g. Were they taken directly from prior literature, or were they hand-tuned to approximately match observed behavior?”

      REPLY: We have revised the manuscript to more clearly note that the noise parameters, as well as the parameters of the mechanical system (mass, muscle force, time scale, etc.) in our model, were taken from previous publications (Todorov, 2005; Cluff et al., 2019). As described in the manuscript, the parameter values of the cost function (Q matrix) were obtained by tuning them so that the model achieved a range of success rates similar to that observed in the experimental data. This is now clarified in the Methods section.

      COMMENT: “The ‘true’ cost function for this task is actually a 'well' in position space - zero cost within the screen and very high cost elsewhere. In principle, it might be possible to derive the optimal control policy for this more veridical cost function. It would be interesting to consider whether or not this model might reproduce the observed behaviors.”

      REPLY: This is indeed a very interesting suggestion, but it is difficult to implement within the current optimal feedback control framework, which relies on quadratic cost functions. However, it would be interesting to consider in future work.

      Minor Comments:

      COMMENT: “In Figs 4 and 5, the data points are drawn from different conditions with varying values of lambda. How did the structure of this data depend on lambda? Might it be possible to illustrate in the figure (e.g. the shade/color of each dot) what the difficulty was for each trial?”

      REPLY: We performed additional analyses to show the effects of task difficulty on the choice of control objective. Overall, we found that the main behavioral characteristics of the control objective remained largely unchanged across task difficulties and over time. The results of this analysis are included in Figs. S7 and S8 of the Supplementary Materials.

      COMMENT: “Should mention trial duration (6s) in the main narrative of the intro/results.”

      REPLY: We now mention this detail when we describe the task for the first time.

      COMMENT: “As an alternative to training on synthetic data (which might not match behavior that precisely, and was also presumably fitted to subject data at some level) it might be worth considering to do a cross-validation analysis, i.e. train the classifier on subsets of the data with one participant removed each time, and classify on the held-out participant.”

      REPLY: This is indeed a valid point. We trained the classifier on model simulations for two reasons: first, to have confidence in the training data, as the experimental data were limited and noisy, which would result in less reliable classifications; and second, model simulations are available for different contexts and conditions for which experimental data are not necessarily available. The latter is a practical advantage: it allows us to identify the control objectives of any subject (who received no instructions) without having to collect training data from matched control subjects who received explicit instructions. Nonetheless, we appreciate the reviewer’s recommendation and will consider it in future studies.
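The cross-validation the reviewer proposes amounts to the fold structure below: each participant is held out once, the classifier is trained on everyone else, and the held-out participant is classified. This is only a sketch of the splitting step with hypothetical participant IDs; the classifier itself is not shown, and the authors did not adopt this scheme in the revision.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple

def leave_one_participant_out(
    participant_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_participant, train_indices, test_indices), one fold per participant.

    participant_ids[i] is the participant who produced trial i.
    """
    for held_out in dict.fromkeys(participant_ids):  # unique IDs, first-seen order
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test

# Hypothetical trial-to-participant labels
folds = list(leave_one_participant_out(["s1", "s1", "s2", "s3", "s3"]))
```

Splitting by participant rather than by trial keeps all of a participant's trials out of training, so the accuracy estimate reflects generalization to new people.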

      COMMENT: “line 690 - Presumably the optimal policy was calculated without factoring in any delay (this would be tricky to do), but the 50ms delay was incorporated at the time of simulation?”

      REPLY: The discretization of the system equations allowed us to incorporate the delay in the system dynamics and to solve for the optimal controller with the delay present. This was done by system augmentation (e.g., Crevecoeur et al., 2019): the state of the system at the current time step was augmented with the states from the d = 5 preceding time steps to form the new state vector x_aug(t) = [x(t), x(t-1), …, x(t-d)]. The matrices A, B, and H of the system dynamics were expanded accordingly to form the new dynamical system:

      $$x_{\text{aug}}(t+1) = A_{\text{aug}}\, x_{\text{aug}}(t) + B_{\text{aug}}\, u(t)$$

      Then, the optimal control was implemented on the new (augmented) system dynamics.

      We have revised the manuscript (Methods) to clarify this issue.
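The augmentation described in this reply can be sketched as follows, assuming the plant is x(t+1) = A x(t) + B u(t) and the sensors report H x(t-d). The augmented A acts as a shift register, the command enters only the current-state block, and the augmented H reads only the oldest block, so a standard delay-free controller and estimator can be designed on (A_aug, B_aug, H_aug). With d = 5, this corresponds to the 50 ms delay if the time step is 10 ms (an inference, not stated in the text). Noise terms are omitted, and the function name is hypothetical.

```python
import numpy as np

def augment_for_delay(A, B, H, d):
    """Delay-free augmentation of x[t+1] = A x[t] + B u[t], y[t] = H x[t-d],
    with augmented state [x(t), x(t-1), ..., x(t-d)]."""
    n, m = B.shape
    N = n * (d + 1)
    A_aug = np.zeros((N, N))
    A_aug[:n, :n] = A                      # current state evolves with the plant
    for k in range(1, d + 1):              # shift register: x(t-k+1) -> x(t-k)
        A_aug[k * n:(k + 1) * n, (k - 1) * n:k * n] = np.eye(n)
    B_aug = np.zeros((N, m))
    B_aug[:n, :] = B                       # command only enters the current state
    H_aug = np.zeros((H.shape[0], N))
    H_aug[:, d * n:] = H                   # sensors see only the d-step-old state
    return A_aug, B_aug, H_aug
```

For a scalar integrator with d = 2 driven by u = 1, the augmented state after three steps is [3, 2, 1], and the augmented observation returns the two-step-old value 1.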

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study identifies the gene mamo as a new regulator of pigmentation in the silkworm Bombyx mori, a function that was previously unsuspected based on extensive work on Drosophila where the mamo gene is involved in gamete production. The evidence supporting the role of Bm-mamo in pigmentation is convincing, including high-resolution linkage mapping of two mutant strains, expression profiling, and reproduction of the mutant phenotypes with state-of-the-art RNAi and CRISPR knock-out assays. While the discussion about genetic changes being guided or accelerated by the environment is extremely speculative and has little relevance for the findings presented, the work will be of interest to evolutionary biologists and geneticists studying color patterns and evolution of gene networks.

      Response: Thank you very much for your careful work. In the revised version, we conducted a comparative genomic analysis of the upstream regions of the Bm-mamo gene in 51 wild silkworms and 171 domesticated local silkworms. We also analyzed the nucleotide diversity (pi) and fixation index (FST) of Bm-mamo genomic sequences in the wild and domesticated silkworm populations. The results showed that the Bm-mamo genomic sequence of local silkworms was relatively conserved, whereas the upstream sequence of wild silkworms exhibited high nucleotide diversity. This finding suggests a high degree of variability in the regulatory region of the Bm-mamo gene in wild strains. Additionally, the sequence in this region may have been fixed by selection during domestication. We have revised the description in the Discussion section.
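The response does not state which software or estimator was used for pi and FST. As a textbook-level sketch only: pi is the mean proportion of differing sites over all sequence pairs within a population, and one common FST estimator (Hudson's) compares within-population to between-population diversity as FST = 1 - pi_within / pi_between. Ambiguous bases and gaps are skipped; the toy sequences are hypothetical.

```python
from itertools import combinations

def pairwise_diff(a: str, b: str) -> float:
    """Proportion of aligned sites that differ, ignoring sites with gaps or N.
    Assumes at least one comparable site."""
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in sites) / len(sites)

def nucleotide_diversity(seqs) -> float:
    """pi: mean pairwise proportion of differing sites within one population
    (needs at least two aligned sequences)."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def hudson_fst(pop1, pop2) -> float:
    """Hudson-style FST = 1 - mean within-population pi / between-population diversity."""
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    between = sum(pairwise_diff(a, b) for a in pop1 for b in pop2) / (len(pop1) * len(pop2))
    return 1 - within / between

# Hypothetical aligned toy sequences: a variable and a conserved population
wild = ["AAAA", "AAAT", "AATT"]
domestic = ["CCCC", "CCCC"]
```

In this toy example the variable population has pi = 1/3, the conserved one has pi = 0, and the resulting FST is 5/6, illustrating how high within-population diversity in one group and fixation in another are summarized by these two statistics.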

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper performs fine-mapping of the silkworm mutant bd and its fertile allelic version, bdf, narrowing down the causal interval to a small region containing a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument.

      The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) linking mamo to pigmentation are extremely convincing.

      This revision is a much improved manuscript and I commend the authors for many of their edits.

      Response: Thank you very much for your careful work. With the help of reviewers and editors, we have revised the manuscript to improve its readability.

      I find the last part of the discussion, starting at "It is generally believed that changes in gene expression patterns are the result of the evolution of CREs", to be confusing.

      In this section, I believe the authors sequentially:

      • emphasize the role of CRE in morphological evolution (I agree)

      • emphasize that TF, and in particular their own CRE, are themselves important mutational targets of evolution (I agree, but the phrasing needs to make clear that the authors are talking here about the CRE found at the TF locus, not the CRE bound by the TF).

      • use the stickleback Pel enhancer as an example, which I think is a good case study, but the authors also then make an argument about DNA fragility sites, which is hard to connect with the present study.

      • then continue on "DNA fragility" with the peppered moth and butterfly cortex locus. There is no evidence of DNA fragility at these loci, so the connection does not work. "The cortex gene locus is frequently mutated in Lepidoptera", the authors say. But a more accurate picture would be that the cortex locus is repeatedly involved in the generation of color pattern variants. Unlike for the fragile Pel enhancer, we don't know if the causal mutations at this locus are repeatedly the same, and the haplotypes that have been described could be collateral rather than causal. Overall, it is important to clarify the idea that mutation bias is a possible factor explaining "genetic hotspots of evolution" (or genetic parallelism sensu 10.1038/nrg3483), but it is also possible that many genetic hotspots are repeated mutational targets because of their "optimal pleiotropy" (e.g. hub position in GRNs, such as mamo might be), or because of particularly modular CRE regions that allow fine-tuning. Thus, I find the "fragility" argument misleading here. In fact the finding that "bd" and "bdf" alleles are different in nature is against the idea of a fragility bias (unless the authors can show increased mutation rates at this locus in a wild silkmoth species?). These alleles are also artificially selected, i.e., they increased in frequency by breeding rather than by natural selection in the wild, so while interesting for our understanding of the genotype-phenotype map, they are not necessarily representative of the mutations that may underlie evolution in the wild.

      Response: Thank you very much for your careful work. DNA fragility is an interesting topic, but some explanations of DNA fragility are confusing. One study measured the rate of DNA double-strand breaks (DSBs) in yeast artificial chromosomes (YACs) and found that YACs containing the marine Pel sequence broke ~25 to 50 times more frequently than the control. These authors attributed the increase in the mutation rate to DNA sequence characteristics, particularly TG-dinucleotide repeats. Moreover, they found that placing a replication origin on the opposite side of Pel switched its fragility, making the forward sequence stable and the reverse complement fragile. Thus, Pel fragility also depends on the direction of DNA replication. In summary, they suggested that this specific DNA sequence is the cause of DNA fragility. In addition, the sequence features associated with DNA fragility in the Pel region are also found at thousands of other positions in the stickleback and human genomes (Xie et al., 2019, Science).

      In yeast artificial chromosomes (YACs), characteristics of the DNA sequence, such as TG-dinucleotide repeats, may be important causes of DNA fragility, and these breaks occur during DNA replication. However, the inserted sequence of a YAC often undergoes deletion or recombination during culture and passaging. In addition, yeast is a single-celled organism. Therefore, the results in yeast cannot represent the situation in multicellular organisms. If the same held in multicellular organisms, several issues would arise:

      (1) In multicellular organisms, DNA replication occurs separately in each cell. Because DNA breakage and repair are independent events in each cell, they could produce different alleles in different cells, potentially resulting in extensively chimeric organisms. However, we have not found such a situation in the genome sequencing of many multicellular organisms.

      (2) If the DNA sequence itself (TG-dinucleotide repeats) were the determining factor, mutations near this sequence would lose their strong correlation with environmental changes. The researchers conducted the yeast artificial chromosome experiments under identical conditions and found that DNA containing TG-dinucleotide repeats broke 25 to 50 times more frequently than the control. This would mean that this part of the stickleback genome undergoes frequent mutation in both marine and lake populations. However, according to related research, lake populations of sticklebacks, rather than marine populations, often exhibit pelvic reduction.

      (3) Researchers have found thousands of loci in the stickleback and human genomes that contain such sequences (TG-dinucleotide repeats). This would mean that thousands of sites undergo frequent mutation during DNA replication. Unless these sites are nonfunctional, such mutations would have some impact on the organism, possibly even causing damage. And even if they are nonfunctional, these sequences would gradually be lost or replaced through frequent mutation rather than persisting in large numbers in the genome.

      Therefore, the study of DNA fragility in yeast cannot explain the situation in multicellular organisms.

      As you noted, the point we wish to make is that frequent variation at the cortex gene is likely governed by targeted regulation involving the GRN in Lepidoptera. In addition, studies of fragile DNA sites that identified specific epigenetic modifications suggest that DNA fragility is determined not by the DNA sequence but by other factors, such as epigenetic factors (Ji et al., 2020, Cell Res). The sequence features found at fragile DNA sites are traces of frequent mutation, not its cause.

      In this revision, we analyzed the nucleotide diversity of the mamo genomic region in 51 wild and 171 domestic silkworms. In wild silkworms, we found high nucleotide diversity in the region extending from the third exon to upstream of this gene. We randomly selected 12 wild and 12 domestic silkworms and compared approximately 1 kb of their upstream sequences. In wild silkworms, the upstream sequences are highly diverse. In domestic silkworms, the sequences are highly conserved, but in some individuals a long interspersed nuclear element (LINE) is inserted. These findings suggest that the sequence of this region varies frequently in wild silkworms, whereas it has become fixed in domesticated silkworms. These genomic data are sourced from the silkworm pan-genome (Tong et al., 2022, Nat Commun). The pan-genome study covered 1,078 strains (205 local strains, 194 improved strains, 632 mutant strains, and 47 wild silkworms), including 545 genomes from third-generation sequencing. An online database was built to provide access to these data (http://silkmeta.org.cn/). We warmly welcome you to use these data.

      In summary, for clearer expression, we have rewritten this section.

      Xie KT, Wang G, Thompson AC, Wucherpfennig JI, Reimchen TE, MacColl ADC, Schluter D, Bell MA, Vasquez KM, Kingsley DM. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science. 2019 Jan 4;363(6422):81-84. doi: 10.1126/science.aan1425.

      Ji F, Liao H, Pan S, Ouyang L, Jia F, Fu Z, Zhang F, Geng X, Wang X, Li T, Liu S, Syeda MZ, Chen H, Li W, Chen Z, Shen H, Ying S. Genome-wide high-resolution mapping of mitotic DNA synthesis sites and common fragile sites by direct sequencing. Cell Res. 2020 Nov;30(11):1009-1023. doi: 10.1038/s41422-020-0357-y.

      Tong X, Han MJ, Lu K, Tai S, Liang S, Liu Y, Hu H, Shen J, Long A, Zhan C, Ding X, Liu S, Gao Q, Zhang B, Zhou L, Tan D, Yuan Y, Guo N, Li YH, Wu Z, Liu L, Li C, Lu Y, Gai T, Zhang Y, Yang R, Qian H, Liu Y, Luo J, Zheng L, Lou J, Peng Y, Zuo W, Song J, He S, Wu S, Zou Y, Zhou L, Cheng L, Tang Y, Cheng G, Yuan L, He W, Xu J, Fu T, Xiao Y, Lei T, Xu A, Yin Y, Wang J, Monteiro A, Westhof E, Lu C, Tian Z, Wang W, Xiang Z, Dai F. High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation. Nat Commun. 2022 Sep 24;13(1):5619. doi: 10.1038/s41467-022-33366-x.

      Lu K, Pan Y, Shen J, Yang L, Zhan C, Liang S, Tai S, Wan L, Li T, Cheng T, Ma B, Pan G, He N, Lu C, Westhof E, Xiang Z, Han MJ, Tong X, Dai F. SilkMeta: a comprehensive platform for sharing and exploiting pan-genomic and multi-omic silkworm data. Nucleic Acids Res. 2024 Jan 5;52(D1):D1024-D1032. doi: 10.1093/nar/gkad956.

      Curiously, the last paragraph ("Some research suggests that common fragile sites...") elaborates on the idea that some sites of the genome are prone to mutation. The connection with mamo and the current article is extremely thin. There is here an attempt to connect meiotic and mitotic breaks to Bm-mamo, but this is confusing: it seems to propose Bm-mamo as a recruiter of epigenetic modulators that may drive higher mutation rates elsewhere. Not only am I not convinced by this argument without actual data, but this would not explain how the mutations at Bm-mamo itself evolved.

      Response: Thank you very much for your careful work. This section mainly illustrates that, in animals, DNA fragility is not determined by sequence but is regulated by other factors. In fruit flies, mamo has been identified as an important candidate gene for setting recombination hotspots in meiosis. We first mentioned PRDM9, which plays an important role in setting recombination hotspots during meiosis. Our purpose in mentioning this information is to illustrate that chromosome recombination involves programmed double-strand breaks and to answer another reviewer's question about programmed events in the genome. In summary, we suggest that some variations in DNA sequences are the outcome of programmed processes. We have revised the description of this section in this version.

      On a more positive note, I find it fascinating that the authors identified a TF that clearly articulates or orchestrates larval pattern development and that, when deleted, still yields healthy individuals. In other words, while it is a TF with many targets, it is not too pleiotropic. This idea, that the genetically causal modulators of developmental evolution are regulatory genes, has been described elsewhere (e.g. Fig 4c in 10.1038/s41576-020-0234-z, and associated refs). To me, the beautiful findings about Bm-mamo make sense in the general, existing framework that developmental processes and regulatory networks "shape" the evolutionary potential and trajectories of organisms. There is a degree of "programmability" in genomes, because some loci are particularly prone to modulate a given type of trait. Here, Bm-mamo, as a potential regulator of both CPs and melanin pathway genes, appears to be a potent modulator of epithelial traits. Claiming that there are inherent mutational biases behind this is unwarranted.

      Response: Thank you very much for your careful work. I completely agree with your statement that the genome exhibits a certain degree of programmability. On the one hand, some transcription factors can precisely control the spatiotemporal expression levels of some structural genes (such as pigment synthesis genes). On the other hand, these transcription factors are themselves subject to strict expression regulation. Because color patterns are complex, changes in one or a few structural genes result in incomplete or imprecise changes in coloring patterns. In contrast, some regulatory factors regulate multiple downstream target genes, so changes in their expression patterns can lead to holistic and significant changes in color patterns. Many important transcription factors have long upstream intergenic regions, spanning tens to hundreds of kilobases (kb), which may contain many different regulatory elements for finer control of their expression patterns. Therefore, gene regulatory networks can directly regulate transcription factors to modulate a given type of trait. A transcription factor and its downstream target genes can form a functional module, similar to a functional module in software or an operating system. Regulating such a module through its transcription factor requires fewer steps, much like flipping a single switch. That gene regulatory networks regulate these modules in response to environmental changes is widely recognized.

      Some researchers do not agree that genetic variation can also be regulated; they hold that it is completely random. The infinite monkey theorem (Félix-Édouard-Justin-Émile Borel, 1909) states that if an infinite number of monkeys were given typewriters and an infinite amount of time, they would eventually produce the complete works of Shakespeare. Although this theorem appears to champion randomness, its conclusion is one of inevitability (a tail event). In nature, some phenomena we observe lack obvious regularity because they involve relatively complex factors, and their underlying logic is obscure and difficult to understand. We often call them random. However, as we gradually come to understand the logic behind such complex events, we can also recognize the programmed nature of this randomness.

      Previously, chromosomal recombination during meiosis was believed to be a random event; it is now understood to be a programmed process. The occurrence of meiotic recombination mentioned earlier indicates that the genome can itself set the positions of double-strand breaks to form new allelic forms. Because meiotic recombination is programmed, and because transcription factors that recognize DNA sites, enzymes that cleave double strands, and DNA repair systems all exist, programmed processes can also introduce genetic variation. A study in plants has provided insights into such programmed mutation (Monroe et al., 2022, Nature). Frequent changes in the expression patterns of some transcription factors occur between and/or within species. In this article, we discuss the possible reasons for variation in the expression patterns of some transcription factors only in general terms and with simple reasoning. We have added an analysis of wild silkworms and improved the relevance of the discussion.

      Monroe JG, Srikant T, Carbonell-Bejerano P, Becker C, Lensink M, Exposito-Alonso M, Klein M, Hildebrandt J, Neumann M, Kliebenstein D, Weng ML, Imbert E, Ågren J, Rutter MT, Fenster CB, Weigel D. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105. doi: 10.1038/s41586-021-04269-6. Epub 2022 Jan 12. Erratum in: Nature. 2023 Aug;620(7973):

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Please structure your Discussion with section headers.

      Response: Thank you very much for your careful work. We have added relevant section headers.

      • As explained in my public review, I found the two last sections of the Discussion to be dispersed and confusing. I also must say that I carefully read the Response to Reviewers on this, which helped me to better understand the authors' intentions here. Please consider the revision of this Discussion as this feels extremely speculative difficult to connect with Bm-mamo.

      Response: Thank you very much for your careful work. We have rewritten this part of the content.

      • typo: were found near the TTS of yellow --> TSS

      Response: Thank you very much for your careful work. We have made these modifications.

      • l. 234 :"expression level of the 18 CP genes in the integument". Consider adding a mention of Figure 7 here, as only Fig. S10 is cited here.

      Response: Thank you very much for your careful work. We have made these modifications.

      • Editorial comment on the second half of the Abstract:

      Wu et al : "We found that Bm-mamo can comprehensively regulate the expression of related pigment synthesis and cuticular protein genes to form color patterns. This indicates that insects have a genetic basis for coordinate regulation of the structure and shape of the cuticle, as well as color patterns. This genetic basis provides the possibility for constructing the complex appearances of some insects. This study provides new insight into the regulation of color patterns."

      I respectfully suggest a more accurate rephrasing, where the methods are mentioned and the logical argument is more straightforward. For example:

      "Using RNAi and CRISPR we show that Bm-mamo is a repressor of dark melanin patterns in the larval epithelium. Using in-vitro binding assays and gene expression profiling in wild-type and mutant larvae, we also show that Bm-mamo likely regulates the expression of related pigment synthesis and cuticular protein genes in a coordinated manner to mediate its role in color pattern formation. This mechanism is consistent with a dual role of this transcription factor in regulating both the structure and shape of the cuticle and the pigments that are embedded within it. This study provides new insight into the regulation of color patterns as well as into the construction of more complex epithelial features in some insects."

      I hope this lets the ideas of the original version come through as the authors intended.

      Response: Thank you very much for your careful work. We have made these modifications.

    1. Author Response

      Public Reviews

      We thank both reviewers for taking the time and effort to think critically about our paper and point out areas where it can be improved. In this document, we do our best to clarify any misunderstandings with the hope that further consideration about the strengths and weaknesses of our approach will be possible. Our responses are in bold.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Schmidlin, Apodaca, et al try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles.

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation.

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study.

      The environments we study represent 12 different concentrations or combinations of two drugs, radicicol and fluconazole. Our hope is that this large dataset (774 mutants x 12 environments) will be useful, both to scientists who are generally interested in the genetic and phenotypic underpinnings of adaptation, and to scientists specifically interested in the evolution of drug resistance.

      Weaknesses:

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21 000 lineages that were combined after experimental evolution for fitness measurements.

      This is a misunderstanding that we will work to clarify in the revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons.

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we will explicitly state that these 21,000 isolated lineages do not all represent unique, adaptive lineages. In figure 2 and all associated text, we will change the word “lineages” to “isolates,” where relevant.

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype, and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 𝜇g/ml vs. 8 𝜇g/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Note that 774 adaptive lineages is more than most previous studies have examined. Thus, we feel that our work “showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 161 - 162).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance.

      The word “briefly” feels a bit unfair because we discuss this bias on 3 separate occasions (on lines 146 - 147, 260 - 264, and in more detail on 706 - 714). We even walk through an example of a class of mutants that our study misses. We say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we will add more text to the first mention of these missing mutants (lines 146 - 147) so that the implications are more immediately made apparent.

      While we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered when designing antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs, and they set an inclusion threshold that might capture only a small fraction of phenotypic space, one with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications for the authors' general conclusions regarding the evolution of trade-offs.

      We discussed these implications in some detail in the 16 lines mentioned above (146 - 147, 260 - 264, 706 - 714). To add to this discussion, we will also add the following sentence to the end of the paragraph on lines 697 - 714: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”.

      We will also add a new paragraph that discusses these implications earlier in our manuscript. This paragraph will highlight the strengths of our method (e.g., that we “catch” classes of mutants that are often overlooked) while being transparent about the weaknesses of our approach (e.g., that we “miss” mutants with strong tradeoffs).

      (2) Large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations.

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult.

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system).

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay.

      We understand how the reviewer came to this misunderstanding and will adjust our revised manuscript accordingly. Previous work has demonstrated that, in this particular evolution platform, most of the mutations actually occur during the transformation that introduces the DNA barcodes (PMID25731169). In other words, these mutations do not accumulate during the 40 generations of evolution, they are already there. So the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach.

      This concern, and all subsequent concerns, seem to be driven by either (a) general concerns about the noisiness of fitness measurements obtained from large-scale barcode fitness assays or (b) general concerns about whether the clusters obtained from our dimensional reduction approach capture this noise as opposed to biologically meaningful differences.

      We will respond to each concern point-by-point, but want to start by generally stating that (a) our particular large-scale barcode fitness assay has several features that diminish noise, and (b) we devote 4 figures and 200 lines of text to demonstrating that these clusters capture biologically meaningful differences between mutants (and not noise).

      In terms of this specific concern, we performed an analysis of noise in the submitted manuscript: Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing noise (Figure S7 panel B). But we agree with the reviewer that this analysis alone is not sufficient to conclude that the clusters distinguish groups of mutants with unique fitness tradeoffs.
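      The coverage filter discussed above (keeping only barcodes with enough reads, then raising the cutoff from 500 to 5,000) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical layout in which read counts are stored per lineage, per environment, per time point, and it interprets "at least N barcode counts" as the total summed over time points within each environment, which may differ from the exact rule used in the manuscript. The barcode names, environment labels, and counts are made up.

```python
from typing import Dict, List

# Hypothetical layout: counts[lineage][environment] -> reads at each time point.
Counts = Dict[str, Dict[str, List[int]]]


def passes_threshold(per_env: Dict[str, List[int]], environments: List[str], min_reads: int) -> bool:
    """True if the lineage has at least min_reads reads, summed over time points,
    in every environment. A missing environment counts as zero reads."""
    return all(sum(per_env.get(env, [])) >= min_reads for env in environments)


def filter_lineages(counts: Counts, environments: List[str], min_reads: int) -> List[str]:
    """Sorted names of lineages that pass the coverage threshold in all environments."""
    return sorted(
        lineage
        for lineage, per_env in counts.items()
        if passes_threshold(per_env, environments, min_reads)
    )


# Made-up counts for three barcodes in two of the environments.
EXAMPLE_ENVS = ["no_drug", "low_flu"]
EXAMPLE_COUNTS: Counts = {
    "bc_A": {"no_drug": [300, 300, 300], "low_flu": [200, 200, 200]},
    "bc_B": {"no_drug": [3000, 2000, 1000], "low_flu": [2500, 2500, 100]},
    "bc_C": {"no_drug": [900, 800, 700], "low_flu": [100, 100, 50]},
}

print(filter_lineages(EXAMPLE_COUNTS, EXAMPLE_ENVS, 500))   # ['bc_A', 'bc_B']
print(filter_lineages(EXAMPLE_COUNTS, EXAMPLE_ENVS, 5000))  # ['bc_B']
```

      Because a lineage must clear the cutoff in every environment, raising `min_reads` can only remove lineages; the 5,000-read set is always a subset of the 500-read set.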

      Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages.

      To evaluate the strength of the clustering, we performed numerous analyses including whole genome sequencing, growth experiments, reclustering, and tracing the evolutionary origins of each cluster (Figures 5 - 8). All of these analyses suggested that our clusters capture groups of mutants that have different fitness tradeoffs. We will adjust our revised manuscript to make clear that we do not rely on the results of a clustering algorithm alone to draw conclusions about phenotypic convergence.

      We are also grateful to the reviewer for helping us realize that, as written, our manuscript is not clear with regard to how we perform clustering. We are not using UMAP to decide which mutant belongs to which cluster. Recent work highlights the importance of using an independent clustering method (PMID37590228). Although this recent work addresses the challenge of clustering much higher dimensional data than we survey here, we did indeed use an independent clustering method (Gaussian mixture model). In other words, we use UMAP for visualization but not for clustering. We also confirm our clustering results using a second independent method (hierarchical clustering; Figure S8), and in our revised manuscript, we will confirm them with a third method (PCA, see below). We will adjust the main text and the methods section to make these choices clearer.

      This might paint a misleading picture where clusters appear well separate and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted.

      The salient question is whether the clusters are so “fuzzy” that they are not meaningful. That interpretation seems unreasonable. Our clusters group mutants with similar genotypes, evolutionary histories, and fitness tradeoffs (Figures 5 - 8). Clustering mutants with similar behaviors is important and useful. It improves phenotypic prediction by revealing which mutants are likely to have at least some phenotypic effects in common. And it also suggests that the phenotypic space is constrained, at least to some degree, which previous work suggests is helpful in predicting evolution (PMID33263280, PMID37437111, PMID22282810, PMID25806684).

      (4) The authors make the decision to use UMAP and a Gaussian mixture model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the axes do not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by the components that separate clusters, as well as more interpretable components.

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent some intuitive phenotype, like resistance to fluconazole.

      Moreover, we see many nonlinearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. Nonlinearities like these can make the axes in PCA very difficult to interpret and can be missed by PCA altogether, so we prefer other approaches.

      We will adjust our revised manuscript to explain these reasons why we chose UMAP and GMM over PCA.

      Also, we will include PCA in the supplement of our revised manuscript. Please find below PC1 vs PC2, with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.

      Author response image 1.
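      To make the reviewer's point about PCA concrete (each component reports the fraction of total variance it explains), here is a minimal, dependency-free sketch that computes explained-variance ratios by power iteration with deflation. It is not the analysis behind the figure above: the input layout (rows as mutants, columns as fitness in each environment) is an assumption, the toy data are made up, and in practice one would normally use a library SVD rather than hand-rolled power iteration.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def covariance(X: Matrix) -> Matrix:
    """Sample covariance matrix of X (rows = mutants, columns = environments)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        centered = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += centered[i] * centered[j] / (n - 1)
    return C


def top_components(C: Matrix, k: int, iters: int = 500, seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Leading k eigenpairs of a symmetric covariance matrix via power iteration with deflation."""
    d = len(C)
    A = [row[:] for row in C]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance along the unit vector v.
        lam = sum(v[i] * A[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflate so the next pass finds the next-largest component.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def explained_variance_ratio(X: Matrix, k: int) -> List[float]:
    """Fraction of total variance captured by each of the first k principal components."""
    C = covariance(X)
    total = sum(C[i][i] for i in range(len(C)))
    return [lam / total for lam, _ in top_components(C, k)]


# Toy data: variance is 4x larger along the first axis than along the second.
toy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5], [0.0, -0.5]]
print([round(r, 3) for r in explained_variance_ratio(toy, 2)])  # [0.8, 0.2]
```

      These ratios quantify variance along straight-line directions only, which is why a curved, non-monotone fitness pattern can be spread over several components rather than captured by one.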

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages.

      We worry that this idea stems from a priori notions of what the important dimensions should be. It also seems like this would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole.

      Also, we believe the reviewer meant “fitness profile” and not “fitness landscape”. A fitness landscape imagines a walk where every “step” is a mutation. Most lineages in barcoded evolution experiments possess only a single adaptive mutation. A single-step walk is not enough to build a landscape, though others are expanding barcoded evolution experiments beyond the first step (PMID34465770, PMID31723263), so maybe one day this will be possible.
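      The reviewer's alternative (treating each lineage's fitness as a plane over fluconazole and radicicol concentration) and the nonlinearity concern raised in reply can both be illustrated with an ordinary least-squares plane fit. This is a hypothetical sketch, not an analysis from the manuscript: the concentration grid and fitness values below are invented, and the plane model is only the simplest version of the reviewer's suggestion. A profile that is high when fluconazole is absent or abundant but low at a mild concentration leaves a large residual, while a truly planar profile leaves none.

```python
from typing import List, Sequence, Tuple


def solve3(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_plane(points: Sequence[Tuple[float, float, float]]) -> Tuple[List[float], float]:
    """Least-squares fit of fitness = a + b*flu + c*rad; returns ([a, b, c], residual sum of squares)."""
    rows = [(1.0, flu, rad) for flu, rad, _ in points]
    ys = [s for _, _, s in points]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve3(XtX, Xty)
    rss = sum((y - sum(c * x for c, x in zip(coef, r))) ** 2 for r, y in zip(rows, ys))
    return coef, rss


# Hypothetical environments as (flu, rad) concentrations; values are illustrative only.
grid = [(flu, rad) for rad in (0.0, 20.0) for flu in (0.0, 4.0, 8.0)]
planar = [(f, r, 0.1 + 0.05 * f - 0.01 * r) for f, r in grid]
dip = [(f, r, 0.0 if f == 4.0 else 1.0) for f, r in grid]  # low fitness only at mild flu

print(round(fit_plane(planar)[1], 6))  # 0.0
print(round(fit_plane(dip)[1], 3))     # 1.333
```

      For the `dip` profile the best plane is flat (slope zero in both drugs), so the plane summary would report "no response to fluconazole" for a lineage whose fitness in fact depends strongly on it.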

      Third, the choice of 7 clusters as the cutoff for the multiple Gaussian model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered.

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. We thank the reviewer for pointing out this gap in our writing. We will adjust our revised manuscript to explain that we ultimately chose to describe 6 clusters that we were able to validate with follow-up experiments. In figures 5, 6, 7, and 8, we use external information to validate the clusters that we report in figure 4. And in lines 697 – 714, we explain that there may be additional clusters beyond those we tease apart in this study.

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase, not decrease, the optimal number of clusters. Criteria in addition to BIC could have been used to help choose the optimal number of clusters, or even to decide whether Gaussian mixture modeling is appropriate for this dataset.

      We are under the following impression: If our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise.

      More generally, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).
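      For readers following the BIC exchange above, the criterion trades model fit against parameter count: BIC = p ln(n) − 2 ln(L̂), with lower values preferred. The sketch below uses the standard free-parameter count for a full-covariance Gaussian mixture and computes how much the log-likelihood must rise before adding one more component lowers BIC. It is illustrative only: n = 774 comes from the text, but the dimensionality d of the space the mixture was fit in is not stated in this exchange, so d = 2 and d = 12 are hypothetical cases.

```python
import math


def gmm_num_params(k: int, d: int) -> int:
    """Free parameters of a k-component, full-covariance Gaussian mixture in d dimensions."""
    means = k * d
    covariances = k * d * (d + 1) // 2
    weights = k - 1  # mixing weights must sum to 1
    return means + covariances + weights


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def min_loglik_gain(k: int, d: int, n_samples: int) -> float:
    """Log-likelihood increase needed for k + 1 components to beat k components on BIC."""
    extra = gmm_num_params(k + 1, d) - gmm_num_params(k, d)
    return extra * math.log(n_samples) / 2.0


for d in (2, 12):
    print(d, gmm_num_params(7, d), round(min_loglik_gain(6, d, 774), 1))
```

      Note that the penalty also scales with ln(n), so comparing the BIC-optimal number of clusters between the full set of lineages and a smaller, filtered subset mixes two effects: less measurement noise and a smaller sample.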

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays.

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously (PMID37237236).

      The main assay we use to measure fitness has been previously validated (PMID27594428). No subsequent study using this assay validates using the methods suggested by the reviewer (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203).

      More to the point, bar-seq has been used, without the reviewer’s suggested validation, to demonstrate that the way some mutants’ fitness changes across environments differs from that of other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate.

      For all of these reasons, we are hesitant to re-validate bar-seq itself as a way to infer fitness; it already appears to be accepted as a standard in our field.
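      The distinction drawn above, between fold change across two time points and frequency change across multiple time points, can be illustrated with a simplified sketch. It assumes fitness is the per-generation slope of ln(lineage frequency / reference frequency), where the reference is a hypothetical stand-in (e.g., an ancestral barcode). The cited assay uses its own, more elaborate inference; the trajectory below is invented, with one noisy final sample.

```python
import math
from typing import List, Sequence


def log_ratios(lineage: Sequence[float], reference: Sequence[float]) -> List[float]:
    """ln(f_lineage / f_reference) at each time point."""
    return [math.log(f / r) for f, r in zip(lineage, reference)]


def two_point_fitness(gens: Sequence[float], lineage: Sequence[float], reference: Sequence[float]) -> float:
    """Per-generation fitness from the first and last time points only (fold change)."""
    y = log_ratios(lineage, reference)
    return (y[-1] - y[0]) / (gens[-1] - gens[0])


def slope_fitness(gens: Sequence[float], lineage: Sequence[float], reference: Sequence[float]) -> float:
    """Per-generation fitness as the least-squares slope over all time points."""
    y = log_ratios(lineage, reference)
    g_bar = sum(gens) / len(gens)
    y_bar = sum(y) / len(y)
    num = sum((g - g_bar) * (v - y_bar) for g, v in zip(gens, y))
    den = sum((g - g_bar) ** 2 for g in gens)
    return num / den


# Hypothetical trajectory: true fitness 0.05 per generation, one noisy final sample.
gens = [0, 8, 16, 24, 32, 40]
reference = [0.5] * len(gens)
lineage = [0.001 * math.exp(0.05 * g) for g in gens]
lineage[-1] *= math.exp(0.4)  # sampling error at the last time point only

print(round(two_point_fitness(gens, lineage, reference), 4))  # 0.06
print(round(slope_fitness(gens, lineage, reference), 4))      # 0.0571
```

      Here the single noisy sample shifts the two-point estimate by 0.01 but the multi-time-point estimate by about 0.007, because the regression spreads that error across all six time points.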

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors.

      We don’t agree that fitness measurements obtained from this bar-seq assay generally require validation. But we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways, in particular, in that they have different fitness tradeoffs. We have four figures (5 - 8) and 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. Happily, one of these figures (Fig 7) includes growth curves, which are exactly the type of validation experiment asked for by the reviewer.

      Below, we walk through the different types of validation experiments that are present in our original manuscript, and additional validation experiments that we plan to include in the revised version. We are hopeful that these validation experiments are sufficient, or at the very least, that this list empowers reviewers to point out where more work is needed.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S10).

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. Indeed they often do (see pie charts in Figures 6, 7, 8). This method also provides evidence supporting each cluster’s differing fitness tradeoffs.

      For example, mutants in cluster 5 appear to have a tradeoff in a double drug condition (described above). They rarely originate from that evolution condition, unlike mutants in nearby cluster 4 (see Figure 7).

      (3) Mutants from each cluster often fall into different genes: In our original manuscript, we sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6).

      (4) Mutants from each cluster have behaviors previously observed in the literature: In our original manuscript, we compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 457 - 462). Previous work suggests that some mutations to PDR have different tradeoffs than others, which is what we see (lines 540 - 542). IRA1 mutants were previously observed to have high fitness in our “no drug” condition, and are found in the cluster that has the highest fitness in the “no drug” condition (lines 642 - 646). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 652 - 657).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods: In our original manuscript, we performed several different reclustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S9). The clusters of mutants that we observe in figure 4 do not change substantially when we recluster the data. We will add PCA (see above) to these analyses in our revised manuscript.

      (6) We will include additional data showing that mutants in different clusters have different evolutionary origins: Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole (see Fig 4E and Fig 5C). In our revised manuscript, we will show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see figure panel A below). No other cluster’s evolutionary history shows this pattern (figures 6, 7, and 8).

      (7) We will include additional data showing that mutants in different clusters have different growth curves: Cluster 1 lineages are unique in that their fitness advantage is specific to low flu and trades off in higher concentrations of fluconazole. We obtained growth curves for three cluster 1 mutants (2 SUR1 mutants and 1 UPC2 mutant). We compared them to growth curves for three PDR mutants (from clusters 2 and 3). Cluster 1 mutants appear to have the highest growth rates and reach the highest carrying capacity in low fluconazole (see red and green lines in Author response image 2 panel B below). But the cluster 1 mutants are negatively affected by higher concentrations of fluconazole, much more so than the mutants from clusters 2 and 3 (see Author response image 2 panel C below). This is consistent with the different fitness tradeoffs we observe for each cluster (figures 4 and 5). We will include a more detailed version of this analysis and the figures below in our revised manuscript.

      Author response image 2.

      Validation experiments demonstrate that cluster 1 mutants have uniquely high fitness in only the lowest concentration of fluconazole. (A) The mutant lineages in cluster 1 were largely sampled from evolution experiments performed in low flu. This is not true of other clusters (see pie charts in main manuscript). (B) In low flu (4 𝜇g/ml), Cluster 1 lineages (red/UPC2 and green/SUR1) grow faster and achieve higher density than lineages from clusters 2 and 3 (blue/PDR). This is consistent with barseq measurements demonstrating that cluster 1 mutants have the highest fitness in low flu. (C) Cluster 1 lineages are sensitive to increasing flu concentrations (SUR1 and UPC2 mutants, middle and rightmost graphs). This is apparent in that the gray (8 𝜇g/ml flu) and light blue (32 𝜇g/ml flu) growth curves rise more slowly and reach lower density than the dark blue curves (4 𝜇g/ml flu). But this is not the case for the PDR mutants from clusters 2 and 3 (leftmost graph). These observations are consistent with the bar-seq fitness data presented in the main manuscript (Fig 4E).

      With all of these validation efforts combined, we are hopeful that the reviewer is now more convinced that our clusters capture groups of mutants with different fitness tradeoffs (as opposed to noise). We want to conclude by saying that we are grateful to the reviewer for making us think deeply about areas where we can include additional validation efforts as well as areas where we can make our manuscript clearer.

      Reviewer #2 (Public Review):

      Summary:

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping.

      Strengths:

      There are multiple strengths to this paper, the first of which is the sheer amount of work that has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates that are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen is different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory).

      We are very grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.

      Weaknesses:

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one!

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think:

      We think that phrasing the “jump” as a question might help lay readers get from point A to point B. So, in the introduction of our revised manuscript, we will add a paragraph roughly similar to this one: “If two groups of drug-resistant mutants have different fitness tradeoffs, does it mean that they provide resistance through different underlying mechanisms? Alternatively, it could mean that both provide drug resistance via the same mechanism, but some mutations come with a cost that others don’t pay. However, another way to phrase this alternative is to say that both groups of mutants affect fitness through different suites of mechanisms that are only partially overlapping. And so, by identifying groups of mutants with different fitness tradeoffs, we argue that we will be uncovering sets of mutations that impact fitness through different underlying mechanisms. The ability to do so would be useful for genotype-phenotype mapping endeavors.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm.

      In our revised manuscript, we will carefully review all citations. The issue may stem from our attempt to reach two different groups of scientists. We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). Though the 3 papers the reviewer mentions on lines 132 - 133 all pertain to yeast, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should apply broadly, beyond yeast. Similarly, the reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the genotype-phenotype-fitness map should apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So we cited papers from across the tree of life to support this sentence.

      On the other hand, because we study drug resistant mutations, we also hope that our work is of use to scientists studying the evolution of resistance. We agree with the reviewer that in this regard, some of our findings may be especially pertinent to the evolution of resistance to antifungal drugs. We will consider this when reviewing the citations in our revised manuscript and add some text to clarify these points.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeasts employ different mechanisms of adaptation for resistance, which could greatly impact the results seen. Stating the species would provide useful background context (although I assume S. cerevisiae).

      In the revised manuscript, we will make clear that we study S. cerevisiae.

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly?

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance. Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper.
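      To illustrate why read depth alone gives noisy aneuploidy calls, here is a minimal sketch, assuming per-site read depths have already been extracted for each chromosome and that the baseline strain is haploid (both are assumptions, not details from this reply). The helper name and the depth values are hypothetical; a real analysis would also need to correct for GC content, mappability and window size.

```python
from statistics import median


def relative_copy_number(depth_by_chrom, ploidy=1):
    """Hypothetical helper: estimate copy number per chromosome from read depth.

    Each chromosome's median depth is divided by the genome-wide median depth
    and scaled by the assumed baseline ploidy.
    """
    all_depths = [d for depths in depth_by_chrom.values() for d in depths]
    genome_median = median(all_depths)
    return {
        chrom: ploidy * median(depths) / genome_median
        for chrom, depths in depth_by_chrom.items()
    }


# Toy depths (not real data): chrIII behaves as if present in two copies.
toy_depths = {
    "chrI": [30, 31, 29, 30],
    "chrII": [30, 30, 29, 31],
    "chrIII": [60, 58, 61, 59],
}

for chrom, cn in relative_copy_number(toy_depths).items():
    print(chrom, round(cn, 2))
```

      Even in this clean toy case the estimates drift away from whole numbers, partly because the gained chromosome itself pulls up the genome-wide median; with uneven coverage across real genomes, such drift is what makes confident calls difficult.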

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections?

      Perhaps because our background lies in general study of the genotype-phenotype map, we did not want to make bold assertions about how our work might apply to pathogenic yeasts. But we see how this could be helpful and will add some discussion points about this. Specifically, we will discuss which of the genes and mutants we observe are also found in Candida. We will also investigate whether our observation that low fluconazole represents a seemingly unique challenge, not just a milder version of high fluconazole, has any corollary in the Candida literature.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      • Regarding the link between CSA and NE integrity, the work is in my eyes too preliminary. Although the data presented are well done and carefully evaluated, they mostly (except Fig 1A) rely on direct comparisons of one patient cell line (CS-A or CS-B) to the same cells expressing the wildtype protein. It thus remains open whether the effects seen on LEM2 expression, LEM2-lamin A/C interaction, stress fibre formation and cGAS/STING signaling pathway activation in the CS-A cells are representative of a number of different CS patient-derived cells. This is especially important given the small changes observed. Please note that there are also clear differences between the CSA-wt and CSB-wt cells. Would the HAP1 CSA KO cells show, in addition to NE irregularities (not even quantified), the same phenotypes, and can they be reverted by re-expression of the wildtype CSA protein?

      We would like to thank the reviewer for this comment. Indeed, we have previously observed nuclear circularity defects in CSA KO HAP1 cells, but we haven’t investigated the other phenotypes in this cell line. One of the main reasons behind this is the technical difficulty of performing immunofluorescence with HAP1 cells, which are very small and tend to grow in aggregates.

      Proposed experimental plan: To address this reviewer’s comment, we will attempt again to use HAP1 cells (WT and CSA KO) and look at nuclear circularity (with quantification), stress fiber formation and cGas foci. If we don’t succeed, we will use an alternative isogenic cell model consisting of fibroblasts in which we will knock out CSA using CRISPR/Cas9. We will then repeat the same experiments as proposed above for the HAP1 cells.

      • The link between CSA and SUN1 is not well worked out. What is the effect of SUN2 downregulation, and that of nesprins? It remains unclear whether the observed effects are indeed LINC mediated.

      Proposed experimental plan: To address this point, we will downregulate SUN2 and nesprins using siRNAs in two different cell models (as described above) and assess nuclear shape as well as cGas/STING pathway activation.

      Minor comments:

      Fig 1B: Why is HA-tagged CSA not shown on the CSA western? This would be helpful to compare to the endogenous levels, at least in CSB cells. A western showing a housekeeping marker would allow better comparison. Judging from the protein markers, HA-tagged CSA seems much larger than endogenous CSA (first versus second row). Again, less cropped western blots would help.

      We are sorry for the confusion and we have realised that the molecular weights on the western blots were incorrectly labeled on this figure. This will be modified, and the full, uncropped WB will be provided as a supplementary file.

      Fig 3A: Is CSA FLAG- or HA-tagged? Or both? If both are expressed, the question arises of why the CSA-LEM2 interaction is only seen in an overexpression situation.

      CSA is HA tagged. To address this reviewer’s comment we will try performing immunoprecipitation on endogenous proteins.

      Fig 5: Inconsistency between figure and figure legend: 20 vs 25 nM Jasplakinolide. I assume Latrunculin A should read Cytochalasin D?

      Thank you for pointing this out, and yes this is indeed a mistake on our labeling. We will rectify this on the figure legend.

      Fig5B: Not clear why in "CSA-wT cells" Cytochalasin D and Jasplakinolide have the same effect on nuclear envelope shape yet only Jasplakinolide increases the number of blebs.

      Cyt D inhibits actin polymerization, while jasplakinolide increases polymerization. Increased actin polymerization likely promotes blebbing by exerting extra force on the nucleus through actin cables and actin-based motility. Both drugs decrease nuclear roundness because they disrupt the normal actin network, worsening nuclear shape through different mechanisms.

      Page 10: Method for IF: 4% (v/v) paraformaldehyde and 2% (v/v) should likely read (w/v). Page 19: replace "withl" by "with".

      We will rectify both these points.

      Reviewer #2

      The paper is well-written and for the most part, the data support the conclusions of the authors. Some minor caveats could be addressed to improve the quality of the manuscript.

      We would like to thank the reviewer for their positive feedback on our manuscript.

      • The phenotype of decreased LEMD2 incorporation into the NE in CS-A cells is minor. The decrease is only ~20%, so it is not clear whether it causes any of the NE abnormalities. It should be better explained how these data add to the story.

      To address this point, we will overexpress LEMD2 in CS-A cells and assess whether the NE phenotype can be significantly rescued. This will add value to this part of the story.

      • Inducing actin polymerization and depolymerization affects nuclear morphological abnormalities and nuclear blebbing. Do these treatments impact nuclear fragility and cGAS accumulation at NE break sites?

      This is a good point indeed. To address this question, we will include cGas foci staining and quantification upon treatment with these chemicals.

      • Depletion of SUN1 in CS-A cells increased nuclear circularity, decreased blebbing, and phosphorylation of TBK1. The impact of SUN1 depletion in cGAS foci formation at NE break sites and phosphorylation of STING is not shown. Such experiments will provide stronger evidence that CS-A activates the cGAS-STING pathway in a SUN1 (mechanical stress)-dependent manner.

      We will address this question by analysing cGas foci and cGas-STING pathway activation upon SUN1 depletion by siRNA.

      Reviewer #3

      The data are generally clear, well performed and well interpreted, with some exceptions:

      1) I appreciate the use of isogenic cell lines (a big plus when dealing with patient-derived cell lines). However, these lines were established 30 years ago and the reported phenotypes might be due to genetic drift. To exclude this, I suggest complementing the HAP-1 ERCC8 KO cell line with exogenously expressed CSA and assessing whether this rescues the phenotypes reported. Validation of the KO in these lines, either by western blotting or sequencing, is needed.

      This point has also been raised by the first reviewer, and will be addressed as described above (and pasted below):

      We would like to thank the reviewer for this comment. Indeed, we have previously observed nuclear circularity defects in CSA KO HAP1 cells, but we haven’t investigated the other phenotypes in this cell line. One of the main reasons behind this is the technical difficulty of performing immunofluorescence with HAP1 cells, which are very small and tend to grow in aggregates.

      Proposed experimental plan: To address this reviewer’s comment, we will attempt again to use HAP1 cells (WT and CSA KO) and look at nuclear circularity (with quantification), stress fiber formation and cGas foci. If we don’t succeed, we will use an alternative isogenic cell model consisting of fibroblasts in which we will knock out CSA using CRISPR/Cas9. We will then repeat the same experiments as proposed above for the HAP1 cells.

      2) Related to the complementation of patient cell lines, the exogenous HA-CSA is not recognised by the anti-CSA in the CSA-null patient cell lines (Fig 1B, second blot). Shouldn't you be able to see this exogenous protein? HA-GFP-CSB in the complemented CSB-null patient cell line runs at the same weight as endogenous CSB (Fig 1B, fourth blot). This is also unexpected. I think you need better characterisation of your cell lines and need to demonstrate the level of exogenous transgenes that have been used to complement the cells and that they localise appropriately, presumably to the nucleus. You should also make sure to cite the paper where they were isolated and describe that they were immortalised (Troelstra et al., 1992) and the paper in which transgenes were stably overexpressed (Qiang et al., 2021).

      As mentioned above, we have realised that we made some mistakes with the labeling of the molecular weight on this western blot. This will be corrected and the full uncropped western blot will be provided as a supplementary figure.

      We will also cite the suggested papers accordingly.

      3) Immunolocalisation of INM proteins is notoriously tricky, and the permeabilisation steps include only 0.2% Tx100, which can be insufficient to permeabilise the INM. I appreciate that the Emerin and Lamin immunostaining seems to have worked, but in many cases successful immunostaining can be antibody-specific. Can you try harsher permeabilisation to expose LEM2 epitopes? I'm somewhat uncomfortable with the suggestion that there is a cytosolic (ER?) pool of endogenous LEM2, as this runs counter to the literature, and I feel that your antibody or fixation conditions are illuminating a non-specific protein. The WB in Fig 2E shows that there is virtually no LEM2 in the "soluble" fraction. I would be more cautious about this cytoplasmic/nuclear pool interpretation. Biochemical nuclear and cytoplasmic fractionation would help clarify the signal in a NE vs a non-NE pool.

      As suggested by the reviewer, we will try harsher permeabilisation conditions to test the LEMD2 antibody. As we suggest in the manuscript, however, we think that the “cytoplasmic” LEMD2 pool we observed by IF in the absence of pre-extraction is indeed unspecific. This is why we performed the rest of the experiments with a pre-extraction step, which we have shown gives a specific LEMD2 signal that disappears upon LEMD2 depletion by siRNA.

      4) Page 15: "Using a Proximity Ligation Assay (PLA), we showed a significant reduction in the number of PLA foci in CS-A cells compared to the WT(HA-CSA) cells, reflecting a reduced number of LEMD2-lamin A/C complexes (Figure 2G, 2H). This data suggests defects in the incorporation of LEMD2 into the NE and lamin protein complexes in CS-A cells". If you have less LEM2 in the NE, it is quite expected that you will have less "LEM2-laminA/C" complexes. To me the logic doesn't hold and this data does not suggest that there is an underlying defect in LEM2-lamin interaction. To ascertain whether there is such a defect one could perform an IP against LEM2 and quantify laminA/C, normalizing by the amount of LEM2 in the input.

      We feel we may not have been clear in how we interpreted this data. What we mean is that in each individual cell, the number of Lamin-LEMD2 complexes is decreased, probably indeed due to the fact that there is less LEMD2 altogether within the nucleus in the absence of CSA. We will clarify this in the text.

      5) "We overexpressed LEMD2-GFP and Flag-CSA constructs, followed by GFP pulldown in WT(HA-CSA) cells". Since the co-IP data are obtained in overexpression conditions (of both HA-CSA and Flag-CSA?), the authors should validate the interaction between LEM2 and CSA using an orthogonal approach. Perhaps anti-HA capture of the WT(HA-CSA) cells would allow you to immunoblot for endogenous LEM2?

      To address this point, we will try to immunoprecipitate HA-CSA and look at endogenous LEMD2.

      6) Related to the CSA-LEM2 binding in the above experiment, the procedure involves combining a native detergent-extracted cytoplasmic pool with a denatured (RIPA-extracted) nuclear pool for performing the GFP-trap. In which pool was the tagged CSA bound to LEM2?

      We are sorry about the confusion. We didn’t try to run the IP from the different pools but instead from the combined pools, to ensure we were looking in the whole cell extract. We would expect however that the interaction occurs in the nuclear pool as both CSA and LEMD2 are nuclear proteins.

      7) "The absence of CSA in CS-A patient cells does not affect the mobility of LEMD2 at the NE but instead decreases its interaction with A-type lamins". To me the fact that loss of CSA decreases LEM2-lamin interaction is not well supported (see point 3).

      See our response to point 4

      8) "Here, we showed by immunoprecipitation that LEMD2 also interacts with CSA. This suggests that the recruitment and stabilization of LEMD2 to the NE is mediated by an interaction with CSA, although the mechanism remains unclear". I think this is an overstatement: there are no data suggesting that CSA recruits or stabilises LEM2 at the NE.

      We will tone down this statement in the text.

      9) As the authors suggest in the discussion, it would be worth checking whether LEM2 overexpression is able to rescue some of the NE defects reported, strengthening the hypothesis that LEM2 levels are at least in part responsible for the phenotypes reported.

      To address this point, we will overexpress LEMD2 in CS-A cells and analyse the nuclear envelope defects and ruptures (shape and cGas foci quantification).

      10) To me it is not clear how the reported phenotypes are interrelated. The first part of the manuscript shows that CSA interacts with LEM2, and that loss-of-function CSA impacts LEM2 levels and LEM2-lamin interaction, suggesting a direct role for CSA at the nuclear envelope. The second part of the manuscript shows that cells with defective CSA have more actin stress fibres and that releasing the cytoskeleton-nuclear tethering is able per se to rescue the nuclear membrane and cGAS phenotypes. How do the authors reconcile these two parts? Is CSA directly involved in both inner nuclear membrane homeostasis and actin cytoskeleton modulation, or is this latter role upstream and the NE defects a mere consequence of increased cytoskeleton rigidity?

      At this point, we indeed cannot draw definitive conclusions as to whether the two described phenotypes are inter-related. However, we hope that addressing the other points raised by the reviewers will help clarify the mechanism.

      11) It is not clear how or why actin stress fibres are elevated in the CS-A cells. Can the authors provide any insight based on their RNAseq analysis? Demonstrating a link to ROCK, LIMK or Rho signalling would be interesting, and verifying ppMLC2 levels would help explain why contractility is enhanced. Additionally, is the increase in contractility dependent upon any of the genes identified as up- or downregulated in RNAseq? Presently, the manuscript is missing a link between its two halves.

      We would like to reiterate that the RNASeq analysis we performed was done on previously published data from another group (as described in the text). To address the point raised by the reviewer, we will re-examine our analysis specifically for ROCK, LIMK or Rho signalling to see whether any of these pathways appear to be modulated by the absence of CSA.

      12) Related to point 1, the RNAseq comparison was performed on patient cells lacking CS-A and patient cells lacking CS-A that later over-expressed HA-CSA, and this comparison is used extensively for phenotype description in the manuscript. It isn't clear to me that this is the most insightful comparison to make; the rescue by overexpression is not as elegant as CRISPR reversion, and the KO fibroblasts have presumably been surviving well in culture without CS-A before this protein was overexpressed. Can you validate the differential expression of any identified proteins in the acute HAP1 KO? Can you validate any of the differentially expressed proteins in comparison to normal fibroblasts (e.g., 13O6, as per Qiang et al., 2021)?

      As we will validate our experiments in an additional cell model (as described above), we will also indeed validate the level of expression of cytoskeletal proteins upon CSA KO/rescue.

      Minor comments

      - Page 14: "To characterize the NE phenotypes further, we obtained CS patient-derived cell lines carrying loss-of-function mutations in CSA (CS-A cells) or CSB (CS-B cells), and their respective isogenic control cell lines (WT(HA-CSA) and WT(HA-GFP-CSB))." What type of loss-of-function? Is the mutant protein still produced? In Fig 6A there seems to be a band in the CS-A blot (second lane), but in Fig 1B there isn't. I think this is important to know to interpret the phenotype related to LEM2 interaction.

      We can clarify that in the text. Indeed, the loss of function mutation leads to the absence of CSA protein.

      - Figure 1B is poorly annotated. What do - and + stand for? In general, I find a bit confusing how the WB are presented throughout the manuscript, specifically how the antibodies are reported (e.g., HA-CSA instead of HA). Please mark up all western blots with antisera used. Please make sure all expected bands are within the crops - e.g., Fig 3B, the anti-LEM2 blot should be expanded vertically to show the LEM2-GFP relative to endogenous LEM2.

      We will correct these on the figures

      - From the methods, it appears that you obtained a Please provide clarity on which construct was used in which figure, and verify that an N-terminally tagged LEM2 still localises to the NE.

      We actually cloned LEMD2 into an empty pEGFP vector but maintained the LEM2-GFP orientation. We will remove the C1 plasmid from the methods to avoid confusion, as we removed the MCS and GFP, used the blank vector and inserted lem2-gfp as we obtained it.

      - Fig 1I: there is some text on top of the upper panels (DAPI, cGAS, Merge).

      - "Through gene ontology analysis, we found that genes involved in endoplasmic reticulum (ER) stress were differentially expressed (Figure 4B)". I don't think that the way data are shown in Fig 4B is effective. Since GO has been performed, I would replace the table with a GO enrichment analysis graph. Ensure to report all the data in a supplementary .xls so that others can see and reuse it. Is there a mandated repository that accepts RNAseq data?

      The RNAseq experiment was performed, and the data reported, by another group in a previous study, as referenced in the main text of the manuscript (Epanchintsev A, Costanzo F, Rauschendorf MA, Caputo M, Ye T, Donnio LM, et al. Cockayne’s Syndrome A and B Proteins Regulate Transcription Arrest after Genotoxic Stress by Promoting ATF3 Degradation. Mol Cell. 2017 Dec;68(6):1054-1066.e6.). Here, we only re-analysed their data using STRING pathway analysis, as detailed in the Material and Methods. However, as suggested by the reviewer, we will replace the table with a GO enrichment graph.

      - The volcano plot looks weird with many values at the maximum log10 (P-value) - is the data processed appropriately?

      As mentioned above, the RNA Seq analysis was performed and published in a different study. We think this is because the Y axis shows adjusted P values.
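      One plausible way adjusted P values can pile up at a single height: under the Benjamini-Hochberg procedure (assumed here; the adjustment method of the original study is not stated in this reply), each adjusted value is a running minimum of p·m/rank taken from the largest P value downwards, so neighbouring genes often receive identical adjusted values. A minimal pure-Python sketch with toy P values, not data from the study:

```python
import math


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.011, 0.012, 0.5]  # toy values
adj = bh_adjust(raw)
heights = [-math.log10(p) for p in adj]  # y-axis values on a volcano plot
print(adj)
print(heights)
```

      The three smallest raw P values differ, yet all receive the same adjusted value and would be drawn at the same height on the y axis, producing the horizontal band the reviewer noticed.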

      - Figure 5B: the legend says "Latrunculin A". Please correct.

      We will correct this

      - For a Wellcome funded researcher, I'm surprised that the mandated OA statement and RRS is absent from the acknowledgements.

      We will of course comply with the open access policy of the Wellcome Trust. However, and based on the WT requirements detailed on their website, we believe the acknowledgement section complies with the funder’s policy: “All research publications must acknowledge Wellcome's support and list the grant reference number which funded the research reported.”

      Maybe mention that the changes in nuclear shape do not cause nuclear blebbing, but without saying that the two phenotypes are completely mutually exclusive.

      suggestion

      Maybe say that we will overexpress LEMD2 in CS-A cells and show that the NE phenotype can be significantly rescued. This will add value to this part of the story. I remember that when I did the FRAP experiment, CS-A cells expressing LEMD2-GFP (that doesn’t form aggregates) looked better in terms of shape.

      Anne, could you please check the plasmid map? According to the lab inventory (Plasmids Anne), it is LEMD2-GFP, so GFP is probably at the C-terminus.

      I think there was a part in the discussion where LEMD2-GFP was mistakenly written as GFP-LEMD…, but I am sure I used LEMD2-GFP throughout the work.

      We cloned it into an empty pEGFP vector but still maintained LEM2-GFP. Maybe remove the C1 in the methods to avoid confusion as we removed the MCS and GFP and just used the blank vector and inserted lem2-gfp as we obtained it.

      The same construct was used for the GFP pulldown and for FRAP, and the FRAP data show that it localises to the NE, so it should localise to the NE. Maybe mention that we will do a LEMD2-GFP overexpression experiment in CS-A cells and show that it does localise to the NE.

      I don’t remember fully if Denny did this and what came out. I thought he did and ER stress and cytoskeleton regulation came out as enriched terms?

      Denny will have to check this but I think this is because the Y axis shows adjusted P values?? I have the same in my data and Jack told me this is an artefact of the analysis if you adjust for multiple comparisons and is something more often seen in mass spec data

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The study investigates the role of cylicin-1 (CYLC1) in sperm acrosome-nucleus connections and its clinical relevance to male infertility. Using mouse models, the researchers demonstrate that cylicin-1 is specifically expressed in the post acrosomal sheath-like region in spermatids and plays a crucial role in mediating acrosome-nucleus connections. Loss of CYLC1 results in severe male subfertility, characterized by acrosome detachment and aberrant head morphology in sperm. Further analysis of a large cohort of infertile men reveals CYLC1 variants in patients with sperm head deformities. The study provides valuable insights into the role of CYLC1 in male fertility and proposes CYLC1 variants as potential risk factors for human male infertility, emphasizing the importance of mouse models in understanding the pathogenicity of such variants.

      We appreciate the comprehensive summary of reviewer 1.

      Strengths:

      This article demonstrates notable strengths in various aspects. Firstly, the clarity and excellent writing style contribute to the accessibility of the content. Secondly, the employed techniques are not only relevant but also complementary, enhancing the robustness of the study. The precision in their experimental design and the meticulous interpretation of results reflect the scientific rigor maintained throughout the study. Furthermore, the decision to create a second mouse model with the exact CYLC1 mutation found in humans adds significant qualitative value to the research. This approach not only validates the clinical relevance of the identified variant but also strengthens the translational impact of the findings.

      We appreciate the positive comment of reviewer 1.

      Weaknesses:

      There are no obvious weaknesses. While a few minor refinements, as suggested in the recommendations to authors, could enhance the overall support for the data and the authors' messages, these suggested improvements in no way diminish the robustness of the already presented data.

      In the recommendations for the authors, reviewer 1 mentioned a recent study (Schneider et al., eLife, 2023) showing that Cylc1-KO mice exhibit a reduced sperm count, an observation not noted in our current study. We would like to comment that the main and most important phenotype of Cylc1-KO mice is quite similar in both studies, including male subfertility and abnormal head morphology. We think the different targeting strategies and mouse strains may cause this discrepancy. In both Schneider’s and our current study, no abnormality in the total motility of Cylc1-KO mice was observed. We appreciate the suggestion of reviewer 1 to further examine detailed parameters of motility such as VCL, VSL, and ALH. Given that head deformation is the most obvious phenotype of Cylc1-KO mice and the focus of our study, we regret that this detailed analysis of sperm motility was not performed at the current stage. Reviewer 1 also asked whether Cylc1-KO female mice are fertile. Given that Cylc1 is an X chromosome gene and Cylc1-KO (Cylc1-/Y) mice are severely subfertile, we could not obtain enough Cylc1-KO female mice to examine their fecundity. We would also like to thank reviewer 1 for pointing out several inaccurate descriptions.

      Reviewer #2 (Public Review):

      Summary:

      To verify the function of PT-associated protein CYLC1, the authors generated a Cylc1-KO mouse model and revealed that loss of cylicin-1 leads to severe male subfertility as a result of sperm head deformities and acrosome detachment. Then they also identified a CYLC1 variant by WES analysis from 19 infertile males with sperm head deformities. To prove the pathogenicity of the identified mutation site, they further generated Cylc1-mutant mice that carried a single amino acid change equivalent to the variant in human CYLC1. The Cylc1-mutant mice also exhibited male subfertility with detached acrosomes of sperm cells.

      We appreciate the comprehensive summary of reviewer 2.

      Strengths:

      The phenotypes observed in the Cylc1-KO mice provide strong evidence for the function of CYLC1 as a PT-associated protein in spermatogenesis and male infertility. Further mechanistic studies indicate that loss of cylicin-1 in mice may disrupt the connections between the inner acrosomal membrane and acroplaxome, leading to detached acrosomes of sperm cells.

      We appreciate the positive comment of reviewer 2.

      Weaknesses:

      The authors identified a missense mutation (c.1377G>T/p. K459N) from 19 infertile males with sperm head deformities. The information for the variant in Table 1 is insufficient to determine the pathogenicity and reliability of the mutation site. More information should be added, including all individuals in gnomAD, East Asians in gnomAD, 1000 Genomes Project for allele frequency in the human population; MutationTaster, M-CAP, FATHMM, and more other tools for function prediction. Then, the expression of CYLC1 in the spermatozoa from men with CYLC1 mutation should be explored by qPCR, Western blot, or IF staining analyses. Although 19 infertile males were found carrying the same missense mutation (c.1377G>T/p. K459N), their phenotypes are somewhat different. For example, sperm concentrations for individuals AAX765, BBA344, and 3086 are extremely low but this is not observed in other infertile males. Then, progressive motility for individuals AAT812, 3165, 3172, 3203, and 3209 are extremely low but this is also not observed in other infertile males. It is worth considering why different phenotypes are observed in probands carrying the same mutation.

      We appreciate the suggestion of reviewer 2. First, Table 1 shows the information on the variant identified in the CYLC1 gene, including allele frequency in gnomAD and functional prediction by SIFT, PolyPhen-2, and CADD. Given that mutant mice are the gold standard for confirming the pathogenicity of a variant, we generated Cylc1-mutant mice, which exhibit male subfertility with sperm acrosome detachment. This animal evidence is much more solid than bioinformatic prediction for confirming the pathogenicity of the identified variant in the CYLC1 gene. Second, the expression of CYLC1 in the spermatozoa from patients has been examined by IF staining (Fig. 5B). Unfortunately, the patients declined to continue in the project and donate more semen for qPCR and Western blot analyses. Third, reviewer 2 asks why not all patients with the CYLC1 gene mutation show an identical phenotype. Although some patients exhibit low sperm count or reduced motility, sperm head deformities are the phenotype shared by all 19 patients. Many factors, such as lifestyle, may affect sperm quality. A perfectly identical phenotype in all 19 patients carrying the CYLC1 mutation is idealistic and should not be expected in clinical diagnosis. We also appreciate the other suggestions from reviewer 2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Major points:

      R1C1: I appreciate that the data are aligned, in some points, with related studies of this niche. However, it would help the reader to have this alignment explored more extensively in the Discussion as well.

      Answer: We acknowledge that the discussion would benefit from additional comparisons to the available datasets. We have thus added the following comment after the first paragraph of the discussion: “Previous studies of the different sub-populations of SVZ progenitors were carried out using transcriptomic approaches based on the expression of various more or less specific markers. These approaches have made it possible to identify quiescent and activated neural stem cells as well as mature neuroblasts, but have been faced with the strong influence of the cell cycle on cell clustering. Indeed, cycling neural progenitors in these studies have been grouped into either “mitotic” clusters (Llorens et al. 2015, Zywitza et al. 2018, Cebrian et al. 2021) or “neural progenitor cells” clusters (Dulken et al. 2017) that had no clear biological significance, hindering identification of subtypes of SVZ cycling progenitors. Our study, combining for the first time the characterization of FACS-isolated cells and an irradiation-based model of sequential regeneration, allowed us to clearly distinguish the molecular profiles of TAP and iNB among cycling progenitors, reflecting differences in their respective in vitro and in vivo potentials”.

      R1C2: The data on multilineage differentiation, both in culture and upon engraftment, would be greatly strengthened by quantification. What is the relative yield of TUJ1/DCX-positive cells versus the other marker combinations? Specifically regarding the multilineage differentiation in vitro - because different media conditions are used to generate each lineage, it may be difficult to determine relative yield. Could a differentiation system that allows production of all 3 lineages be used instead?

      If the fraction of non-DCX/TUJ1-labeled progeny is low, particularly in vivo, this might suggest that while multilineage differentiation is possible, it is a much less likely cellular state outcome than production of mature neuroblasts. Some suggested references with examples of the culture conditions, experimental conditions, and discussions highlighted in the public review: Culture conditions that allow simultaneous trilineage differentiation. PMID: 17615304 Influence of culture conditions on potency: similar to issues covered in PMID: 21549325.

      Answer: We agree with the reviewer that quantification of multilineage differentiation in vitro would improve the characterization of the relative potencies of the different SVZ progenitors.

      According to PMID: 17615304 and PMID: 21549325, and in agreement with our own experience, the only culture condition that allows neurosphere-derived neural progenitors to differentiate in vitro into the three lineages is the removal of mitogens from the culture medium. However, this does not work on freshly isolated SVZ cells, which remain in an undifferentiated state in this condition.

      This is why we chose to use specific differentiation media for each of the 3 lineages, as in Figure 1C. It is also for this reason that we performed as many experiments as possible in vivo rather than in vitro, as in Figure S2. In the new version, we have added a quantitative analysis of staining with antibodies against GFAP, CNPase or DCX in GFP-positive cells persisting at the injection site (IS), where high numbers of grafted cells were found (Figure S2B). This was performed using the NIS software to measure eGFP-, GFAP-, CNPase- and DCX-positive areas. The intersection between each marker area and the eGFP area was then expressed as a percentage of staining (Figure S2C). The results showed that approximately one third of GFP+ cells expressed GFAP or DCX. Quantitative analysis of CNPase expression was complicated by CNPase-positive host cells, but the stronger CNPase staining in eGFP-positive areas clearly revealed the expression of CNPase by a significant proportion of eGFP-positive cells.
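      The area-overlap measure described above can be illustrated with a short sketch. It is not the NIS software pipeline used for Figure S2C: it assumes that each channel has already been binarized with a fixed threshold (the threshold value and the function names to_mask and marker_overlap_percent are hypothetical) and computes the percentage of the eGFP-positive area that is also marker-positive.

```python
import numpy as np


def to_mask(image: np.ndarray, threshold: float) -> np.ndarray:
    """Binarize one fluorescence channel (hypothetical fixed threshold)."""
    return image > threshold


def marker_overlap_percent(gfp_mask: np.ndarray, marker_mask: np.ndarray) -> float:
    """Percentage of the eGFP-positive area that is also marker-positive."""
    gfp = gfp_mask.astype(bool)
    marker = marker_mask.astype(bool)
    gfp_area = int(gfp.sum())
    if gfp_area == 0:
        raise ValueError("no eGFP-positive pixels in this field")
    # Pixels positive in both channels, relative to the eGFP-positive area.
    overlap_area = int(np.logical_and(gfp, marker).sum())
    return 100.0 * overlap_area / gfp_area


if __name__ == "__main__":
    # Toy 4x4 field: 6 eGFP+ pixels, 2 of which are also marker-positive.
    gfp_img = np.zeros((4, 4))
    gfp_img[0:2, 0:3] = 200.0
    marker_img = np.zeros((4, 4))
    marker_img[0, 0:2] = 150.0
    marker_img[3, 3] = 150.0  # marker signal outside the eGFP area is ignored
    pct = marker_overlap_percent(to_mask(gfp_img, 100.0), to_mask(marker_img, 100.0))
    print(f"{pct:.1f}% of eGFP+ area is marker-positive")  # 33.3%
```

      As noted for CNPase, an area-based measure of this kind cannot separate graft-derived signal from marker-positive host cells that overlap eGFP-positive areas.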

      R1C3: Additionally, for claims similar to what is currently made in the text, it would be extremely valuable to confirm the purity of the sort for each population - for example by fixing and staining the sorted fraction with additional antibodies that confirm cell identity.

      Answer: We have previously shown (Daynac et al. 2013) that s-iNB express the neuroblast markers CD24 and DCX, but also markers of neural progenitors such as Mash1, a basic helix-loop-helix transcription factor. As suggested by the reviewer, we have further investigated the expression of other neural progenitor markers in sorted cells. The proportion of cells expressing DLX2, a marker of proliferating progenitors (Doetsch et al. 2002), was very high in aNSC/TAP (98%) and progressively decreased in iNB (82%) and mNB (25%). Similarly, cells expressing the transcription factor SOX2, which plays an essential role in the maintenance of neural progenitors (PMID: 25126380), accounted for 78% of aNSC/TAP, 70% of iNB and 17% of mNB.

      Altogether, these new data confirmed the identity of the different cell populations and particularly that of iNB. They are commented at the beginning of the Results and shown in Figure S1.

      R1C4: Line 125: GFAP alone doesn't necessarily indicate a "conversion to NSCs" - this conclusion could be greatly strengthened by inclusion of more markers, particularly at the protein level, or cyto-architectural studies.

      Answer: We agree with the reviewer that GFAP expression alone is not sufficient to demonstrate the presence of NSC in the SVZ. We have thus modified the text accordingly: “Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with eGFP+ s-iNB and eGFP+ s-NSC/TAP (Fig. 1Db, Fig. 1Dc), some of them expressing GFAP indicating the generation of astrocytes, and therefore possibly NSC”.

      R1C5: Could these cellular states be reflective of preferential translation of DCX? It would be very helpful to see the flow cytometry sort data for iNBs / mNBs used in Figure 6, particularly if these cells were also fixed and stained directly for DCX protein.

      Answer: As suggested by the reviewer, freshly FACS-sorted iNB and mNB were fixed, permeabilized and labelled with an anti-DCX monoclonal antibody. As shown in the figure below, DCX expression was higher in mNB than in iNB. This result therefore suggests that proliferative capacity may be inversely related to the level of DCX expression. However, because of the relatively minor importance of this result, we decided not to include it in the manuscript.

      Author response image 1.

      Modal histogram representation of DCX expression level in unstained, iNB and mNB cells determined by flow cytometry (FlowJo).

      R1C6: Figure S8 is all zeroes, showing the GFP+Dcxhigh NBs do not retain proliferative capacity. But we don't get a direct experimental comparison to EGFPnegative/lowDcxlow iNB engraftment, which would strengthen the conclusions of the paper.

      Answer: Unfortunately, no method is available to analyse eGFPnegative/lowDcxlow iNB engraftment: by definition, these cells do not express eGFP, and a cell tracker is not appropriate over long periods of time (and thus a high number of cell divisions) after engraftment. However, in our view, this control is not needed to conclude that GFP+Dcxhigh iNB have no (or at least a lower) stem cell potential in vivo, given that we have shown in Figure 1 and Table 1 that the whole iNB population is able to generate the different types of neural cells.

      R1C7: Transplant data in Table 1 - a relatively small proportion of transplant derived cells are in OB, etc. Given that A cells are thought to cycle at least once in vivo, is this expected?

      Answer: The reviewer is right that a relatively small proportion of transplant-derived cells was found in the OB. However, we used immunocompetent mice as recipients, which could have significantly reduced both the engraftment efficiency and the migration of engrafted cells away from the injection site.

      R1C8: A caveat is that there is not much functional testing of the proposed model, especially for the interconversion of iNB states suggested by the diagram in Figure 7. The text is relatively restrained in proposing this model, so it is reasonable to keep - but perhaps should be noted that this part of the model will need additional testing.

      Answer: Data presented in Figure 6 clearly suggest that Dcxhigh iNB have an in vitro potential similar to that of Dcxlow iNB, whereas they do not have such potential in vivo (Figure S10). This suggests that, provided they are in appropriate conditions, Dcxhigh iNB could reacquire stem/progenitor properties. However, we agree that this hypothesis requires further investigation. Therefore, as suggested by the reviewer, we have added to the Figure 7 legend: “Possible interconversion of iNB states would require further experimental confirmation.”

      Additional minor points:

      R1C9: Introduction: the SVZ is described as "the lateral wall" - however, several works in the mouse have also examined the medial wall and callosal roof, as cited later in the intro. Suggest rephrasing the second sentence (line 48) and later sentence (line 66) to clarify that "the SVZ" encompasses all of these subregions, they are not necessarily separate niches.

      Answer: As indicated by the reviewer, the SVZ encompasses distinct subdomains, with NSCs having a regional identity based on their location in the lateral or septal wall of the ventricle and generating different types of neuronal and glial progeny (PMID: 34259628). To address the reviewer's concern about possible confusion and to clearly indicate that the SVZ encompasses several subdomains, we have modified the sentence at line 66 as follows: “Since then, single-cell RNA sequencing has revolutionized the field and has made it possible to precisely elucidate the transcriptome of SVZ cells present in the LW and in the septal wall, which also harbors NSC niches”.

      However, we did not modify line 48, since this sentence only indicates that the largest neurogenic niche in the adult brain resides in the LW of the SVZ.

      R1C10: Line 77: "exposure" not "exposition"

      Answer: The error has been corrected in the revised manuscript.

      R1C11: As noted in the Public Review - the use of the term "D1/D2" cells seems likely to confuse readers who are also versed in dentate gyrus neurogenesis. Recommend removing this term from the manuscript.

      Answer: We agree that the D1/D2 terminology could cause confusion, D cells referring to tanycytes in the hypothalamus. We now refer to DcxLow iNB as iNB1 and to DcxHigh iNB as iNB2 in the revised manuscript.

      Reviewer 2

      Major comments:

      Lack of rigor

      R2C1: There is a lack of appropriate normalization controls for the microarray data. As there is a decreased level of transcription in quiescent NSCs, there needs to be a cell number control (spike-ins based on cell numbers). Without this normalization, the readout can be greatly skewed.

      Answer: We agree that qNSC are marked by a decreased level of transcription due to quiescence. To overcome this problem in the Clariom assays, we chose to calibrate each population with a fixed amount of cRNA and cDNA, using HeLa cells as an internal control. We fully agree that this method is not optimal, but it appears to be effective in practice. Indeed, the same approach has been adopted in other microarray studies published in the field (PMID: 24811379) and in studies of skeletal muscle cells (PMID: 29273087). Moreover, the transcriptomic signature of qNSC matches perfectly with those from other studies, particularly those of related clusters in single-cell experiments (including ours, Figure S5). This is probably because, more than cell number, the main characteristic of these cells is the lack of expression of genes involved in cell proliferation and metabolism. In any case, these data, which confirm previously published results, are not the main finding of our manuscript, which is mainly dedicated to the characterization of proliferating cells; this characterization is not affected by our choice of normalization.

      R2C2: The absolute segregation of clusters in the single-cell analysis is currently entirely in agreement with the cell cycle stage. This suggests that in the authors' analysis, the clustering in 3F is entirely shaped by the cell cycle, making that the defining characteristic of the authors' definitions for their cell types. Has an analysis been done that regresses out cell cycle-associated genes to see if there are clusters for different cell states/types that are identified in the absence of cell cycle stage being the defining factor? (Barron and Li, 2016). For example, just as you would see a difference in cluster if you are a quiescent or activated NSC as compared to a neuroblast for example, even without the contribution of cell cycle. These are different cell types.

      Answer: We agree that cell cycle regression would theoretically allow for further discrimination between cycling cells along successive neurogenic stages. We have already performed regression using several methods, including regressing out S- and G2/M-phase scores as indicated in the Seurat workflow, removing cell cycle-related PCs from the UMAP calculation as in the Cebrian-Sylla study, and using alternative gene sets such as those provided by the tricycle method (PMID: 35101061). These regression methods have all been applied to our datasets, to the original Cebrian-Sylla datasets, and to a combination of both to increase cell number and clustering resolution. However, none of these methods modified the clustering of cycling cells.

      In fact, the strong influence of the cell cycle over clustering highlights the relevance of our depletion/replenishment approaches to decipher the molecular changes masked by the cell cycle, as discussed below.
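      As an illustration of the first of the approaches listed above (regressing out S- and G2/M-phase scores), the following sketch fits, for each gene, a linear model on the two per-cell scores and keeps the residuals, which would then be used for PCA, UMAP and clustering. It is a simplified conceptual sketch in Python/NumPy, not the Seurat implementation (Seurat's ScaleData also scales and centers the residuals) and not the exact pipeline used here; the function name and the toy data are hypothetical.

```python
import numpy as np


def regress_out_cell_cycle(expr: np.ndarray,
                           s_score: np.ndarray,
                           g2m_score: np.ndarray) -> np.ndarray:
    """Remove the linear contribution of S and G2/M scores from each gene.

    expr: cells x genes matrix of log-normalized expression.
    s_score, g2m_score: one score per cell, computed beforehand from
    S-phase and G2/M-phase marker genes.
    Returns the residual matrix, with the same shape as expr.
    """
    n_cells = expr.shape[0]
    # Design matrix: intercept + the two cell-cycle covariates.
    design = np.column_stack([np.ones(n_cells), s_score, g2m_score])
    # One least-squares fit per gene (all genes solved at once).
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    s = rng.normal(size=n)
    g2m = rng.normal(size=n)
    cell_state = rng.integers(0, 2, size=n)  # a non-cycling signal to keep
    expr = np.column_stack([
        1.0 + 2.0 * s,                 # gene driven only by the S score
        3.0 * cell_state + 0.5 * g2m,  # gene mixing state and G2/M signal
    ])
    resid = regress_out_cell_cycle(expr, s, g2m)
    print(resid[:, 0].std())                           # close to zero
    print(np.corrcoef(resid[:, 1], cell_state)[0, 1])  # state signal retained
```

      This only removes variation that is linearly explained by the two scores; the other approaches mentioned (removing cell cycle-related PCs, tricycle gene sets) handle the cell-cycle signal differently and are not shown here.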

      R2C3: The use of the DCX-CreERT2 line is a lineage tracing line. Once DCX is expressed, Cre recombines the DNA to allow for fluorescence. It is binary, on or off associated with DCX expression. And once on, it is always on, whether the cell is currently expressing DCX or not. As the authors had previously described a DCXlow condition, the eGFP- cells would not reflect DCXlow, but no DCX at all. And the eGFP+ cells may not be currently expressing DCX anymore. The authors should have used a system where the DCX promoter itself drives fluorescence.

      Answer: We took advantage of the DCX-CreERT2 line to demonstrate that some neural cells that have recently acquired DCX expression (i.e. eGFP+ iNB) could keep (or recover) the potential of neural progenitors in vitro. Of course, some of these GFP+ cells could have stopped expressing DCX. This is probably the case when they differentiate into astrocytes and oligodendrocytes in vitro, as shown in Figure 6.

      In any case, using the Dcx promoter as a direct driver of eGFP fluorescence would have prevented us from demonstrating such changes in cell fate in vivo, because oligodendrocytes or astrocytes derived from iNB could no longer be tracked once Dcx expression was lost.

      R2C4: The lack of analysis of images (differentiation, for example) limits the conclusions of the in-vitro data, and the images with unclear staining, limit the conclusions of the in-vivo experiments.

      Answer: This comment is similar to that of R1C2. We have now added a quantification in Figure S2.

      R2C5: The cited difference in splicing differences in cell types was interesting (though did not show up in the transcriptome enrichment analyses Fig S2) and would be something to further pursue, however, this was a very limited analysis. There was no further study of these splicing mediators beyond single-cell data.

      Answer: We now show enrichments of GO terms corresponding to mRNA splicing isoforms in the different types of sorted SVZ cells (Figure S4). This analysis clearly revealed that spliced genes in SVZ cells are mainly involved in neuron development and neurogenesis. Interestingly, it also showed that qNSC, as expected, differed from the other cell types in the splicing of genes involved in mitosis and the cell cycle, consistent with their quiescent state. More importantly, GO annotations of differentially spliced isoforms further confirmed that s-TAP and s-iNB have distinct features. We agree with the reviewer that further analysis of splicing mediators would be very important for understanding the molecular changes involved in neurogenesis. However, we think that this is largely beyond the scope of this study.

      R2C6: Fig 1C - Show values, not just pictures. You may need to shift your current differentiation paradigm to do so by removing growth factors instead of unique differentiation conditions.

      Answer: See the answer to R1C2.

      R2C7: Fig S1A - Stainings for GFAP and DCX are not clear. It is very hard to distinguish which cells are associated with these signals.

      Answer: This figure (now Figure S2A) shows an eGFP+iNB cell (white arrow) that has reached the rostral migratory stream and expressed DCX (inset a3), but not GFAP (inset a2). This is now indicated in the figure legend. We have also moved the arrow for more clarity.

      R2C8: Fig S1B2 - There is red staining everywhere, so it is very hard to see a specific CNPase signal.

      Answer: We have added a new figure (Fig S2B) distinguishing eGFP+CNPase+ cells (yellow arrows) from eGFP+CNPase- cells (white arrow).

      R2C9: Line 174 - It's the mRNA that you are detecting is being downregulated - be more specific as you are not showing protein downregulation.

      Answer: We added "encoding" to the text at line 174 to make clear that we refer to the mRNA: “Interestingly, Ptbp1, encoding a major splicing repressor”.

      R2C10: Line 189 - text in this line have some clusters not shown in the figure - (clusters 6 and 15, DCX+ Ki67+ neuroblasts) - which would be an important thing to visualize. As is shown now, the authors are only showing that iNBs are similar to mitotic TAPs.

      Answer: Clusters 6 and 15 have been added to Figure S5.

      R2C11: Fig 3D-E - Why is cluster 17 called aNSCs (3E) when it has the highest GFAP (Fig 3D). Typically, the highest GFAP cells are qNSCs or astrocytes, not aNSCs.

      Answer: We previously reported that the level of gfap mRNA expression in neural stem cells (quiescent and activated) did not exactly reflect the amount of protein in these cells. This is the reason why we also used the Slc1a3 marker (Glast), which is highly expressed both at the RNA and protein levels in quiescent NSCs (Daynac et al. 2013).

      R2C12: Line 216 - You said in line 216 cluster 13 were astrocytes, then you said in line 227 that cluster 13 was s-qNSC. Which is it?

      Answer: This is due to the fact that we performed two distinct analyses.

      In the first one (line 216), cells were scored based on datasets provided by Cebrian et al. with one dataset containing genes enriched in astrocytes, and another one, genes enriched in quiescent B-cells. Therefore, cluster 13 was shown to contain 73% cells expressing astrocyte markers, whereas cluster 4 gathered cells expressing both qNSC (B-cells, 48%) and astrocyte (52%) genes.

      In the second one (line 227), cells were scored using our transcriptomic signatures of FAC-sorted SVZ cells, which do not include differentiated astrocytes. This showed that cluster 13 cells expressed only s-qNSC genes.

      R2C13: Line 214 - While other clusters were all named in lines 214-221 that were then further discussed in lines 227-230, clusters 15 and 19 were not. You associate both of those clusters with s-iNB - what was it associated with in the above section?

      Answer: Lines 219-221 have been reworded as follows: Clusters 10, 5, 15, 12, and 8 were defined as cycling progenitors based on the expression of proliferative markers such as Top2a, Mki67 and Ascl1. Clusters 1, 3, 7 and 9 were identified as mNB owing to the loss of Mki67, Top2a and Ascl1 expression and the expression of Robo2 and Dcx. Cluster 19, which has lost Ascl1 but still expresses Top2a and Mki67 together with Robo2 and Dcx, appears at the transition between iNB and mNB.

      R2C14: Fig 3I-J - 5 days after irradiation, I would like to see from tissue slices how many cells are dividing compared to 1 day post-irradiation and controls. In other paradigms, such as temozolomide experiments (Kalamakis et al), by 5 days we should see fewer cells in quiescence and more of those quiescent cells exiting quiescence into the cell cycle. Why would there be more cells in quiescence in the irradiated brain? Even if they are radiation resistant, the base number should be comparable between controls and irradiated, which is not what you show in Fig 3I-J.

      Line 234-235 - the text says normalized to numbers of qNSCs which is supposed to be the same (which I agree should be the same). However, your graph in 3I and J shows more qNSCs in irradiated conditions, which would influence greatly and is currently hard to interpret.

      Answer: As stated by the reviewer, there is no increase in the absolute number of quiescent cells in the irradiated SVZ. The reconstitution of SVZ cell populations after 4Gy irradiation has already been studied by our group (Daynac et al. 2013, see Fig. 3F), showing that s-iNB and s-mNB are still under-represented after 5 days, while qNSC are present in similar numbers as in the unirradiated SVZ. This leads to an over-representation of quiescent cells and early SVZ progenitors in Figure 3J as compared with Figure 3I.

      R2C15: Fig 6A - the authors show a significant difference in neurospheres between eGFP- (DCX-) and eGFP+ (DCX+) iNBs - as would be expected as DCX suggests a further commitment towards neurogenic fates, yet your population doubling is the same.

      Answer: To determine the population doublings, the medium was changed and cells were counted every 7 days. This condition masked the differences between two cell populations reaching the plateau phase at different times, explaining why eGFP-iNB and eGFP+iNB could not be clearly distinguished by this technique.
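      For reference, population doublings are conventionally computed from cell counts at seeding and at harvest. The standard definition is given here as an assumption, since the exact formula used for Figure 6 is not stated; N_seeded,i and N_harvest,i denote the cell numbers at the start and at the end (day 7) of passage i, and k is the number of passages.

$$
\mathrm{PD}_i = \log_2\!\left(\frac{N_{\mathrm{harvest},i}}{N_{\mathrm{seeded},i}}\right),
\qquad
\mathrm{CPD}_k = \sum_{i=1}^{k} \mathrm{PD}_i
$$

      Because only the day-7 count enters the ratio, two populations that reach the same plateau density within a passage yield the same PD_i, whatever the day on which each of them reached the plateau.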

      R2C16: Fig 6C - Differentiation data (in-vitro) should be quantified in 6C, just as was mentioned for 1C. These values should be done for both of the populations (eGFP-iNB, and eGFP+iNB) and not just compared to the previous pictures which were on total iNB. Again, numbers are required, not just picture examples.

      Answer: Quantitative data are given in Figure 6D, showing that approximately 60-80% of eGFP+ iNB cells are able to differentiate into neurons, oligodendrocytes or astrocytes. We did not analyze the differentiation of eGFP-iNB since it would not add any further information.

      R2C17: Fig S8 - The authors did not show if the lack of engraftment of eGFP+ cells is due to the transplant (previously you showed only 2/3 worked in a similar paradigm). It would be helpful if the authors would have some means to visualize the DCX low cells to confirm they worked as before in the transplantation (another color? Another type of mouse (Thy1 antigen differences)?)

      Answer: Unfortunately, the Thy1 antigen has not been documented in mouse subventricular zone progenitors, but only in neurons (PMID: 10813783). The Thy1 antigen has also been described in bipotent glial progenitor cells (GPC) from the developing human brain that give rise to oligodendrocytes (PMID: 36931245).

      As shown in Figure S10, we performed 5 grafts with s-iNB eGFP+ cells, 2 alone and 3 mixed with eGFP- cells, and never found any eGFP+ cells 5 weeks after grafting. Moreover, we did not find any eGFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (these data have been added to Figure S10). Compared with the results described in Figure 1, this clearly shows that DCXhigh iNB, like mNB, are not able to generate persistent cells in the grafted brains.

      R2C18: Fig S8 - Why were there no eGFP cells even at the injection site? DCX expression promotes migration, indeed DCX expression becomes very high in cells in the SVZ as they begin to exit to go to the migratory stream. If one didn't see migration, one would expect you would still have survival. Currently, the authors show no cells at 5 weeks, however, they would need to show earlier timepoints as well to determine what is happening with these cells. It is possible these GFP+ cells are not even expressing DCX anymore (see above).

      Answer: As stated above, we did not find any GFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (see Figure S10).

      R2C19: Line 320 - the authors suggest a subpopulation of NEURONS continues to divide and cite 2 works from the 1990s showing proliferating SVZ cells can differentiate. Our knowledge of this system has come dramatically forward since the 1990s as well as technologically, and to date, neurons have not been shown to divide.

      Answer: We apologize for this lack of clarity. We agree that neurons correspond to differentiated non-cycling cells; we had used the terminology of these articles. The incorrect part of the sentence at line 320 has therefore been deleted from the text.

      R2C20: Fig 7 - The whole figure is based on changing levels of RSR genes which were not confirmed in any way to be involved in any of these stages, only descriptively in single-cell analyses.

      Answer: As stated above, in our opinion, further characterization of the involvement of RSR genes in neurogenesis is largely beyond the scope of our manuscript. Nevertheless, we think that the role of RSR genes in neurogenesis is an important question that should be addressed in further studies.

      Overstatement of findings

      R2C21: Fig 1 - Authors did not compare all cell types in each condition but made overstatements about their relationships to each other between graphs. There should also be separate graphs showing all cell types at 4% and a separate one at 20%.

      Answer: In the revised version, Figure 1 shows a graph comparing all cell types at 4% O2 and a separate one at 20%, as requested by the reviewer. The graphs clearly show that 4% O2 promotes iNB proliferation compared with the 20% condition.

      R2C22: Fig 1D-b2 - Why does DCX look nuclear? One can't say they are only NSCs if they are GFAP as astrocytes also express GFAP. The authors would need another marker to separate those populations. In the text, the authors say expressing GFAP (line 124) which means NSC, but then in line 127 expressing GFAP means astrocytes - which further shows you need additional markers to validate those 2 different cell types.

      Answer: DCX nuclear translocation has been shown to enhance cellular proliferation (PMID: 32050972).

      As indicated in our answer to R1C4, the text has been modified as follows: “Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with s-iNB eGFP+ and s-NSC/TAP eGFP+ (Fig. 1Db, 1Dc), some of them expressing GFAP indicating the generation of astrocytes, and therefore possibly NSC”.

      R2C23: Fig S2 - The transcriptome signature for s-iNBs is very similar to s-TAP, basically suggesting the iNBs are further along in cell cycle.

      Answer: This is now Figure S3. Functional enrichment analysis of individual transcriptome signatures revealed that both s-TAP and s-iNB are enriched in genes related to the cell cycle, although with different GO term enrichments. Indeed, s-TAP are enriched in genes related to the G1, G1/S and S phases (but with low -log10 adjusted p-values) and s-iNB in genes related to cell cycle mitosis and M phase (with high -log10 adjusted p-values).

      We have previously shown that around 33% of s-iNB have a DNA content > 2N, versus around 26% of s-TAP and s-aNSC (Daynac et al. 2013), which is in accordance with the GO term enrichments. However, these data also showed that most s-iNB and s-TAP are in G1, indicating that s-iNB are not just further along in mitosis than TAP.

      Moreover, our transcriptomic data clearly show that s-iNB are distinct from s-TAP: 1) according to principal component analyses (Figure 2B and C), the whole transcriptome of s-TAP is closer to that of s-aNSC than to that of s-iNB (10% variation in PC2), 2) the heatmap in Figure 2D shows that they have different RSR gene expression profiles, 3) the new Figure S4 shows that GO annotations of differentially spliced isoforms further confirm that s-TAP and s-iNB have distinct features, and 4) Figure S5 shows that s-iNB express genes associated with either TAP or NB in previous studies, whereas s-TAP do not express genes associated with NB but appear closer to aNSC. Finally, scRNA-seq cell clusters related to s-iNB are distinct from the cluster related to s-TAP, as shown in Figure 3D and in Figure 4.

      R2C24: Fig 3 - The lack of information about timepoint 0 after irradiation, and when proliferation and cell cycle entry begins again following irradiation, limits our interpretation of the single-cell irradiated data.

      Answer: We have previously reported the relative abundance of each SVZ neural progenitor population in the young adult mouse brain in several papers. In particular, we based our interpretation on our SVZ irradiation model reported in Daynac et al. 2013, which demonstrated the radioresistance of qNSC, which re-enter the cell cycle as early as 2 days after 4Gy irradiation and successively regenerate aNSC, TAP, then iNB and mNB.

      R2C25: Fig S3 - These results effectively show that the s-aNSCs and s-TAPs are actually less specific when compared to that same identity in other studies, and that the iNBs are most similar to mitotic TAPs. This supports what was mentioned above, which is that the transcriptional signatures are very similar between the s-TAPs and i-NBs, showing these are not a unique cell state, but just a bit further along mitosis within the TAP cell state.

      Answer: This is now Figure S5. In this figure, we show that s-iNB express genes associated with either TAP or NB in previous studies, whereas s-TAP do not express genes associated with NB but appear closer to aNSC. As indicated above in R2C23, s-iNB are not just a bit further along mitosis within the TAP cell state. Indeed, we provide several lines of evidence showing that s-iNB and s-TAP have different transcriptomic profiles.

      R2C26: Fig 4B - The focus on Ptbp1 as being associated with the iNB cluster border to mNB is expected as all previous studies of Ptbp1 have focused on its role in the progression of other cell types through the cell cycle, its control of cell cycle regulators, and a cell cycle mRNA regulon (Monzon-Casanova et al, 2018, 2019, 2020). This further supports these analyses are specifically defined by cell cycle stages.

      Answer: We fully agree that Ptbp1 expression distinguishes cycling cells from postmitotic neuroblasts, in accordance with previously published papers, and that based on this single gene we cannot find any differences between cycling cells, i.e., aNSC, TAP and iNB. However, as shown in the manuscript and stated above (R2C23 and R2C25), these cells can be distinguished by their respective expression of many other genes, including other RSR genes.

      R2C27: Line 281-282 is an overstatement - the authors suggest that this is a new type of cycling neural progenitor - when all studies point to it being the end of mitosis TAPs as they go on their way to mNBs. This clearly shows a trajectory and not a defined, binary cell type.

      Answer: We agree that the word "type" was misleading and have changed it to "stage" to better reflect that s-iNB are a distinct stage along the differentiation process, according to our pseudotime cell-trajectory analysis.

      Author response image 2.

      Pseudotime analysis using Monocle 3 (excluding cluster 13, corresponding to astrocytes, and starting from s-qNSC) revealed two branches starting from s-TAP, one towards the cell cycle and the other towards neuronal differentiation.

      Minor comments:

      R2C28: Fig 3D - For ease, please define what you called the clusters in 3D - not just cluster numbers

      Answer: We chose not to name the clusters in Figure 3D because their identification (group names) is based on data presented later, in Figures 3E, F and G.

      R2C29: Fig 3E-F - Show astrocytes by text in 3E and F

      Answer: As discussed above, astrocytes cannot be shown in these figures because the figures are based on our signatures, which do not include an astrocyte signature.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents a valuable approach to exploring CD4+ T-cell response in mice across stimuli and tissues through the analysis of their T-cell receptor repertoires. The authors use a transgenic mouse model, in which the possible diversity of the T-cell receptor repertoire is reduced, such that each of a diverse set of immune exposures elicits more detectably consistent T-cell responses across different individuals. However, whereas the proposed experimental system could be utilized to study convergent T-cell responses, the analyses done in this manuscript are incomplete and do not support the claims due to limitations in the statistical analyses and lack of data/code access.

      We address the reviewers' concerns point by point below.

      All data on immune repertoires are deposited here: https://figshare.com/articles/dataset/Convergence_plasticity_and_tissue_residence_of_regulatory_and_effector_T_cell_response/22226155

      We have added a Data availability statement to the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the alpha chain TCR landscape in conventional vs regulatory CD4 T cells. Overall I think it is a very well thought out and executed study with interesting conclusions. The authors have investigated CDR3 alpha repertoires coupled with a transgenic fixed CDR3beta in a mouse system.

      Strengths:

      • One-of-a-kind evidence and dataset.

      • State-of-the-art analyses using tools that are well-accepted in the literature.

      • Interesting conclusions on the breadth of immune response to challenges across different types of challenges (tumor, viral and parasitic).

      Thank you for the positive view.

      Weaknesses:

      • Some conclusions regarding the eCD4->eTreg transition are not so strong using only the data.

      The overlaps between the top nucleotide-level clones in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in the conclusions based on these data. Further experiments with different methods, including tracking of clonal fates, should confirm, correct, or disprove our findings.

      • Some formatting issues.

      We are working on the manuscript to correct minor errors and formatting.

      Reviewer #2 (Public Review):

      This study investigates T-cell repertoire responses in a mouse model with a transgenic beta chain, such that all T-cells in all mice share a fixed beta chain, and repertoire diversity is determined solely by alpha chain rearrangements. Each mouse is exposed to one of a few distinct immune challenges, sacrificed, and T-cells are sampled from multiple tissues. FACS is used to sort CD4 and Treg cell populations from each sample, and TCR repertoire sequencing from UMI-tagged cDNA is done.

      Various analyses using repertoire diversity, overlap, and clustering are presented to support several principal findings: 1) TCR repertoires in this fixed beta system have highly distinct clonal compositions for each immune challenge and each cell type, 2) these are highly consistent across mice, so that mice with shared challenges have shared clones, and 3) induction of CD4-to-Treg cell type transitions is challenge-specific.

      The beta chain used for this mouse model was previously isolated based on specificity for Ovalbumin. Because the beta chain is essential for determining TCR antigen specificity, and is highly diverse in wildtype mice, I found it surprising that these mice are reported to have robust and consistently focused clonal responses to very diverse immune challenges, for which a fixed OVA-specific beta chain is unlikely to be useful. The authors don't comment on this aspect of their findings, but I would think it is not expected a priori that this would work. If this does work as reported, it is a valuable model system: due to massively reduced diversity, the TCR repertoire response is much more stereotyped across individual samples, and it is much easier to detect challenge-specific TCRs via the statistics of convergent responses.

      This was expected to some extent, since these mice live almost normally and mount productive, protective adaptive immune responses. In natural repertoires, TCR-pMHC interactions in which the TCR-alpha chain dominates are frequent (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5701794/; https://pubmed.ncbi.nlm.nih.gov/37047500/). On the fixed TCR-beta background, this mechanism comes fully into play, essentially substituting for TCR-beta diversity, at the expense of a relatively simplified TCRαβ repertoire and probably higher cross-reactivity.

      We certainly agree that this is a valuable model, and we note this in the last sentence of our Discussion. We are now also adding this point to the Abstract.

      While the data and analyses present interesting signals, they are flawed in several ways that undermine the reported findings. I summarize below what I think are the most substantive data and analysis issues.

      (1) There may be systematic inconsistencies in repertoire sampling depth that are not described in the manuscript. Looking at the supplementary tables (and making some plots), I found that the control samples (mice with mock challenge) have consistently much shallower sampling (in terms of both read count and UMI count) compared with the other challenge samples. There is also a strong pattern of lower counts for Treg vs CD4 cell samples within each challenge.

      The immune response of control mice is less extensive, as expected, just as Tregs are less numerous in tissues than CD4 T cells. Both observations are consistent with expectations. Please note, however, that we applied appropriate data normalisation throughout, drawing on our previous extensive experience (https://pubmed.ncbi.nlm.nih.gov/29080364/).

      In particular (we are now adding more of these details to the Methods):

      For diversity metric calculations, we randomly sampled 1000 UMIs from each cloneset. Samples with fewer than 700 UMIs were excluded from the analysis.

      For amino acid overlap metric calculations, we selected the 1000 largest clonotypes from each cloneset. Samples with fewer than 700 clonotypes were excluded from the analysis.

      For nucleotide overlap metric calculations (eCD4-eTreg), we selected the 100 largest clonotypes from each cloneset. Samples with fewer than 100 clonotypes were excluded from the analysis.

      The top N clonotypes were selected after randomly shuffling the clonotypes and then sorting them by count in descending order. This removes the alphabetical ordering of clonotypes with equal counts (e.g., count = 1 or 2).

      Downsampling was carried out using VDJtools v1.2.1.
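      As an illustration of the normalisation steps listed above, here is a minimal Python sketch. It assumes a cloneset is held as a dict mapping a clonotype key to its UMI count; the function names are hypothetical, and this is not the VDJtools v1.2.1 implementation. The sketch raises an error when a sample has fewer UMIs than requested, since how such samples are handled is not specified here, and Chao1 uses the standard bias-corrected formula, which may differ in detail from the VDJtools variant.

```python
import math
import random
from collections import Counter


def downsample_umis(cloneset, n_umis, rng):
    """Draw n_umis UMIs without replacement; returns clonotype -> UMI count."""
    pool = [clone for clone, count in cloneset.items() for _ in range(count)]
    if len(pool) < n_umis:
        raise ValueError(f"only {len(pool)} UMIs available, {n_umis} requested")
    return dict(Counter(rng.sample(pool, n_umis)))


def top_n_clonotypes(cloneset, n, rng):
    """Return the n largest clonotypes, breaking count ties at random."""
    clones = list(cloneset.items())
    rng.shuffle(clones)                              # discard any alphabetical order
    clones.sort(key=lambda kv: kv[1], reverse=True)  # stable: ties keep shuffled order
    return dict(clones[:n])


def diversity(cloneset):
    """Observed richness, Shannon entropy (natural log) and bias-corrected Chao1."""
    counts = [c for c in cloneset.values() if c > 0]
    total = sum(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    f1 = sum(1 for c in counts if c == 1)            # singletons
    f2 = sum(1 for c in counts if c == 2)            # doubletons
    chao1 = len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))
    return {"observed": len(counts), "shannon": shannon, "chao1": chao1}


# Hypothetical toy cloneset (CDR3 amino acid sequence -> UMI count)
rng = random.Random(42)
sample = {"CAVSNTGKLTF": 40, "CALSDGGSNYKLTF": 25, "CAASGGNNKLIF": 1, "CAMREGNYKYVF": 1}
print(diversity(downsample_umis(sample, 50, rng)))
print(top_n_clonotypes(sample, 3, rng))
```

      Because Python's sort is stable (also with reverse=True), the preceding shuffle fixes the order among clonotypes with equal counts, so ties at the top-N boundary are resolved randomly rather than alphabetically.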

      (2) FACS data are not reported. Although the graphical abstract shows a schematic FACS plot, there are no such plots in the manuscript. Related to the issue above, it would be important to know the FACS cell counts for each sample.

      Yes, we agree that this is valuable information that should be provided. Unfortunately, this data has not been preserved.

      (3) For diversity estimation, UMI-wise downsampling was performed to normalize samples to 1000 random UMIs, but this procedure is not validated (the optimal normalization would require downsampling cells). What is the influence of possible sampling depth discrepancies mentioned above on diversity estimation? All of the Treg control samples have fewer than 1000 total UMIs; doesn't that pose a problem for sampling 1000 random UMIs?

      Indeed, I simulated this procedure and found systematic effects on diversity estimates when taking samples of different numbers of cells (each with a simulated UMI count) from the same underlying repertoire, even after normalizing to 1000 random UMIs. I don't think UMI downsampling corrects for cell sampling depth differences in diversity estimation, so it's not clear that the trends in Fig 1A are not artifactual: they would seem to show higher diversity for control samples, but these are the very same samples with an apparent systematic sampling depth bias.

      We have evaluated this approach throughout our work and summarised the results in this reference: https://pubmed.ncbi.nlm.nih.gov/29080364/. Overall, normalising to the same number of randomly sampled UMIs appears to be the best approach (although, ideally, the initial sequencing depth of all samples should be substantially higher than the sampling threshold used). Sorting identical numbers of cells, followed by perfectly uniform library preparation and sequencing, is generally not realistic and does not work in practice, whereas UMI downsampling does the same job much better.

      (4) The Figures may be inconsistent with the data. I downloaded the Supplementary Table corresponding to Fig 1 and made my own version of panels A-C. This looked quite different from the diversity estimations depicted in the manuscript. The data does not match the scale or trends shown in the manuscript figure.

      The Chao1 column was incorrect; we are now correcting it. Please also note that we only used samples with > 700 UMIs, and the Supplementary Table has been corrected accordingly. Finally, please note that Figure 1 shows results for lung samples only.

      (5) For the overlap analysis, a different kind of normalization was performed, but also not validated. Instead of sampling 1000 UMIs, the repertoires were reduced to their top 1000 most frequent clones. It is not made clear why a different normalization would be needed here. There are several samples (including all Treg control samples) with only a couple hundred clones. It's also likely that the noted systematic sampling depth differences may drive the separation seen in MDS1 between Treg and CD4 cell types. I also simulated this alternative downsampling procedure and found strong effects on MDS clustering due to sampling effects alone.

      That is right: for the overlap analysis, whose values are mathematically proportional to the clonotype counts in both compared repertoires (so that differences in counts cause major biases), the appropriate approach is to select the same number of clonotypes from each sample. See Ref. https://pubmed.ncbi.nlm.nih.gov/29080364/.
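      To make the role of equal clonotype numbers concrete, the following Python sketch computes an F2-style overlap on equal-size top-N subsets. It assumes F2 is the sum over shared clonotypes of the geometric mean of their frequencies, which is our reading of the VDJtools documentation; frequencies are assumed to be renormalised within each top-N subset, and the function names and clonotype keys are hypothetical. Amino acid overlap would key clonotypes by CDR3 amino acid sequence, while nucleotide overlap can key them by a (CDR3 nucleotide, V, J) tuple, as in the example.

```python
import math
import random


def top_n(cloneset, n, rng):
    """n largest clonotypes; ties at equal counts are broken at random."""
    items = list(cloneset.items())
    rng.shuffle(items)
    items.sort(key=lambda kv: kv[1], reverse=True)
    return dict(items[:n])


def f2_overlap(a, b):
    """Sum over shared clonotypes of sqrt(freq_a * freq_b), frequencies within each cloneset."""
    total_a, total_b = sum(a.values()), sum(b.values())
    shared = a.keys() & b.keys()
    return sum(math.sqrt((a[k] / total_a) * (b[k] / total_b)) for k in shared)


# Hypothetical nucleotide-level clonesets keyed by (CDR3 nt, V gene, J gene)
rng = random.Random(7)
ecd4 = {("TGTGCTGTG", "TRAV1", "TRAJ2"): 30, ("TGTGCCCTG", "TRAV4", "TRAJ9"): 12,
        ("TGTGCAGCA", "TRAV6", "TRAJ5"): 1}
etreg = {("TGTGCTGTG", "TRAV1", "TRAJ2"): 8, ("TGTGCAAGT", "TRAV7", "TRAJ3"): 20}
print(f2_overlap(top_n(ecd4, 100, rng), top_n(etreg, 100, rng)))
```

      F2 equals 1 for identical frequency distributions and 0 for disjoint repertoires, so truncating both samples to the same N keeps the scores comparable across pairs of samples.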

      We kept only samples with > 700 clonotypes for the overlap analyses. Some relatively poor samples are present in all challenges, while the MDS1 localization follows a clear, reproducible logic, so we are confident in these results.

      It is not made clear how the overlap scores were converted to distances for MDS. It's hard to interpret this without seeing the overlap matrix.

      This is a built-in feature of the VDJtools software (https://pubmed.ncbi.nlm.nih.gov/26606115/); see also https://vdjtools-doc.readthedocs.io/en/master/overlap.html.
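      For readers who want to explore the idea outside VDJtools, one possible route from a pairwise overlap matrix to an MDS plot is sketched below in Python with NumPy. The conversion distance = 1 - F2 and the use of classical (Torgerson) MDS are assumptions made for illustration; the exact transformation and MDS variant used by VDJtools may differ, and the overlap values are hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed a symmetric distance matrix in n_components dims."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering      # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]     # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # non-Euclidean input can give negatives
    return eigvecs[:, order] * scale


# Hypothetical pairwise F2 overlaps between four repertoires (diagonal = self-overlap)
f2 = np.array([
    [1.00, 0.60, 0.10, 0.05],
    [0.60, 1.00, 0.12, 0.08],
    [0.10, 0.12, 1.00, 0.55],
    [0.05, 0.08, 0.55, 1.00],
])
coords = classical_mds(1.0 - f2)  # assumed conversion: distance = 1 - F2
print(coords)
```

      The first column of the result plays the role of MDS1; repertoires with high mutual overlap end up close together.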

      (6) The cluster analysis is superficial, and appears to have been cherry-picked. The clusters reported in the main text have illegibly small logo plots, and no information about V/J gene enrichments. More importantly, as the caption states they were chosen from the columns of a large (and messier-looking) cluster matrix in the supplementary figure based on association with each specific challenge. There's no detail about how this association was calculated, or how it controlled for multiple tests. I don't think it is legitimate to simply display a set of clusters that visually correlate; in a sufficiently wide random matrix you will find columns that seem to correlate with any given pattern across rows.

      The particular CDR3 sequences and V/J segments are not central to the results of this manuscript. The logos are provided only to illustrate what the consensus motifs of the clusters look like.

      We have now added two more Supplementary Tables and a Supplementary Figure with full information about the clusters.

      We disagree that Supplementary Figure 1 (representing all clusters) looks “messy”. On the contrary, it is surprisingly “digital”, showing clear patterns of responses and homing. This becomes apparent after studying it for a while. But yes, it is too large to let the reader focus on any particular aspect. That is why we selected TCR clusters to illustrate the specific aspects discussed in the work; however, they were selected from an overall picture that was already structured.

      (7) The findings on differential plasticity and CD4 to Treg conversion are not supported. If CD4 cells are converting to Tregs, we expect more nucleotide-level overlap of clones. This intuition makes sense. But it seems that this section affirms the consequent: variation in nucleotide-level clone overlap is a readout of variation in CD4 to Treg conversion. It is claimed, based on elevated nucleotide-level overlap, that the LLC and PYMT challenges induce conversion more readily than the other challenges. It is not noted in the textual interpretations, but Fig 4 also shows that the control samples had a substantially elevated nucleotide-level overlap. There is no mention of a null hypothesis for what we'd expect if there was no induced conversion going on at all. This is a reduced-diversity mouse model, so convergent recombination is more likely than usual, and the challenges could be expected to differ in the parts of TCR sequence space they induce focus on. They use the top 100 clones for normalization in this case, but don't say why (this is the 3rd distinct normalization procedure).

      Your point is absolutely correct: “This is a reduced-diversity mouse model, so convergent recombination is more likely than usual”. A distinct normalisation procedure was required to focus on the most expanded clonotypes and to avoid the tail of (presumably cross-reactive) identical TCRs present in all repertoires of these limited-repertoire mice. We therefore downsampled as strictly as possible to minimise this background nucleotide overlap; only this strict downsampling to the top-100 clonotypes allowed us to visualise the differences between challenges. This rather complicated explanation would overload the manuscript, but your comments and our answers will be available to readers who wish to go into the details.

      The overlaps observed (at this strict downsampling) between the top nucleotide-level clones in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in our interpretation of these data. Further experiments with different methods, including tracking of clonal fates, should confirm, correct, or disprove our findings.

      Although interpretations of the reported findings are limited due to the issues above, this is an interesting model system in which to explore convergent responses. Follow-up experimental work could validate some of the reported signals, and the data set may also be useful for other specific questions.

      Yes, thank you for your very thorough analysis. We fully agree with your conclusion.

      Reviewer #3 (Public Review):

      Nakonechnaya et al present a valuable and comprehensive exploration of CD4+ T cell response in mice across stimuli and tissues through the analysis of their TCR-alpha repertoires.

      The authors compare repertoires by looking at the relative overlap of shared clonotypes and observe that they sometimes cluster by tissue and sometimes by stimulus. They also compare different CD4+ subsets (conventional and Tregs) and find distinct yet convergent responses with occasional plasticity across subsets for some stimuli.

      The observed lack of a general behaviour highlights the need for careful comparison of immune repertoires across cell subsets and tissues in order to better understand their role in the adaptive immune response.

      In conclusion, this is an important paper to the community as it suggests several future directions of exploration.

      Unfortunately, the lack of code and data availability does not allow the reproducibility of the results.

      Thank you for your positive view.

      All data on immune repertoires are deposited here: https://figshare.com/articles/dataset/Convergence_plasticity_and_tissue_residence_of_regulatory_and_effector_T_cell_response/22226155

      We have added a Data availability statement to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • In the manuscript at "yielding 13,369 ± 1,255 UMI-labeled TCRα cDNA molecules and 3233 ± 310 TCRα CDR3 clonotypes per sample" I'm not sure how there can be fewer unique DNA molecules than clonotypes in each sample.

      This was indeed our mistake; it is now corrected.

      • In the manuscript at "This indicates that the amplitude and focused nature of the effector and regulatory T cell response in lungs is generally comparable."

      I'm not sure it's possible to conclude that a drop in diversity in all conditions necessarily signals a focused nature. Since, at this stage, the nature of the clonotypes was not compared between conditions, it is not possible to claim a focused nature of the response.

      We have softened the wording:

      "This could indicate that the amplitude and focused nature of the effector and regulatory T cell response in lungs is generally comparable."

      • What are your thoughts on why there is such a large overlap between Treg and Teff in the Lung in control? For some replicates it is almost as much as a post-LLC challenge!

      There is some natural dispersion in the data, which is generally expected. The overlaps between the top nucleotide-level clones in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in the conclusions based on these data. Further experiments with different methods, including tracking of clonal fates, should confirm, correct, or disprove our findings.

      • In the manuscript at "These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches" I'm not sure we can conclude this from the Figure. Wouldn't you expect the samples to be grouped by color (the different challenges)? Maybe I'm not understanding the sentence!

      This is a separate point, concerning resident Tregs irrespective of the challenge.

      The full explanation is given in the text:

      “Global CDR3α cluster analysis revealed that characteristic eTreg TCR motifs were present in distinct lymphatic tissues, including spleen and thymus, irrespective of the applied challenge (Supplementary Fig. 1). To better illustrate this phenomenon, we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge. This analysis demonstrated close proximity of eTreg repertoires obtained from the same lymphatic tissues upon all lung challenges and across all animals (Fig. 5a, b). These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches. Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells (Fig. 5c, d).”

      And in the abstract:

      “Additionally, our TCRα repertoire analysis demonstrated that distinct antigenic specificities are characteristic for eTreg cells residing in particular lymphatic tissues, regardless of the challenge, revealing the homing-specific, antigen-specific resident Treg populations.”

      • In the manuscript at " Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells ":

      5b and 5d seem to have the same pattern: Spleen and MLN group together, AxLN and IgLN together and thymus is separate. Do you mean to say that the groups are more diffuse? I feel like the pattern really is the same and it's likely due to some noise in the data…

      Yes, we simply mean that the eTreg groups are less diffuse, i.e., more convergent.

      • I'm not sold on the eCD4 to eTreg conversion evidence. Why limit this to the top 100 clones? The top 1000 clones were used in previous analyses! Moreover, the authors claim that calculating the relative overlap (via F2) of matching CDR3+V+J genes is evidence of a conversion between eCD4 and eTreg. To convince myself of a real conversion, I would track the cells between groups; unfortunately, I'm not sure how to track this. Maybe by looking at the thymus population? For example, what is the overlap in the thymus vs. after the challenge? I don't have an answer on how to verify this, but I feel that this conclusion is a bit on the weaker end.

      A distinct normalisation procedure was required to focus on the most expanded clonotypes and to avoid the tail of (presumably cross-reactive) identical TCRs present in all repertoires of these limited-repertoire mice. We therefore downsampled as strictly as possible to minimise this background nucleotide overlap; only this strict downsampling to the top-100 clonotypes allowed us to visualise the differences between challenges. This rather complicated explanation would overload the manuscript, but your comments and our answers will be available to readers who wish to go into the details.

      The overlaps observed (at this strict downsampling) between the top nucleotide-level clones in both the LLC and PYMT challenges are prominently above the average, and this result is reproducible in lungs and skin, so we are confident in our interpretation of these data. Further experiments with different methods, including tracking of clonal fates, should confirm, correct, or disprove our findings.

      • There is a nuance in the analysis between Figure 3 and Figure 5 which I think I am not grasping. Both Figures use the same method and the same data but what is different? I think the manuscript would benefit from making this crystal clear. The conclusions will likely be more evident as well!

      As explained in the text and above, for Figure 5 “we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge.”

      The aim of this section of the manuscript is to reveal tissue-resident Tregs that are distinct for different tissues and are present in all of these mice, irrespective of the challenge we applied. And they are really there (!).

      • Do the authors plan to share their R scripts?

      All calculations were performed in VDJtools; R was only used to build figures. We have corrected this in the Methods.

      Minor typos and formatting issues to address:

      • Typo in Figure 2a the category should read "worm" instead of "warm"

      Corrected.

      • Figure 2a heatmap is missing a color bar indicating the value ranges

      Detailed information can be found in the additional Supplementary Materials.

      • Figure 2f is never mentioned in the manuscript!

      Corrected.

      • "eTreg repertoire upon lung challenge is reflected in the draining lymph node" - the word upon is of a lower size

      Corrected.

      • The authors should make the spelling of eTreg uniform across the manuscript (reg in subscript vs. plain lower-case letters). The same goes for CDR3a vs. CDR3α.

      Corrected.

      • Figure 4a-d p-values annotations are not shown. Is it because they are not significant?

      Corrected.

      • The spelling of FACS buffer should be uniform (FACs vs FACS, see methods)

      Corrected.

      • In the gating strategy, I would make a uniform annotation for the cluster of differentiation, for example, "CD44 high" vs "CD44^{hi}", pos vs + etc.

      Corrected.

      • Citation for MIGEC software (if available) is missing from methods

      The citation is present in the main text, which we believe is sufficient.

      Reviewer #2 (Recommendations For The Authors):

      I noticed the data was made available via Figshare in the preprint, but there is no data availability statement in the current ms.

      We have provided a Data availability statement.

      The methods state that custom scripts were written to perform the various analyses. Those should be made available in a code repository, and linked in the ms.

      All calculations were performed in VDJtools; R was only used to build figures. We have corrected this in the Methods.

      The title mentioned "TCR repertoire prism", so I thought "prism" was the name of a new method or software. But then the word "prism" didn't appear anywhere in the ms.

      We use "prism" in its figurative sense: viewing or understanding something from a different perspective, or through a lens that reveals different aspects or nuances.

      Figure 1D lacks an x-axis label.

      We have revised the figures throughout.

      Reviewer #3 (Recommendations For The Authors):

      • The paper is very concise, possibly a bit too much. It could use additional explanations to properly affirm its relevance, for example:

      why the choice of fixing the CDR3beta background?

      To make repertoires more similar across mice, and to track all features of the repertoire using only one chain.

      What is it fixed to?

      As explained in Methods:

      “C57BL/6J DO11.10 TCRβ transgenic mice (kindly provided by Philippa Marrack) and crossed to C57BL/6J Foxp3eGFP TCRa-/- mice.”

      What do you expect to see and not to see in this specific system and why it is important?

      As stated above, we expected repertoires to be more similar across mice. This is important for identifying antigen-specific TCR clusters across mice and for tracking all features of the TCR repertoire using only one chain.

      Does this system induce more convergent responses? If so, can we extrapolate the results from this system to the full alpha-beta response?

      Compared with conventional mice, this model is much more powerful for monitoring convergent TCR responses. At the same time, it behaves naturally and the mice live almost normally, so we believe it reflects the natural behaviour of the full-fledged alpha-beta T cell repertoire.

      • Is the lack of similarity of other tissues to Lung/MLN due to a lack of a response?

      As indicated in the title of the corresponding section, “eTreg repertoire upon lung challenge is reflected in the draining lymph node”, and as stated in its conclusion, “these results demonstrate the selective tissue localization of the antigen-focused Treg response.”

      Can you do a dendrogram like 2a for the other tissues to better clarify what is going on there? There is space in the supplementary material.

      We built many of these, but in a single dimension they are mostly less informative than 2D MDS plots.

      • Figure 5 seems a bit out of place as it looks more related to Figure 2. It could maybe be integrated there, sent to supplementary or become Figure 3?

      This is a separate point, concerning resident Tregs irrespective of the challenge.

      The full explanation is given in the text:

      “Global CDR3α cluster analysis revealed that characteristic eTreg TCR motifs were present in distinct lymphatic tissues, including spleen and thymus, irrespective of the applied challenge (Supplementary Fig. 1). To better illustrate this phenomenon, we performed MDS analysis of CDR3α repertoires for distinct lymphatic tissues, excluding the lungs due to their otherwise dominant response to the current challenge. This analysis demonstrated close proximity of eTreg repertoires obtained from the same lymphatic tissues upon all lung challenges and across all animals (Fig. 5a, b). These results indicate that distinct antigenic specificities are generally characteristic for eTreg cells that preferentially reside in particular lymphatic niches. Notably, the convergence of lymphatic tissue-resident TCR repertoires was less prominent for the eCD4 T cells (Fig. 5c, d).”

      And in the abstract:

      “Additionally, our TCRα repertoire analysis demonstrated that distinct antigenic specificities are characteristic for eTreg cells residing in particular lymphatic tissues, regardless of the challenge, revealing the homing-specific, antigen-specific resident Treg populations.”

      • Have you explored more systematically the role of individual variability? If you stratify by individual, do you observe any trend? If not this is also an interesting observation to highlight and discuss.

      Individual variability is incorporated in the calculations and figures (one dot = one mouse), so this natural variation is already reflected there.

      • Regarding the MDS plots: why are 2 dimensions the right amount? Maybe with 3, you can see both tissue specificity and stimuli contributions. Can you do a stress vs # dimensions plot to check what should be the right amount of dimensions to more accurately reproduce the distance matrix?

      Tissue specificity and stimulus contributions are hard to distinguish without focusing on the appropriate samples, as we did in Figs. 3 and 5. The work is already fairly complex, and attempting to analyse this in a higher-dimensional space is far beyond our current capabilities. But this is an interesting point for future work, thank you.

      • Figure 2: A better resolution is needed in order to properly resolve the logo plots at the bottom.

      Yes, we have revised the figures and also provide a new Supplementary Figure with all the logos.

      • No code or data are made available. There is also a lack of supplementary figures that complement and expand the results presented in the main text.

      We believe that the main text, although succinct, already contains a great deal of information to analyse and (preliminary) conclusions to draw, so we do not think it reasonable to overload it further.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Overall Response

      We thank the reviewers for reviewing our manuscript, recognizing the significance of our study, and offering valuable suggestions. Based on the reviewers’ comments and the updated eLife assessment, we would like to choose the current version of our manuscript as the Version of Record.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model which takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter.

      The authors control for some degree of redundancy between their training and test sets, both using sequence and structural similarity criteria. This is more careful than can be said of most works in the field of PPI prediction.

      As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.

      We thank the reviewer for recognizing the strengths of our work!
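      The top-50 precision criterion mentioned above is simple to compute. Below is a minimal NumPy sketch, assuming the prediction is an L1 × L2 matrix of contact scores between the two chains and the ground truth is a boolean contact map of the same shape; the function names are hypothetical and this is not the PLMGraph-Inter evaluation code.

```python
import numpy as np


def top_k_precision(pred, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true inter-chain contacts."""
    scores = np.asarray(pred, dtype=float)
    truth = np.asarray(true_contacts, dtype=bool)
    if scores.shape != truth.shape:
        raise ValueError("prediction and ground-truth maps must have the same shape")
    k = min(k, scores.size)
    top = np.argsort(scores, axis=None)[::-1][:k]  # flat indices, highest scores first
    return float(truth.ravel()[top].mean())


def acceptable(pred, true_contacts, k=50, threshold=0.5):
    """Heuristic from the review: at least 50% precision among the top 50 contacts."""
    return top_k_precision(pred, true_contacts, k) >= threshold


# Hypothetical example: sparse contact map and noisy scores favouring true contacts
rng = np.random.default_rng(0)
truth = rng.random((120, 90)) < 0.02
pred = truth + 0.5 * rng.random(truth.shape)
print(top_k_precision(pred, truth), acceptable(pred, truth))
```

      The sketch treats every (i, j) cell as a separate residue pair; for symmetric homodimer maps one may prefer to count each equivalent pair only once, which is not handled here.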

      Weaknesses:

      The authors check for performance drops when the test set is restricted to pairs of interacting proteins such that the chain pair is not similar as a pair (in sequence or structure) to a pair present in the training set. A more challenging test would be to restrict the test set to pairs of interacting proteins such that none of the chains are separately similar to monomers present in the training set. In the case of structural similarity (TM-scores), this would amount to replacing the two "min"s with "max"s in Eq. (4). In the case of sequence similarity, one would simply require that no monomer in the test set is in any MMSeqs2 cluster observed in the training set. This may be an important check to make, because a protein may interact with several partners, and/or may use the same sites for several distinct interactions, contributing to residual data leakage in the test set.

      We thank the reviewer for the suggestion! For protein-protein interaction prediction (“0D prediction”) or protein-protein interfacial residue prediction (“1D prediction”), we agree that requiring that none of the chains in the test set be separately similar to monomers in the training set is necessary since, as the reviewer pointed out, a protein may interact with several partners and may even use the same sites for these interactions. However, the task of this study is predicting inter-protein residue-residue contacts (“2D prediction”). Even if a protein uses the same site to interact with different partners, the inter-protein contact maps will differ as long as the partners differ. Therefore, we do not think this restriction on the test set is necessary for our task.
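The difference between the pair-level and chain-level filters discussed in this exchange can be sketched as follows. Eq. (4) is not reproduced here, so its form is an assumption: a test pair (A, B) is compared with a training pair (A', B') under both chain correspondences, and the two chain-level TM-scores are combined with `min` (both chains must be similar). Swapping `min` for `max`, as the reviewer proposes, flags the test pair whenever either chain resembles a training monomer. The `tm` callable, the 0.5 cutoff, and the function names are hypothetical.

```python
def pair_similarity(tm, test_pair, train_pair, combine=min):
    """Similarity of a test chain pair to one training chain pair (assumed form).

    tm(x, y) returns a chain-level TM-score. Both chain correspondences
    (A->A', B->B' and A->B', B->A') are tried, since chain order is arbitrary.
    combine=min: both chains must be similar (pair-level criterion).
    combine=max: one similar chain suffices (chain-level criterion).
    """
    a, b = test_pair
    a2, b2 = train_pair
    straight = combine(tm(a, a2), tm(b, b2))
    swapped = combine(tm(a, b2), tm(b, a2))
    return max(straight, swapped)


def is_redundant(tm, test_pair, train_pairs, cutoff=0.5, combine=min):
    """True if the test pair reaches the cutoff against any training pair."""
    return any(
        pair_similarity(tm, test_pair, t, combine) >= cutoff for t in train_pairs
    )
```

With a toy score where identical chains score 1.0 and all others 0.2, the test pair ("X", "Y") is not redundant with the training pair ("X", "Z") under `min`, but is redundant under `max` because chain X is shared.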

      The training set of AFM with v2 weights has a global cutoff of 30 April 2018, while that of PLMGraph-Inter has a cutoff of March 7 2022. So there may be structures in the test set for PLMGraph-Inter that are not in the training set of AFM with v2 weights (released between May 2018 and March 2022). The "Benchmark 2" dataset from the AFM paper may have a few additional structures not in the training or test set for PLMGraph-Inter. I realize there may be only few structures that are in neither training set, but still think that showing the comparison between PLMGraph-Inter and AFM there would be important, even if no statistically significant conclusions can be drawn.

      We thank the reviewer for the suggestion! Using only the date cutoff is not enough to remove redundancy, since similar structures can be deposited in the PDB on different dates. Because AFM does not release the PDB codes of its training set, it is difficult for us to remove the redundancy completely. Therefore, we think no rigorous conclusion can be drawn by including these comparisons in the manuscript. Moreover, the main point of this study is to demonstrate that integrating multiple protein language models through protein geometric graphs can dramatically improve model performance for inter-protein contact prediction, which may provide important insights for the future development of protein complex structure prediction methods more powerful than AFM, rather than to provide a tool that can beat AFM at this moment. We think including too many additional comparisons with AFM may distract readers. Therefore, we chose not to include these comparisons in the manuscript.

      Finally, the inclusion of AFM confidence scores is very good. A user would likely trust AFM predictions when the confidence score is high, but look for alternative predictions when it is low. The authors' analysis (Figure 6, panels c and d) seems to suggest that, in the case of heterodimers, when AFM has low confidence, PLMGraph-Inter improves precision by (only) about 3% on average. By comparison, the reported gains in the "DockQ-failed" and "precision-failed" bins are based on knowledge of the ground truth final structure, and thus are not actionable in a real use-case.

      We agree with the reviewer that more studies are needed to provide a model that can complement or even beat AFM. The main point of this study is to demonstrate that integrating multiple protein language models through protein geometric graphs can dramatically improve model performance for inter-protein contact prediction, which may provide important insights for the future development of protein complex structure prediction methods more powerful than AFM.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially those driven by AlphaFold, prediction accuracy and efficiency in terms of computational cost remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for model training.

      We thank the reviewer for recognizing the strengths of our work!

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • I recommend renaming the section "Further potential redundancies removal between the training and the test" to "Further potential redundancies removal between the training and the test sets"

      Changed.

      • In lines 768-769, the sentence seems to end prematurely in "to use more stringent threshold in the redundancy removal"

      Corrected.

      • In Eq. (4), line 789, there are many instances of dashes that look like minus signs, creating some confusion.

      Corrected.

      • I think I may have mixed up figure references in my first review. When I said (Recommendations to the authors): "p. 22, line 2: from the figure, I would have guessed "greater than or equal to 0.7", not 0.8", I think I was referring to what is now lines 423-424, referring to what is now Figure 5c. The point stands there, I think.

      Corrected.

      • A couple of new grammatical mishaps have been introduced in the revision. These could be rectified.

      We carefully rechecked our revisions, and corrected the grammatical issues we found.

      Reviewer #2 (Recommendations For The Authors):

      Most of my concerns were resolved through the revision. I have only one suggestion for the main figure.

      The current scatter plots in Figure 2 are hard to understand as too many different methods are abstracted into a single plot with multiple colors. I would suggest comparing their performances using box plot or violin plot for the figure 2.

      We thank the reviewer for the suggestion! In the revision, we tried a violin plot, but it was hard to read because too many different methods are included in the plot. Moreover, we chose the scatter plot because it provides much more detail. We also provided the individual head-to-head scatter plots as supplementary figures, which we think can also help readers capture the information in the figures.


      The following is the authors’ response to the original reviews.

      Overall Response

      We would like to thank the reviewers for reviewing our manuscript, recognizing the significance of our study, and offering valuable suggestions. We have carefully revised the manuscript to address all the concerns and suggestions raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, a potentially useful heuristic criterion for acceptable contact prediction quality is found by the authors: namely, to have at least 50% precision in the prediction of the top 50 contacts.

      We thank the reviewer for recognizing the strengths of our work!

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! In the revision, to emphasize the performance of PLMGraph-Inter using predicted monomer structures, we moved the evaluation results based on predicted monomer structures from the Supplementary Information to the main text (see the new Table 1 and Figure 2 in the revised manuscript) and reorganized the two subsections “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets” and “Impact of the monomeric structure quality on contact prediction” in the main text.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! It is worth noting that AFM automatically searches for monomer templates during prediction. When we checked our AFM runs, we found that for 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5), at least 20 templates were identified (AFM employs the top 20 templates in the prediction), and 87.8% of the targets employed the native templates (lines 455-462 on page 25 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”). Therefore, we think Figure 6, not Figure S5 (the original Figure S2), shows a fairer comparison. Moreover, the targets used in this study likely overlap substantially with the training set of AlphaFold-Multimer, since AFM used all protein complex structures in the PDB deposited before 2018-04-30 for model training, which would further lead to an overestimation of AFM's performance (lines 450-455 on pages 24-25 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).

      To mimic the performance of AlphaFold2 in real practice and produce predicted monomeric structures of more diverse quality, we used only the MSA searched from the UniRef100 protein sequence database as input to AlphaFold2 and disabled the use of templates (lines 203-210 on page 12 in the subsection “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets”). Since some of the predicted monomer structures are of poor quality, it is reasonable that the performance of PLMGraph-Inter drops when predicted monomeric structures are used. We provided a detailed analysis of the impact of monomeric structure quality on prediction performance in the subsection “Impact of the monomeric structure quality on contact prediction” in the main text.

      We provided an analysis of the AFM multimer confidence values (“iptm + ptm”) in the revision (Figure 6, Figure S5 and lines 495-501 on page 27 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion, and we are sorry for the confusion! In the AFM runs to predict protein complex structures, we used the default setting of AFM, which automatically searches for monomer templates during prediction. When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions (AFM uses only the top 20 templates), and 87.8% of the targets employed the native template. We further clarified this in the revision (lines 455-462 on page 25 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”). We also included the mean precisions of AFM (top-50 contact prediction) in the revision (Table S5 and lines 483-484 on page 26 in the subsection “Comparison of PLMGraph-Inter with AlphaFold-Multimer”).

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in objectively evaluating AFM is that, as far as we know, AFM does not release the PDB IDs of its training set or of the “Recent-PDB-Multimers” dataset. “Benchmark 2” includes only 17 heterodimers, and this number would decrease further after removing targets redundant with our training set. We think it is difficult to draw conclusions from such a small number of targets.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      Author response image 1.

      Head-to-head comparison of the quality of complexes predicted by AlphaFold-Multimer (2.2.0) and AlphaFold-Multimer (2.3.2) for each target PPI.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. During the revision, we also tested the new version of AFM on the HomoPDB and HeteroPDB datasets, but we found that the performance difference between the two versions is actually very small (see the figure above; not shown in the main text). One reason might be that some targets in HomoPDB and HeteroPDB are redundant with the training sets of both versions of AFM. Since our test sets would overlap more with the training set of AFM V3, we kept using the AFM V2 weights in this study.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We thank the reviewer for the suggestion! In the revision, we explored the performance of PLMGraph-Inter when using different thresholds on the fold similarity scores of interacting monomers to further remove potential redundancies between the training and test sets (i.e. redundancy in structure) (lines 353-386 on pages 19-21 in the subsection “Ablation study”; lines 762-797 on pages 41-43 in the subsection “Further potential redundancies removal between the training and the test”). We found that for heteromeric PPIs (targets in HeteroPDB), further removing potential structural redundancy has little impact on model performance (~3% when a TM-score of 0.5 is used as the threshold). However, for homomeric PPIs (targets in HomoPDB), further removing potential structural redundancy significantly reduces model performance (~18% when a TM-score of 0.5 is used as the threshold) (see Table 2). One possible reason is that the binding mode of a homomeric PPI is largely determined by the fold of its monomer, so the model does not generalize well to targets whose folds were never seen during training.

      Whether a deep learning model can generalize well to targets with novel folds is a very interesting and important question, and we thank the reviewer for pointing this out! However, to the best of our knowledge, this question has rarely been addressed by previous studies, including AFM. For example, the Benchmark 2 dataset was prepared with ClusPro TBM (bioRxiv 2021.09.07.459290; Proteins 2020, 88:1082-1090), which uses a sequence-based approach (HHsearch) rather than a structure-based approach to identify templates. Therefore, we do not think this dataset is non-redundant in structure.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider building the geometric graph to be one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose the “geometry-only” model as the base model.

      Reviewer #1 (Recommendations For The Authors):

      Some sections of the paper use technical terminology which limits accessibility to a broad audience. An obvious example is in the section "Results > Overview of PLMGraph-Inter > The residual network module": the average eLife reader is not a machine learning expert and might not be familiar with a "convolution with kernel size of 1 * 1". In general, the "Overview of PLMGraph-Inter" is a bit heavy with technical details, and I suggest moving many of these to Methods. This overview section can still be there but it should be shorter and written using less technical language.

      We thank the reviewer for the suggestion! We moved some technical details to the Methods section in the revision (lines 184-185 on page 11; lines 729-735 on page 39).

      List of typos and minor issues (page number according to merged PDF):

      • p. 3. line -3: remove "to"

      Corrected (line 36, page 3)

      • p. 5, line 7: "GINTER" should be "GLINTER"

      Corrected (line 64, page 5)

      • p. 6, line -4: "Given structures" -> "Given the structures"

      Corrected (line 95, page 6)

      • p. 6, line -2: "with which encoded"... ?

      We rephrased this sentence in revision. (line 97, page 6)

      • p. 9, line 1: "principal" -> "principle"

      Corrected (line 142, page 9)

      • p. 13, line 1: "has" -> "but have"

      Corrected (line 231, page 13)

      • p. 14, lines 6-7: "As can be seen from the figure that the predicted" -> "As can be seen from the figure, the predicted"

      We rephrased this paragraph, and the sentence was deleted in the revision (lines 257-259 on page 15).

      • p. 18, line 1: the "five models" are presumably models a-e? If so, say "of models a-e"

      Corrected (line 310, page 17)

      • p. 22, line 2: from the figure, I would have guessed "greater than or equal to 0.7", not 0.8

      Based on Figure 3C, we think 0.8 is a more appropriate cutoff, since the precision drops significantly when the DTM-score is within 0.7-0.8.

      • p. 23, lines 2-3: "worth to making" -> "worth making"

      Corrected (line 443, page 24)

      • p. 24, line -5: "predict" -> "predicted"

      Corrected (line 484, page 26)

      • p 28, line -5: Please clarify what you mean by "We doubt": are you saying that you don't think these rearrangements exist in nature? If not, then reword.

      Corrected (line 566, page 30)

      • Figure 2, panel c, "DCPred" in the legend should be "CDPred"

      Corrected

      • Figures 3 and 5: Please improve the y-axis title in panel C. "Percent" of what?

      We changed the “Percent” to “% of targets” in the revision.

      We thank the reviewer for carefully reading our manuscript!

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially those driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively smaller computational costs for model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We have carefully revised the manuscript to address the reviewer’s concerns.

      (1) The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! “40% sequence identity” is a widely used threshold for removing redundancy when evaluating deep-learning-based protein-protein interaction and protein complex structure prediction methods, so we also chose this threshold in our study (bioRxiv 2021.10.04.463034, Cell Syst. 2021 Oct 20;12(10):969-982.e6). In the revision, we explored whether PLMGraph-Inter keeps its performance when more stringent thresholds (30%, 20%, 10%) are applied (lines 353-386 on pages 20-21 in the subsection “Ablation study” and lines 762-780 on page 40 in the subsection “Further potential redundancies removal between the training and the test”). The results show that even when 10% sequence identity is used as the threshold, the mean precision of the predicted contacts decreases by only ~3% (Table 2).

      (2) Figures with head-to-head comparison scatter plots are hard to understand because too many different methods are combined into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We have included the individual head-to-head scatter plots as supplementary figures in the revision (Figures S1 and S2 in the Supplementary Information).

      (3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-Multimer, as it shows better precision for the cases where AlphaFold-Multimer fails. To strengthen the point, the quality of complex structures predicted via protein-protein docking with predicted contacts as restraints should have been compared with that of AlphaFold-Multimer structures.

      We thank the reviewer for the suggestion! We included this comparison in the revision (Figure S7).

      (4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! We analyzed the relationship between prediction performance and MSA depth in the revision (Figure S4 and lines 253-264 on page 15 in the subsection “Evaluation of PLMGraph-Inter on HomoPDB and HeteroPDB test sets”, and lines 798-806 on page 42 in the subsection “Calculating the normalized number of the effective sequences of paired MSA”).
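MSA depth is commonly quantified as the number of effective sequences (Neff), in which each sequence is down-weighted by the number of near-duplicates in the alignment. The manuscript's exact definition is given in its Methods; the sketch below instead uses a common convention and should be read as an assumption: an 80% identity threshold and normalization by the square root of the alignment length. The function name is hypothetical.

```python
import math


def normalized_neff(msa, identity_threshold=0.8):
    """Normalized number of effective sequences of an aligned MSA (assumed convention).

    msa: list of equal-length aligned strings (e.g. paired, concatenated rows).
    Each sequence gets weight 1 / (number of sequences, itself included, with
    identity >= identity_threshold); Neff is the sum of the weights, divided
    here by sqrt(alignment length). Gaps are compared like residues, a
    simplification.
    """
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")

    def identity(s, t):
        return sum(x == y for x, y in zip(s, t)) / length

    weights = []
    for s in msa:
        neighbours = sum(1 for t in msa if identity(s, t) >= identity_threshold)
        weights.append(1.0 / neighbours)
    return sum(weights) / math.sqrt(length)
```

For the toy alignment `["AAAA", "AAAA", "CCCC", "GGGG"]`, the two identical rows share a weight of 1, giving Neff = 3 and a normalized value of 3 / sqrt(4) = 1.5.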

      Reviewer #2 (Recommendations For The Authors):

      I have the following suggestions in addition to the public review.

      (1) Overall, the manuscript is well-written; however, I recommend a careful review for minor grammar corrections to polish the final text.

      We carefully checked the manuscript and corrected all the grammar issues and typos we found in the revision.

      (2) It would be better to indicate that single sequence embeddings, MSA embeddings, and structure embeddings are ESM-1b, ESM-MSA & PSSM, and ESM-IF when they are first mentioned in the manuscript e.g. single sequence embeddings from ESM-1b, MSA embeddings from ESM-MSA and PSSM, and structural embeddings from ESM-IF.

      We revised the manuscript according to the reviewer’s suggestion (lines 86-88 on page 6; lines 99-101 on page 7).

      (3) I don't think "outer concatenation" is commonly used. Please specify whether it's outer sum, outer product, or horizontal & vertical tiling followed by concatenation.

      It is horizontal & vertical tiling followed by concatenation. We clarified this in the revision (lines 129-130 on page 8).
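“Horizontal & vertical tiling followed by concatenation” can be illustrated with a minimal sketch. It assumes per-residue feature vectors for chain A (length L1) and chain B (length L2); the function name is hypothetical, and plain lists stand in for the tensors a real model would use.

```python
def outer_concatenate(feat_a, feat_b):
    """Pairwise features by horizontal & vertical tiling plus concatenation.

    feat_a: per-residue feature vectors of chain A (L1 vectors of size d1).
    feat_b: per-residue feature vectors of chain B (L2 vectors of size d2).
    Returns an L1 x L2 grid whose (i, j) entry is feat_a[i] followed by
    feat_b[j], i.e. a vector of size d1 + d2.
    """
    return [[list(fa) + list(fb) for fb in feat_b] for fa in feat_a]
```

In tensor form this corresponds to broadcasting chain A's features to shape (L1, L2, d1), chain B's to (L1, L2, d2), and concatenating along the last axis; unlike an outer sum or outer product, no arithmetic combines the two vectors.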

      (4) 10th sentence on the page where the Results section starts, please briefly mention what are the other 2D pairwise features.

      We clarified this in the revision (lines 131-132 on page 8).

      (5) The Results section states that edges are defined based on Cα distances, but the Methods section says edges are determined based on heavy atom distances. Please correct one of them.

      It should be Cα distances. We apologize for the oversight and have corrected this in the revision (line 646 on page 35).
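Defining graph edges from Cα distances can be sketched as follows. The 10 Å cutoff and the function name are hypothetical; the exchange above does not state the cutoff, and the manuscript may use a different neighbour rule (for example, k nearest neighbours).

```python
import math


def ca_distance_edges(ca_coords, cutoff=10.0):
    """Residue-graph edges (i, j), i < j, whose C-alpha atoms lie within cutoff.

    ca_coords: list of (x, y, z) C-alpha coordinates in angstroms, one per residue.
    cutoff: hypothetical distance threshold in angstroms.
    """
    edges = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges
```

For three residues at x = 0, 3.8 and 20 Å, only the first two are linked under the 10 Å cutoff.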

      (6) For the sentence, "Where ESM-1b and ESM-MSA-1b are pretrained PLMs learned from large datasets of sequences and MSAs respectively without label supervision,", I'd suggest replacing "without label supervision" with "with masked language modeling tasks" for clarity.

      We revised the manuscript according to the reviewer’s suggestion (lines 150-151 on page 9).

      (7) It would be better to briefly explain what the dimensional hybrid residual block is when it is first mentioned.

      We now explain the dimensional hybrid residual block where it is first mentioned in the revision (line 107 on page 7).

      (8) Please include error bars for the bar plots and standard deviations for the tables.

      We thank the reviewer for the suggestion! Our understanding is that error bars and standard deviations are very informative for data that follow Gaussian-like distributions, but our data (precisions of the predicted contacts) are clearly not of this type. Most previous studies in protein contact prediction and inter-protein contact prediction also did not include these in their plots or tables. In our case, including these elements would require a dramatic change in the style of our figures and tables, and we would prefer not to change our figures and tables too much in the revision.

      (9) Please indicate whether the chain break is considered to generate attention map features from ESM-MSA-1b. If it's considered, please specify how.

      The paired sequences were directly concatenated without using any letter to connect them, which means we did not consider chain break in generating the attention maps from ESM-MSA-1b.

    1. Author Response

      Reviewer #1 (Public Review):

      Response to reviewer 1 comments on “weaknesses”:

      “A weakness in the approach is the use of genetic models that do not offer complete deletion of the prolactin receptor from targeted neuronal populations...”

      We acknowledge that neither model used provided a complete deletion of the prolactin receptor (Prlr) from the targeted neuronal populations. We suspect that incomplete deletion of targeted genes is not uncommon in this sort of study, but this remains the best approach to addressing our question, and we believe we have been thorough and transparent in reporting the degree of deletion observed. We thought we had appropriately discussed the implications of the low proportion of Kiss1 cells still expressing Prlr, but will certainly revisit this to ensure it is discussed thoroughly. This does not detract, however, from the key conclusion that prolactin action is necessary for full suppression of fertility in lactation in the mouse.

      “Results showing no impact of progesterone on LH secretion during lactation are surprising, given the effectiveness of progesterone-containing birth control in lactating women...”

      We think that this comment misrepresents what has been done in our study. We did not report a lack of impact of progesterone, as exogenous progesterone was never administered to mice. We did, however, give mifepristone as a progesterone receptor antagonist to determine whether endogenous progesterone contributed to the suppression of kisspeptin neuronal activity. We found that mifepristone, at levels sufficient to terminate pregnancy, had no effect on pulsatile LH secretion in lactating mice. This is consistent with our prior observation that progesterone levels are low in mouse lactation, suggesting that progesterone does not contribute significantly to the suppression of kisspeptin neuronal activity during lactation in the mouse. We agree with the reviewer that if we had given exogenous progesterone, it likely would result in suppression of pulsatile LH secretion (as it does in women). Indeed, in other work, we have found that progesterone administration profoundly suppresses activity of the kisspeptin neurons in mice (https://doi.org/10.1210/en.2019-00193). But this was not the point of the present experiment. We will review how we have described this experiment to ensure that this is absolutely clear.

      “While the authors assert their findings may reflect an important role for prolactin in lactational infertility in other mammalian species, that remains to be seen….”

      We acknowledge that our study cannot address whether prolactin is necessary for the suppression of fertility during lactation in other mammalian species. We hope our data may stimulate a re-examination of this question in other species, however, as some of the prior methodology (such as pharmacological suppression of prolactin) may have had off-target effects that confound interpretation. We thought that this point was discussed appropriately in the manuscript, but we will certainly check and make sure it is addressed suitably.

    1. Author Response

      The following is the authors’ response to the current reviews.

      eLife assessment

      This important study used Voltage Sensitive Dye Imaging (VSDI) to measure neural activity in the primary visual cortex of monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. The authors show convincingly that the initial effect of the mask ran counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. They interpret these results in terms of influences from the receptive field center, and although an alternative view that emphasizes the role of the receptive field surround also seems reasonable, this study stands as an interesting and important contribution to our understanding of mechanisms of visual perception.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings.

      The authors have done a good job responding to the main points of my previous review. One important question remains, as stated in that review:

      "My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here."

      Their rebuttal of my first review is not convincing -- I still believe that surround influences are important and perhaps predominant in determining the outcome of the experiments. This is particularly clear for the "paradoxical" dynamics that they observe, which seem exactly to reflect the behavior of the surround.

      The authors' arguments to the contrary are based on three main points. First, their stimuli cover the center and surround, unlike those of many previous experiments, so they argue that this somehow diminishes the impact of the surround. But the argument is not accompanied by data showing the effects of center stimuli alone or surround stimuli alone. Second, their model -- a normalization model -- does not need surround influences to account for the masking effect. Third, they cite human psychophysical masking results from their collaborators (Sebastian et al 2017), but do not cite an equally convincing demonstration that surround contrast creates potent orientation selective masking when presented alone (Petrov et al 2005, https://doi.org/10.1523/JNEUROSCI.2871-05.2005).

      At the end of the day, these issues will be resolved by further experiments, not argumentation. The paper stands as an excellent contribution, but it might be wise for the authors to be less doctrinaire in their interpretations.

      We thank the reviewer for their positive comments and constructive criticism. In general, we agree with the reviewer’s comments. Importantly, we do not claim that there is no effect from the surround. What we say in the discussion is:

      “Because our targets are added to the background rather than occluding it, it is likely that a significant portion of the behavioral and neural masking effects that we observe come from target-mask interactions at the target location rather than from the effect of the mask in the surround.”

      We still stand by this assessment. We also make the point that, at least within the framework of our delayed normalization model, there is no need for the normalization mechanism to extend beyond the center mechanism to account for our results, and even if the normalization mechanism is somewhat larger than the center, the overlap region at the center would still contribute substantially to the modulations. Overall, we agree that these issues will need to be resolved by future experiments.

      For the reasons discussed in our previous reply, we disagree with the reviewer's statement "…this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature". For similar reasons, we disagree with the statement "It is nice to see that VSDI results square well with those from prior extracellular recordings".

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in the primary visual cortex of macaque monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. The monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses focused on the target-evoked response in the presence of the mask, defined as the difference between the response evoked by the mask with target (target present) and by the mask alone (target absent). These responses were computed across five 50 msec bins (250 msec in total, the duration of the mask alone (target-absent trials, 50% of trials) or mask + target (target-present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target-evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition in which the mask maximally impaired detection of the target in behavior. Target-evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect.
When analyzed at the coarser scale, the target-evoked response integrated over the full 250 msec period showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed at the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and at a more local scale of orientation domains are a strength, and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by the modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out, and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of the response, followed by a positive correlation in the sustained response period, is intriguing.

      We thank the reviewer for their positive comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      None, except perhaps for a more balanced representation of the "surround" possibility in the Discussion. The Petrov et al paper (https://doi.org/10.1523/JNEUROSCI.2871-05.2005) should be considered and cited.

      As discussed above, we believe that our discussion of possible contribution from the surround is balanced. While the paper by Petrov et al is interesting, the stimuli used to study the surround effects are quite different (e.g., gap between center and surround, and the sharp edge of the surround inner boundary) so direct comparison with our results is not possible.

      Reviewer #2 (Recommendations For The Authors):

      The authors have addressed the questions/suggestions I raised in my review.


      The following is the authors’ response to the original reviews.

      We thank the reviewers for their helpful comments and suggestions.

      eLife assessment

      This is an important contribution that extends earlier single-unit work on orientation-specific center-surround interactions to the domain of population responses measured with Voltage Sensitive Dye (VSD) imaging and the first to relate these interactions to orientation-specific perceptual effects of masking. The authors provide convincing evidence of a pattern of results in which the initial effect of the mask seems to run counter to the behavioral effects of the mask, a pattern that reversed in the latter phase of the response. It seems likely that the physiological effects of masking reported here can be attributed to previously described signals from the receptive field surround.

      We thank the reviewers for bringing up the relation of our results to findings from previous studies of orientation-specific center-surround interactions. In our final manuscript, we added a paragraph discussing this important issue. Briefly, for multiple reasons, we believe that the orientation-dependent behavioral and neural masking effects that we observe are unlikely to depend on previously described center-surround interactions in V1. First, in human subjects, perceptual similarity masking effects are almost entirely accounted for by target-mask interactions at the target location and are recapitulated when the mask has the same size and location as the target (Sebastian et al 2017). Second, in our computational model, the effect of mask orientation on the dynamics of the response is qualitatively the same if the mask is restricted to the size and location of the target while mask contrast is increased (Fig. 8 – figure supplement 3). Third, in our model, the results are qualitatively the same when the spatial pooling region for the normalization signal is the same as that for the excitation signal (Fig. 8 – figure supplement 1). These considerations suggest that center-surround interactions may not be necessary for neural and behavioral similarity masking effects with additive targets.

      We would also like to point out some key differences between the stimuli that we use and the ones used in most previous center-surround studies. First, in our experiments, the target and the mask were additive, while in most previous center-surround studies the target occludes the background. Such studies therefore restrict the mask effect to the surround, while in our study we allow target-mask interactions at the center. Second, most center-surround studies have a sharp-edged target/surround, while in our experiments no sharp edges were present. Unpublished results from our lab suggest that such sharp edges have a large impact on V1 population responses. A third key difference is that our stimuli were flashed for a short interval of 250 ms corresponding to a typical duration of a fixation in natural vision, while most previous center-surround studies used either longer-duration drifting stimuli or very short-duration random-order stimuli for reverse-correlation analysis.

      In addition, we would like to emphasize that our results go beyond previous studies in two important ways. First, we study the effect of similarity masking in behaving animals and quantitatively compare the effect of similarity masking on behavior and physiology in the same subjects and at the same time. Second, VSD imaging allows us to capture the dynamics of superficial V1 population responses over the entire population of millions of neurons activated by the target at two important spatial scales. Such results therefore complement electrophysiological studies that examine the activity of a very small subset of the active neurons.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear account of some interesting work. The experiments and analyses seem well done and the data are useful. It is nice to see that VSDI results square well with those from prior extracellular recordings. But the work may be less original than the authors propose, and their overall framing strikes me as odd. Some additional clarifications could make the contribution more clear.

      Please see our reply above regarding the agreement with previous studies and framing.

      My reading is that this is primarily a study of surround suppression with results that follow pretty directly from what we already know from that literature, and although they engage with some of the literature they do not directly mention surround suppression in the text. Their major effect - what they repeatedly describe as a "paradoxical" result in which the responses initially show a stronger response to matched targets and backgrounds and then reverse - seems to pretty clearly match the expected outcome of a stimulus that initially evokes additional excitation due to increased center contrast followed by slightly delayed surround suppression tuned to the same peak orientation. Their dynamics result seems entirely consistent with previous work, e.g. Henry et al 2020, particularly their Fig. 3 https://elifesciences.org/articles/54264, so it seems like a major oversight to not engage with that work at all, and to explain what exactly is new here.

      We thank the reviewer for pointing out this previous work, which we now cite in the final version of the manuscript. For the reasons discussed above, while this study is interesting and related to our work, we believe that our results are quite distinct.

      • In the discussion (lines 315-316), they state "in order to account for the reduced neural sensitivity with target-background similarity in the second phase of the response, the divisive normalization signal has to be orientation selective." I wonder whether they observed this in their modeling. That is, how robust were the normalization model results to the values of sigma_e and sigma_n? It would be useful to know how critical their various model parameters were for replicating the experimental effects, rather than just showing that a good account is possible.

      Thank you for this suggestion. In the final manuscript we include supplementary figures that show how the model's predictions are affected by the orientation tuning and spatial extent of the normalization signal, and by the size and contrast of the mask (Fig. 8 – figure supplements 1-4).

      • The majority of their target/background contrast conditions were collected only in one animal. This is a minor limitation for work of this kind, but it might be an issue for some.

      We agree that this is a limitation of the current study. These are challenging experiments and we were unable to collect all target/background contrast combinations from both monkeys. However, in the common conditions, the results appear similar in the two animals, and the key results seem to be robust to the contrast combination in the animal in which a wider range of contrast combinations was tested. We added these points to the discussion in the final manuscript.

      • The authors point out (lines 193-195) that "Because the first phase of the response is shorter than the second phase, when V1 response is integrated over both phases, the overall response is positively correlated with the behavioral masking effect." I wonder if this could be explored a bit more at the behavioral level, i.e. does the "similarity masking" they are trying to explain show sensitivity to presentation time?

      We agree that testing the effect of stimulus duration on similarity masking is interesting, but unfortunately, it is beyond the scope of the current study. We would also like to point out that the duration of the presentation was selected to match the typical time of fixation during natural behaviors, so much shorter or much longer stimulus durations would be less relevant for natural vision.

      • From Fig. 3 it looks like the imaging ROI may include some opercular V2. If so, it's plausible that something about the retinotopic or columnar windowing they used in analysis may remove V2 signals, but they don't comment. Maybe they could tell us how they ensured they only included V1?

      We thank the reviewer for this comment. As part of our experiments, we extract a detailed retinotopic map for each chamber, so we were able to ensure that the area used for the decoding analysis lies entirely within V1. We now incorporate this information in the final manuscript (Fig. 3 – figure supplement 1).

      • In the discussion (lines 278-283) they say "The positive correlation between the neural and behavioral masking effects occurred earlier and was more robust at the columnar scale than at the retinotopic scale, suggesting that behavioral performance in our task is dominated by columnar scale signals in the second phase of the response. To the best of our knowledge, this is the first demonstration of such decoupling between V1 responses at the retinotopic and columnar scales, and the first demonstration that columnar scale signals are a better predictor of behavioral performance in a detection task." I am having trouble finding where exactly they demonstrate this in the results. Is this just by comparison of Figs. 4E,K and 5E,K? I may just be missing something here, but the argument needs to be made more clearly since much of their claim to originality rests on it.

      We thank the reviewer for this comment. In the final manuscript we are more explicit when we discuss this point and refer to the relevant panels in Figs. 4, 5 and their figure supplements. To substantiate this key claim, we also report the timing of the transition between the two phases in all temporal correlation panels and report the neural-behavioral correlation for the integration period.

      Reviewer #2 (Public Review):

      Summary

      In this experiment, Voltage Sensitive Dye Imaging (VSDI) was used to measure neural activity in the primary visual cortex of macaque monkeys trained to detect an oriented grating target that was presented either alone or against an oriented mask. The monkeys' ability to detect the target (indicated by a saccade to its location) was impaired by the mask, with the greatest impairment observed when the mask was matched in orientation to the target, as is also the case in human observers. VSDI signals were examined to test the hypothesis that the target-evoked response would be maximally suppressed by the mask when it matched the orientation of the target. In each recording session, fixation trials were used to map out the spatial response profile and orientation domains that would then be used to decode the responses on detection trials. VSDI signals were analyzed at two different scales: a coarse scale of the retinotopic response to the target and a finer scale of orientation domains within the stimulus-evoked response. Responses were recorded in three conditions: target alone, mask alone, and target presented with mask. Analyses focused on the target-evoked response in the presence of the mask, defined as the difference between the response evoked by the mask with target (target present) and by the mask alone (target absent). These responses were computed across five 50 msec bins (250 msec in total, the duration of the mask alone (target-absent trials, 50% of trials) or mask + target (target-present trials, 50% of trials)). Analyses revealed that in an initial (transient) phase the target-evoked response increased with similarity between target and mask orientation. As the authors note, this is surprising given that this was the condition in which the mask maximally impaired detection of the target in behavior. Target-evoked responses in a later ('sustained') phase fell off with orientation similarity, consistent with the behavioral effect.
When analyzed at the coarser scale, the target-evoked response integrated over the full 250 msec period showed a very modest dependence on mask orientation. The same pattern held when the data were analyzed at the finer orientation domain scale, with the effect of the mask in the transient phase running counter to the perceptual effect of the mask and the sustained response correlating with the perceptual effect. The effect of the mask was more pronounced when analyzed at the finer scale.

      Strengths

      The work is on the whole very strong. The experiments are thoughtfully designed, the data collection methods are good, and the results are interesting. The separate analyses of data at a coarse scale that aggregates across orientation domains and at a more local scale of orientation domains are a strength, and it is reassuring that the effects at the more localized scale are more clearly related to behavior, as one would hope and expect. The results are strengthened by the modeling work shown in Figure 8, which provides a sensible account of the population dynamics. The analyses of the relationship between VSDI data and behavior are well thought out, and the apparent paradox of the anti-correlation between VSDI and behavior in the initial period of the response, followed by a positive correlation in the sustained response period, is intriguing.

      Points to Consider / Possible Improvements

      The biphasic nature of the relationship between neural and behavioral modulation by the mask and the surprising finding that the two are anticorrelated in the initial phase are left as a mystery. The paper would be more impactful if this mystery could be resolved.

      We thank the reviewer for the positive comments. In our view, while our results are surprising, there may not be a remaining mystery that needs to be resolved. As our model shows, the biphasic nature of V1’s response can be explained by a delayed orientation-tuned gain control. Our results are consistent with the hypothesis that perception is based on columnar-scale V1 signals that are integrated over an approximately 200 ms long period that incorporates both the early and the late phase of the response, since such decoded V1 signals are positively correlated with the behavioral similarity masking effect (Fig. 5D, J; Fig. 5 – figure supplement 1). We now explain this more clearly in the discussion of our final manuscript.

      The finding is based on analyses of the correlation between behavior and neural responses. This appears in the main body of the manuscript and is detailed in Figures S1 and S2, which show the correlation over time between behavior and target response for the retinotopic and columnar scale.

      One possible way of thinking of this transition from anti- to positive correlation with behavior is that it might reflect the dynamics of a competitive interaction between mask and target, with the initial phase reflecting predominantly the mask response, with the target emerging, on some trials, in the latter phase. On trials when the mask response is stronger, the probability of the target emerging in the latter phase, and triggering a hit, might be lower, potentially explaining the anticorrelation in the initial phase. The sustained response may be a mixture of trials on which the target response is or is not strong enough to overcome the effect of the mask sufficiently to trigger target detection.

      It would, I think, be worth examining this by testing whether target dynamics may vary, depending on whether the monkey detected the target (hit trials) or failed to detect the target (miss trials). Unless I missed it I do not think this analysis was done. Consistent with this possibility, the authors do note (lines 226-229) that "The trajectories in the target plus mask conditions are more complex. For example, when mask orientation is at +/- 45 deg to the target, the population response is initially dominated by the mask, but then in mid-flight, the population response changes direction and turns toward the direction of the target orientation." This suggests (to this reviewer, at least) that the emergence of a positive correlation between behavioral and neural effects in the latter phase of the response could reflect either a perceptual decision that the target is present or perhaps deployment of attention to the location of the target.

      It may be that this transition reflects detection, in which case it might be more likely on hit trials than on miss trials. Given the SNR, it would presumably be difficult to do this analysis on a trial-by-trial basis, but the hit and miss trials (which each make up about half of all trials) could be averaged separately to see whether the mid-flight transition is more prominent on hit trials. If this is so for the +/- 45 degree case, it would be good to see the same analysis for other combinations of target and mask. It would also be interesting to separate correct reject trials from false alarms, to determine whether the mid-flight transition tends to occur on false alarm trials.

      If these analyses do not reveal the predicted pattern, they might still merit a supplemental figure, for the sake of completeness.

      We thank the reviewer for suggesting this interesting possibility. The original analysis in the manuscript was based on both correct and incorrect trials, raising the possibility that our results reflect some contribution from decision- and/or attention-related signals rather than from the low-level nonlinear encoding mechanisms in V1 that we postulate in our model (Fig. 8). To explore this possibility, we re-examined our results while excluding error trials. We found that our key results from Figs 4 and 5 (namely, that there is an early transient phase in which the neural and behavioral similarity effects are anti-correlated, and a later sustained phase in which they are positively correlated) hold even for the subset of correct trials, reducing the possibility that decision/attention-related signals play a major role in explaining our results. We now include the results of this analysis as a supplementary figure in the final manuscript (Fig. 4 – figure supplement 2). While there may be some interesting differences in the response dynamics between correct and incorrect trials, the current study was not designed to address this question, and the large number of conditions and the small number of repeats that it necessitated make this data set suboptimal for examining these phenomena.

      References

      Sebastian S, Abrams J, Geisler WS. 2017. Constrained sampling experiments reveal principles of detection in natural scenes. Proc Natl Acad Sci U S A 114: E5731-E5740

    1. Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      Comments on revised version:

      Regarding Reviewer 1's comment on the architecture details, I now understand that the precise architecture (number/type of layers, activation functions, pooling operations, skip connections, upsampling choice...) may have remained relatively hidden to the authors themselves, as the U-net is built automatically by the fast.ai library from a given classical choice of encoder architecture (ResNet34 and ResNet101 here), which generates the decoder part and the skip connections.

      Regarding Major point 1, I raised the question of the generalisation potential of the method. I do not think, for instance, that the optimal number of frames to use, or the optimal choice of their time shift with respect to the division time (t-n, t+m) (not systematically studied here), are generic hyperparameters that can be directly transferred to another setting. This implies that, for each new dataset imaged differently, the method will necessarily require re-labeling, re-training and re-optimizing the hyperparameters that directly influence the network architecture. This limits the generalisation of the method to other datasets, in contrast to other tools developed in the field for other tasks, such as cellpose for segmentation, which has demonstrated real potential for generalisation across various data modalities. I had hoped that the authors would test the robustness of their method themselves, for instance by re-imaging the same tissue at a slightly different acquisition rate, to give more weight to their work.

      In this regard, and because the authors claimed to provide clear instructions on how to reuse their method or adapt it to a different context, I delved deeper into the code and, to my surprise, found it far from the coding practice expected of a well-documented and accessible tool.

      To start with, one has to be relatively familiar with Napari to understand how the plugin must be installed, as the only instruction given is a pip install command. This command could be typed in any terminal without installing the plugin for Napari, but it has to be typed inside the Napari terminal, which is mentioned nowhere. Surprisingly, the authors did not upload the plugin to the Napari hub or to PyPI, so it is not directly searchable or findable; one has to go to the GitHub repository and install it manually. Moreover, no description was provided in the copy-pasted template files associated with the Napari hub, so exporting the plugin to the hub would actually leave it undocumented.

      Regarding the Python notebooks, one can fairly say that the "clear instructions" that were supposed to explain the code are really minimal. Only one notebook, "trainingUNetCellDivision10.ipynb", actually has some comments; the others have almost none, and no titles, to help an inexperienced programmer delving into the scripts work out what they should do. I doubt that a biologist without a strong computational background would manage to adapt the method to their own dataset (which seems to me unavoidable, for the reasons mentioned above).

      Finally, regarding the data, none is shared publicly along with this manuscript and code. As a result, anyone without a similar dataset, annotated in a similar manner, cannot even test the networks or plugin for their own information. A common and necessary practice in the field, and possibly a longer-lasting contribution of this work, would have been to provide the complete annotated dataset used to train and test the artificial neural network. The basic reason is that a better-performing or more generalisable deep-learning model may be developed soon after this one, and a fair comparison of its performance requires evaluation on the same dataset. Benchmarking and comparing method performance is at the core of computer vision and deep learning.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on diabetogenic risk from colorectal cancer (CRC) treatment. The authors claim that postoperative screening for type 2 diabetes should be prioritized in CRC survivors with overweight/obesity, irrespective of the oncological treatment received. The evidence supporting the claims is solid but requires confirmation in different populations. These results have theoretical or practical implications and will be of interest to endocrinologists, oncologists, general practitioners, gastrointestinal surgeons, and policymakers working on CRC and diabetes.

      Author response: We thank you for taking the time to provide constructive feedback on our manuscript and for the useful suggestions. We have provided a point-by-point response to each of the reviewers’ comments with clearly marked changes to the manuscript.

      Public reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors set out to determine whether colorectal cancer surgery site (right, left, rectal) and chemotherapy affect the subsequent risk of developing T2DM, using the Danish national health register.

      Strengths:

      • The research question is conceptually interesting

      • The Danish national health register is a comprehensive health database

      • The data analysis was thorough and appropriate

      • The findings are interesting, and it is a little surprising that chemotherapy had no impact on the development of T2DM

      Weaknesses:

      This is not a weakness as such, but in the discussion I would consider adding a brief comment on the international generalizability of the findings, e.g. the demographic makeup of the Danish national health register and the background rates of DM and obesity in this population with CRC compared with countries on other continents.

      Author response: We agree that this information would be valuable. It has now been added in the Discussion section.

      Changes in manuscript: “In Denmark, the overall T2D prevalence is 6.9%,²⁵ lower than the global average in 2021 (10.5%) and also below the estimate for high-income countries (11.1%).²⁶ Similarly, the obesity rate of 20% aligns with other Scandinavian countries and is below that of most high-income nations.²⁷” (Page 8, lines 256-258)

      A little more information would be helpful regarding how T2DM was diagnosed in the registry.

      Author response: We have now added a more thorough explanation of how T2D was diagnosed in the Methods section.

      Changes in manuscript: “Diabetes is defined as the second occurrence of any event across three types of inclusion events: (1) diabetes diagnosed during hospitalisation, (2) diabetes-specific services received from a podiatrist, and (3) purchases of glucose-lowering drugs. Thus, if a patient developed transient T2D during chemotherapy treatment, it would only count as an inclusion event if they purchased glucose-lowering drugs. Individuals were classified as having T1D if they had received prescriptions for insulin combined with a diagnosis of type 1 diabetes from a medical hospital department. Otherwise, diabetes was classified as type 2.22” (Page 5, lines 154-160)
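      The inclusion rule quoted above can be sketched in Python. This is a minimal illustration under stated assumptions and is not the register's or the authors' actual implementation. The event-type labels (`hospital_diagnosis`, `podiatrist_service`, `glucose_lowering_purchase`) and the function names are hypothetical. Events are assumed to be (date, type) pairs, and every qualifying record, including a repeat of the same type, is assumed to count as one occurrence.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical labels for the three inclusion-event types described in the Methods.
INCLUSION_EVENTS = {"hospital_diagnosis", "podiatrist_service", "glucose_lowering_purchase"}


def diabetes_onset(events: List[Tuple[date, str]]) -> Optional[date]:
    """Return the date of the second qualifying inclusion event, or None.

    Assumption: each qualifying record counts once, regardless of type, so two
    purchases of glucose-lowering drugs are enough on their own.
    """
    qualifying = sorted(d for d, kind in events if kind in INCLUSION_EVENTS)
    return qualifying[1] if len(qualifying) >= 2 else None


def diabetes_type(has_insulin_prescription: bool, has_t1d_hospital_diagnosis: bool) -> str:
    """T1D requires insulin prescriptions plus a hospital type 1 diagnosis; otherwise T2D."""
    if has_insulin_prescription and has_t1d_hospital_diagnosis:
        return "T1D"
    return "T2D"
```

      For example, a single hospital diagnosis with no later qualifying event returns None. Under this rule, a patient with transient hyperglycaemia during chemotherapy is only captured if glucose-lowering drug purchases supply the second event.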

      If someone did develop transient hyperglycemia requiring DM medications during chemotherapy, would the investigators have been able to identify these people?

      Author response: Yes, we have added a sentence in the Methods section.

      Changes in manuscript: “Thus, if a patient developed transient T2D during chemotherapy treatment, it would only count as an inclusion event if they purchased glucose-lowering drugs.” (Page 5, lines 156-158)

      Would they have been classified as T2DM based on filling a prescription for DM meds for a period of time? Also, did the authors have information regarding time to development of T2DM after surgery?

      Author response: Yes, if they had two or more prescriptions for oral glucose-lowering drugs. Yes, we have information on the time to development of T2DM after surgery and found no difference between the groups.

      Changes in manuscript: Information on mean time to develop T2D post-surgery has now been added to Table 2.

      In the adjusted models, the authors did not adjust for cancer stage, even though cancer stage appears to differ substantially between the chemo and no-chemo groups. It would be interesting to know whether adjusting the models for cancer stage affects the results.

      Author response: We agree that adjustment for cancer stage would provide valuable information, and we have performed the analysis and added a sentence to the Results section.

      Changes in manuscript: An analysis adjusted for cancer stage now appears in Supplementary Table 1.

      “Moreover, adjusting for cancer stage did not affect the results (Supplementary table 1).” (Page 7, line 219-220)

      It would be worthwhile to report whether mortality rates differed between the groups during follow-up, and whether the authors investigated if differences in mortality rates led to specific groups living longer and therefore having more time to develop DM.

      Author response: This situation is accounted for in the analysis by using Cox-regression analysis. This method accounts for the potential competing effect of mortality.

      Changes in manuscript: None.

      Overall, the authors achieved their aims, and the conclusions are supported by their results as reported.

      The results are unlikely to significantly change patient treatment or T2DM screening in this population. With some additional information, as described above, the results would be of interest to the community.

      Reviewer #2 (Public Review):

      Summary:

      The study examined the impact of cancer treatment on new-onset diabetes among patients with colorectal cancer using a national database. The findings showed that individuals with rectal cancer who did not receive chemotherapy were less likely to develop diabetes, but in the other groups treatment did not show any impact on the development of diabetes. BMI still played a significant role in developing diabetes regardless of treatment type.

      Strengths:

      One strength of this study is its innovative findings on the prognosis of colorectal cancer treatment stratified by treatment type. In particular, because it examined the impact of treatment on the risk of a new chronic disease after diagnosis, it provides significant evidence with practical insights for developing a proper monitoring system for patients with colorectal cancer and their outcomes after treatment and diagnosis. It is imperative for providers to guide patients and caregivers to prevent adverse outcomes, such as the new onset of chronic disease, based on BMI and type of treatment. Another strength is the use of the national database, which supports the generalizability of the findings.

      Weaknesses:

      Even though the study attempted to examine the impact of each treatment option, the dosage and types of chemotherapy could not be examined because of limitations of the data source.

      Author response: Unfortunately, no. We agree that this would have been valuable information. This is stated as a limitation in the original manuscript; please refer to page 10, lines 305-306.

      Changes in manuscript: None.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor things:

      There are minor inconsistencies in the methods and results regarding BMI. In the methods, the authors state that BMI <18.5 and >/=40 were excluded, but these groups are included in Table 2.

      Author response: This has been corrected.

      Changes in manuscript: BMI groups <18.5 and >/=40 are now excluded in Table 2. (Page 18)

      I believe line 204 should read BMI 18.5-24.9, not 20-24.9.

      Author response: This has been corrected.

      Changes in manuscript: “For each group (type of surgery ± chemotherapy), the HR for developing T2D depending on BMI subgroups was calculated by using Cox regression analysis adjusted for age, sex, year of surgery, and ASA score using normal weight (BMI:18.5-24.9) as the reference group.” (Page 6, line 184-186)

      Rather than showing the BMI mean in Table 1, it would be interesting to see the BMI breakdown by category.

      Author response: Yes, we agree. This analysis has now been added to Table 1.

      Changes in manuscript: Please refer to Table 1.

      Regarding line 215, I would consider rewriting to remove the multiple negatives, e.g. "Radiation therapy in rectal resected patients did not impact the incidence rate of T2D in the Rectal-No-Chemo group or the Rectal-Chemo group."

      Author response: This has been corrected. Please refer to the Results section.

      Changes in manuscript: “Radiation therapy in the rectal resected groups had no impact on the incidence rate of T2D (Table 2); and the unadjusted/adjusted HR of developing T2D was non-significant when comparing Rectal-No-Radiation patients with Rectal-Radiation patients (Table 3).” (Page 7, 223-225)

      Consider changing some of the "didn't"s in the discussion to "did not".

      Author response: This has been corrected.

      Changes in manuscript: Revised and corrected throughout the discussion.

      Reviewer #2 (Recommendations For The Authors):

      Some points need to be clarified and improved.

      In the Methods, patients with Type 1 diabetes were excluded at baseline, but some patients were diagnosed with Type 1 diabetes after treatment and were included in your analysis. Identifying Type 1 diabetes after treatment as an outcome is interesting; do you think that this diagnosis was caused by the treatment? Also, the incidence rates and other HRs did not seem to include Type 1 diabetes, as stated in the Methods. Did you exclude all patients with Type 1 diabetes? If not, further explanation of this outcome is needed, since the mechanisms of Type 1 and Type 2 diabetes differ.

      Author response: This matter has now been clarified in the Methods section.

      Changes in manuscript: “Additionally, individuals diagnosed with Type 1 diabetes (T1D) either before or after surgery were excluded, along with those diagnosed with T2D preoperatively or within the first 2 weeks postoperatively, as the last group probably represents patients with preoperatively unknown pre-existing prediabetes or diabetes.22” (Page 4, line: 125-128)

      Despite limited existing findings, some studies actually reported the incidence rates of Type 2 Diabetes among patients with CRC (Singh S, Earle CC, Bae SJ, et al. Incidence of Diabetes in Colorectal Cancer Survivors. J Natl Cancer Inst. 2016;108(6):djv402. Published 2016 Feb 2. doi:10.1093/jnci/djv402; Khan NF, Mant D, Carpenter L, Forman D, Rose PW. Long-term health outcomes in a British cohort of breast, colorectal and prostate cancer survivors: a database study. Br J Cancer. 2011;105 Suppl 1(Suppl 1):S29-S37. doi:10.1038/bjc.2011.420; Jo A, Scarton L, O'Neal LJ, et al. New onset of type 2 diabetes as a complication after cancer diagnosis: A systematic review. Cancer Med. 2021;10(2):439-446. doi:10.1002/cam4.3666) whereas your study examined the impact of the different types of treatments.

      Author response: Our findings of T2D rate among CRC patients are now commented on in discussion section, and the abovementioned studies are included as references.

      Changes in manuscript: “This national cohort study demonstrated an IR of developing T2D after CRC surgery similar to previous studies.5,11” (Page 8, line 237-238)

      To strengthen the presentation, some places should be revised.

      • Line 216: it says that Table 1 showed no impact of radiation therapy on the incidence rate of T2D. However, either the interpretation or the table number seems wrong. Table 1 does not have this information. Correct this statement.

      • Line 239: There is a typo and an incomplete sentence. Please check and correct this sentence.

      • Lines 257-261: There may be a reason for separating these two paragraphs, but they seem related, so please merge them into one paragraph.

      Author response: These suggested changes have been made. Regarding line 216 the paragraph has been adjusted to the following:

      Changes in manuscript: “Radiation therapy in the rectal resected groups had no impact on the incidence rate of T2D (Table 2); and the unadjusted/adjusted HR of developing T2D was non-significant when comparing Rectal-No-Radiation patients with Rectal-Radiation patients (Table 3).” (Page 7, 223-225)

      References

      (1) Araghi M, Soerjomataram I, Jenkins M, et al. Global trends in colorectal cancer mortality: projections to the year 2035. Int J Cancer. 2019;144(12):2992-3000. doi:10.1002/ijc.32055

      (2) Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global patterns and trends in colorectal cancer incidence and mortality. Gut. 2017;66(4):683-691. doi:10.1136/gutjnl-2015-310912

      (3) González N, Prieto I, del Puerto-Nevado L, et al. 2017 Update on the Relationship between Diabetes and Colorectal Cancer: Epidemiology, Potential Molecular Mechanisms and Therapeutic Implications. Oncotarget. 2017;8. www.impactjournals.com/oncotarget

      (4) Mills KT, Bellows CF, Hoffman AE, Kelly TN, Gagliardi G. Diabetes mellitus and colorectal cancer prognosis: A meta-analysis. Dis Colon Rectum. 2013;56(11):1304-1319. doi:10.1097/DCR.0b013e3182a479f9

      (5) Singh S, Earle CC, Bae SJ, et al. Incidence of Diabetes in Colorectal Cancer Survivors. J Natl Cancer Inst. 2016;108(6). doi:10.1093/jnci/djv402

      (6) Xiao Y, Wang H, Tang Y, et al. Increased risk of diabetes in cancer survivors: a pooled analysis of 13 population-based cohort studies. ESMO Open. 2021;6(4). doi:10.1016/j.esmoop.2021.100218

      (7) NORDCAN 2019. Colorectal cancer: 5-Year Age-Standardised Relative Survival (%), Males and Females. Accessed September 12, 2022. https://nordcan.iarc.fr/en/dataviz/survival?cancers=520&set_scale=0&sexes=1_2&populations=208

      (8) Nano J, Dhana K, Asllanaj E, et al. Trajectories of BMI Before Diagnosis of Type 2 Diabetes: The Rotterdam Study. Obesity. 2020;28(6):1149-1156. doi:10.1002/oby.22802

      (9) Maddatu J, Anderson-Baucum E, Evans-Molina C. Smoking and the risk of type 2 diabetes. Translational Research. 2017;184:101-107. doi:10.1016/j.trsl.2017.02.004

      (10) Lega IC, Lipscombe LL. Review: Diabetes, Obesity, and Cancer-Pathophysiology and Clinical Implications. Endocr Rev. 2020;41(1). doi:10.1210/endrev/bnz014

      (11) Jo A, Scarton L, O’Neal LTJ, et al. New onset of type 2 diabetes as a complication after cancer diagnosis: A systematic review. Cancer Med. 2021;10(2):439-446. doi:10.1002/cam4.3666

      (12) Feng JP, Yuan XL, Li M, et al. Secondary diabetes associated with 5-fluorouracil-based chemotherapy regimens in non-diabetic patients with colorectal cancer: Results from a single-centre cohort study. Colorectal Disease. 2013;15(1):27-33. doi:10.1111/j.1463-1318.2012.03097.x

      (13) Lee EK, Koo B, Hwangbo Y, et al. Incidence and disease course of new-onset diabetes mellitus in breast and colorectal cancer patients undergoing chemotherapy: A prospective multicenter cohort study. Diabetes Res Clin Pract. 2021;174. doi:10.1016/j.diabres.2021.108751

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Summary:

      In the present study, the authors identified the ternary complex formed by NCAN, TNC, and HA as an important factor facilitating the multipolar-to-bipolar transition in the intermediate zone (IZ) of the developing cortex. NCAN binds HA via its N-terminal Link modules, while TNC cross-links NCAN through the CDL domain at the C-terminus. The expression and correct localization of these three factors facilitate the multipolar-bipolar transition necessary for immature neurons to migrate radially. TNC and NCAN are also involved in neuronal morphology. The authors used a wide range of techniques to study the interaction between these three molecules in the developing cortex. In addition, single and double KO mice for NCAN and TNC were analyzed to decipher the role of these molecules in neuronal migration and morphology.

      Strengths:

      The study of the formation of the cerebral cortex is crucial to understanding the pathophysiology of many neurodevelopmental disorders associated with malformation of the cerebral cortex. In this study, the authors showed, for the first time, that the ternary complex formed by NCAN, TNC, and HA promotes neuronal migration. The results regarding the interaction between the three factors forming the ternary complex are convincing.

      We appreciate the reviewer's positive assessment of our research.

      Weaknesses:

      However, regarding the in vivo experiments, the authors should consider some points for the interpretation of the results:

      • The authors did not use the proper controls in their experiments. For embryonic analysis, such as cortical migration, neuronal morphology, and protein distribution (Fig. 6, 7, and 9), mutant mice should be compared with control littermates, since differences in the results could be due to differences in embryonic stages. For example, in Fig. 6 the dKO is more developed than the WT embryo.

      It was challenging to compare double knockout mice with control littermates. When crossing Ncan and Tnc double heterozygous mice, the probability of obtaining a double knockout mouse is 1/16. Given an average litter size of around 8, acquiring a substantial number of double knockout mice would require an impractical number of breeding pairs. Consequently, we were constrained to use non-littermate control mice. To address potential differences in developmental stages, we analyzed 19-20 embryos obtained from five individuals in each group and showed that the differences observed between the two groups are larger than the inherent variability within each group.
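      The breeding arithmetic behind this argument can be checked with a short Python calculation. It assumes Mendelian ratios and that the Ncan and Tnc loci assort independently. The target of 20 embryos comes from the 19-20 embryos mentioned above, and the variable names are illustrative.

```python
from fractions import Fraction

# Het x het cross at one locus: a homozygous-null pup with probability 1/4.
p_locus = Fraction(1, 4)
# Assumption: Ncan and Tnc assort independently, so the double-KO probability is the product.
p_dko = p_locus * p_locus  # 1/16

litter_size = 8  # average litter size quoted in the response
expected_dko_per_litter = litter_size * p_dko  # expected DKO embryos per litter

target_dko = 20  # roughly the 19-20 DKO embryos analysed
litters_needed = target_dko / expected_dko_per_litter  # expected number of litters required

# Probability that a litter of 8 contains no DKO embryo at all.
p_no_dko_in_litter = (1 - p_dko) ** litter_size

print(p_dko, expected_dko_per_litter, litters_needed, float(p_no_dko_in_litter))
```

      Under these assumptions, a litter of 8 yields half a DKO embryo on average, about 40 litters would be needed to collect 20 DKO embryos, and roughly 60% of litters would contain none.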

      • The authors claim, based on experiments using single KO embryos, that NCAN and TNC are involved in neuronal migration. This is a strong statement considering the mild results, with no significant difference in the case of TNC KO embryos, and, once again, the use of embryos from different litters.

      We agree with the reviewer's comment that a single deletion of TNC has a minimal impact on neuronal migration. We have revised the Results section to reflect the mild nature of the TNC KO phenotype more accurately.

      Page 8, line 225: "In NCAN KO mice, a significantly lower percentage of labeled cells resided in the upper layer (Bin2), and more cells remained in the lower layer (Bin5) than in WT mice (Figure 7a). In contrast, the impact of a single deletion of TNC on neuronal cell migration was minimal. Although TNC KO mice exhibited a tendency to have a higher proportion of labeled cells in the lower layer (Bin4) than in WT mice, this did not reach statistical significance (Figure 7a). The delay in neuronal migration observed in the single KO mice was milder when compared to that observed in DKO mice (Figure 6a-c), suggesting that simultaneous deletion of both NCAN and TNC is necessary for a more pronounced impairment in neuronal cell migration."

      • The measurement of immunofluorescence intensity is not the right method to compare the relative amount of protein between control and mutant embryos unless there is a right normalization.

      We agree that measuring immunofluorescence intensity alone is insufficient for comparing the relative amount of protein. In Figure 8, we used Western blotting to compare protein levels, revealing an approximately 50% reduction in NCAN and TNC following hyaluronidase digestion. In Figures 7b and 7c, we demonstrated alterations in the localization patterns of TNC and NCAN in Ncan KO and Tnc KO mice; however, we made no claims about their quantity.

      • Page 7, line 206. "No significant abnormalities were observed in the laminar structure in 4-week-old DKO mice". The authors should be more careful with this statement, since they did not check the lamination of the adult cortex. I would recommend staining control and mutant mice with markers of different cortical populations, such as Cux1, Ctip2, and Tbr1, to assess this point.

      In response to the suggestion, we have conducted additional experiments to provide a more detailed examination of the laminar structure in the cerebral cortex. The results have been incorporated into the revised manuscript as follows:

      Page 7, line 209: "To investigate the laminar organization of the postnatal cerebral cortex, we analyzed the distribution of NeuN-positive postmitotic neurons in DKO mice at 2 weeks of age. No notable abnormalities were observed in the laminar structure of DKO mice (Figure 6-figure supplement 3a, b). Additionally, the laminar distribution of Ctip2-positive deep layer neurons showed no significant differences between WT and DKO mice (Figure 6-figure supplement 3a, c)."

      • The authors do not explain how they measured the intensity of TNC around the transfected Turbo-RFP-positive neurons.

      We added the following description to the Materials and Methods:

      Page 18, line 608: "Images were captured in the IZ region containing Turbo-RFP-positive neurons using a 100X magnification objective lens with 3.0X optical zoom on an AX R confocal microscope (Nikon). A total of 10 optical sections were acquired with a step size of 190 nm. Z-projection views were generated, and the staining intensity of TNC around Turbo-RFP-positive neurons was measured in a 59 × 59 µm area using ImageJ FIJI."

      • The loading control of the western blots should always be included.

      In Figure 6-figure supplement 1, we have incorporated western blot data using a GAPDH antibody as a loading control. We have added an explanation in the figure legend of Figure 3c, stating that we analyzed the same samples as those used in Figure 1e.

      • For Fig. 3e, I think values are represented relative to E18 instead of P2.

      Thank you for pointing that out. As suggested, we have corrected the representation in Fig. 3e to be relative to E18 instead of P2.

      • I would recommend that the authors use the standard nomenclature for embryonic stages. The day the vaginal plug is detected is considered E0.5, and therefore half a day should be added to the embryonic stages (E14.5...).

      We have revised our manuscript to designate the detection of the vaginal plug as E0.5, and subsequently, we have adjusted all embryonic stages by adding half a day, such as E14.5.

      • Fig 10K: I do not see the differences in the number of neurites in the graph.

      We have modified the presentation from a box-and-whisker plot to a bar graph to enhance the visibility of differences in the average number of neurites.

      • Line 37: Not all of the cerebral cortex is organized into six layers; only the neocortex is.

      We have changed 'cerebral cortex' to 'cerebral neocortex.'

      Reviewer 2

      Summary:

      ECM components are prominent constituents of the pericellular environment of CNS cells and form complex and dynamic interactomes in the pericellular spaces. Based on bioinformatic analysis, more than 300 genes have been attributed to the so-called matrisome, many of which are detectable in the CNS. Yet not much is known about their functions, while increasing evidence suggests important contributions to developmental processes, neural plasticity, and inhibition of regeneration in the CNS. In this respect, the present work offers new insights and adds interesting aspects to the facets of ECM contributions to neural development. This is even more relevant given that neurocan has recently been identified as a potential risk gene for neuropsychiatric diseases. Because ECM components occur in the interstitial space and are linked in interactomes, they are very difficult to study. A strength of the manuscript is that the authors used several approaches to shed light on ECM function, including proteome studies, the generation of knockout mouse lines, and the analysis of in vivo labeled neural progenitors. This multi-perspective approach made it possible to reveal hitherto unknown properties of the ECM and highlighted its importance for the overall organization of the CNS.

      Strengths:

      Systematic analysis of the ternary complex between neurocan, TNC, and hyaluronic acid; establishment of KO mouse lines to study the function of the complex; use of in utero electroporation to investigate the impact on neuronal migration.

      We appreciate the reviewer's insightful comments.

      Weaknesses:

      The analysis focuses on neuronal progenitors; however, the potential impact of the molecules of interest, in particular of their removal, on the differentiation and/or survival of neural stem/progenitor cells is not addressed. The potential receptors involved are not considered. It also seems that what is compromised is rather the passage to the outer areas of the forming cortex, which is not the same as the migration process. The movement of the cells is not included in the analysis.

      In this study, we demonstrated that the ternary complex of NCAN, TNC, and HA is predominantly localized in the subplate/intermediate zone. This region lacks neural stem/progenitor cells but serves as the initiation site for the radial migration of postmitotic neurons. Consequently, our study focused on the role of the ternary complex in neuronal migration and polarity formation. We acknowledge that we did not investigate in depth the potential effects of ECM perturbation on the differentiation and survival of neural stem/progenitor cells. However, as highlighted by the reviewer, it is important to explore the effects on neural stem/progenitor cells. To address this concern, we analyzed Pax6-positive radial glial cells and Tbr2-positive intermediate progenitor cells in the ventricular zone of wild-type and Ncan/Tnc double knockout (DKO) mice. Immunohistochemical analysis revealed no significant differences between WT and DKO mice (Figure 6-figure supplement 4a). Furthermore, the morphology of nestin-positive radial fibers showed no distinguishable differences between WT and DKO mice (Figure 6-figure supplement 4b, c).

      (1) In the description of the culture of cortical neurons the authors mentioned the use of 5% horse serum as a medium constituent. HS is a potent stimulus for astrocyte differentiation and astrocytes in vitro release neurocan. Therefore, the detection of neurocan in the supernatant of the cultures as shown in Figure 1h might as well reflect release by cultivated astrocytes.

      As pointed out by the reviewer, Figure 1h did not conclusively demonstrate that neurons are the sole source of NCAN production. Indeed, in situ hybridization analysis revealed the widespread distribution of Ncan mRNA throughout the cerebral cortex (Figure 2a). This result suggests that the production of NCAN involves not only neurons but also other cell populations, including radial glial cells and astrocytes. While we acknowledge the potential contribution of other cell types to NCAN production, Ncan expression by neurons during radial migration is a crucial aspect of our findings (Figure 1i, j). We have revised the manuscript as follows:

      Page 5, line 111: "This result suggested the secretion of NCAN by developing neurons; however, we cannot rule out the involvement of coexisting glial cells in the culture system. To investigate the expression of Ncan mRNA during radial migration in vivo, we labeled radial glial cells in the VZ with GFP through in utero electroporation at E14.5 (Figure 1i, Figure 1-figure supplement 1)."

      (2) It is known that neurocan in vivo is expressed by neurons, but may be upregulated in astrocytes after lesion, or in vitro, where the cells become reactive.

      We have incorporated the following description into the discussion:

      Page 11, line 359: "Previous studies have reported an upregulation of NCAN and TNC in reactive astrocytes, indicating the potential formation of the ternary complex of NCAN, TNC, and HA in the adult brain in response to injury (Deller et al., 1997; Haas et al., 1999)."

      (3) Do NCAN KO neurons show an increase in neurite growth on the TNC substrates? The response on POL was changed (Fig. 10h-k), but the ECM substrates were not tested with the KO neurons.

      The impact of ECM substrates on NCAN KO neurons has not been investigated, and this remains an avenue for further exploration in our ongoing research. Future studies aim to elucidate the NCAN-TNC connection by identifying TNC cell surface receptors and unraveling the subsequent intracellular signaling pathways.

      (4) Do the authors have an explanation for why the ternary complex is concentrated in the SP/IZ zone?

      In the mature brain, hyaluronan acts as a scaffold that facilitates the accumulation of ECM components, including proteoglycans and tenascins around neurons. Therefore, it is conceivable that the ECM components bind to hyaluronan in the embryonic brain, resulting in its accumulation in the subplate/intermediate zone. In support of this hypothesis, enzymatic digestion of hyaluronan in the subplate/intermediate zone led to the disappearance of TNC and NCAN accumulation (Figure 8a-c). This result may account for the disparity observed, where Tnc mRNA is expressed in the ventricular zone while the TNC protein localizes to the subplate/intermediate zone.

      (5) Are hyaluronic acid synthesizing complexes (HAS) concentrated in the SP/IZ?

      Following the reviewer's suggestion, we investigated the localization of Has2 and Has3 mRNA using in situ hybridization. However, because of the relatively low expression levels of these enzymes, we were unable to obtain clear signals (Author response image 1). Further research is needed to understand the mechanisms behind the localization of hyaluronan in the intermediate zone.

      Author response image 1.

      In situ hybridization analysis of Has2 and Has3 mRNA in the E16.5 cerebral cortex. Upper images show in situ hybridization using antisense probes against Has2 and Has3. Lower images show in situ hybridization using sense probes as negative controls.

      (6) CSPGs as well as TNC are part of the neural stem/progenitor cell niche environment. Does the removal of either of these ECM components affect the proliferation, differentiation, and/or survival of NSPCs or their progeny?

      (7) This question relates to the fact that the migration process itself is not visualized in the present study, only its outcome: the quantitative distribution of labeled neurons in the different bins of the analysis. This could also derive from modified cell numbers.

      As pointed out by the reviewer, previous studies have shown the role of CSPGs and TNC as components of the neural stem/progenitor cell niche (see reviews by Faissner et al., 2017; Faissner and Reinhard, 2015). However, as mentioned in Response #2, we did not observe a reduction in neural stem/progenitor cells in NCAN/TNC double-knockout mice. While we cannot precisely explain this discrepancy, it is worth noting that many past studies evaluated the activities of these ECM molecules in in vitro systems such as neurospheres. The observed differences may stem from differences between experimental systems.

      (8) What is the role of the ECM in the SP/IZ area? Do the cells need the ECM to advance, so that its reduction would leave the neuronal progenitors in the VZ area? This somehow contrasts with interpretations that the ECM acts as an obstacle for neurite growth or cell migration, or as a kind of barrier.

      The role of the ECM is multifaceted, with certain ECM molecules known to inhibit neurite outgrowth while others facilitate it. Additionally, the effects of ECM can vary depending on the cell type. It is established that after migrating neurons adhere to radial fibers, they utilize these fibers as a scaffold to migrate toward the cortical surface. However, in the subplate/intermediate zone, migrating neurons have not yet adhered to radial fibers. This study provides evidence that multipolar neurons undergo morphological changes into bipolar cells with the assistance of the NCAN, TNC, and HA complex. Subsequently, this facilitates their movement along radial fibers.

      (9) A direct visualization of the movement of neural progenitors in the tissue as has been for example performed by the Kriegstein laboratory might help resolve some of these issues.

      As suggested by the reviewer, utilizing live imaging techniques to directly observe the movement of neural progenitors within the tissue is indeed a powerful tool. We recognize the significance of addressing these points in future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript describes the crystal structures of Streptococcus pneumoniae NOXs. Crystals were obtained for the wild-type and mutant dehydrogenase domain, as well as for the full-length protein comprising the membrane domain. The manuscript further carefully studies the enzyme's kinetics and substrate-specificity properties. Streptococcus pneumoniae NOX is a non-regulated enzyme, and therefore, its structure should provide a view of the NOX active conformation. The structural and biochemical data are discussed on this ground.

      Strengths:

      This is very solid work. The protein chemistry and biochemical analysis are well executed and carefully described. Similarly, the crystallography must be appreciated given the difficulty of obtaining good enzyme preparations and the flexibility of the protein. Even if solved at medium resolution, the crystal structure of the full-length protein conveys relevant information. The manuscript nicely shows that the domain rotations are unlikely to be the main mechanistic element of NOX regulation. It rather appears that the NADPH-binding conformation is pivotal to enzyme activation. The paper extensively refers to the previous literature and analyses the structures comprehensively with a comparison to previously reported structures of eukaryotic and prokaryotic NOXs.

      We thank the referee for these very nice comments about our work.

      Weaknesses:

      The manuscript is not always very clear with regard to the analysis of NADPH binding. The last section describes a "crevice" featured by the NADPH-binding sites in NOXs. It remains unclear whether this element corresponds to the different conformations of the protein C-terminal residues or more extensive structural differences. This point must be clarified.

      We agree with the referee that our terminology was not very clear. Responding to this comment helped us to improve our explanation: we have changed the text to emphasize the differences we observe in the distances between the FAD binding groove and the entire NADPH binding groove, which includes conserved NADPH-contacting motifs as well as the critical aromatic residue.

      A second less convincing point concerns the nature of the electron acceptor. The manuscript states that this NOX might not physiologically act as a ROS producer. A question then immediately arises: Is this protein an iron reductase?

      Can the authors better discuss or provide more data about this point?

      The referee raises a legitimate point, which was also our first idea. In the initial work on SpNOX, in which we discovered bacterial NOX enzymes (see Hajjar et al 2017 in mBio), we evaluated its possible role as an iron reductase. We showed there that SpNOX can reduce Cyt c directly; however, while some reduction of the Fe³⁺-NTA complex (classically used in ferric reductase activity assays) occurred, this reduction was inhibited by SOD and was therefore mediated indirectly by the superoxide produced, rather than reflecting a true iron reductase activity. This mixed situation, with direct reduction of Cyt c but only indirect reduction of the iron complex, appears to preclude a physiological iron reductase activity; it seems that the protein component of Cyt c is what allows it to interact with SpNOX. As these questions had already been addressed in a previous paper, we did not add anything here; we prefer to underline the possibility of another acceptor and to leave this question open for future work.

      Reviewer #2 (Public Review):

      The authors describe the structure of the S. pneumoniae Nox protein (SpNOX). This is a first. The relevance of it to the structure and function of eukaryotic Noxes is discussed in depth.

      Strengths and Weaknesses

      One of the strengths of this work is the effort put into preparing a pure and functionally active SpNOX preparation. The protein was expressed in E. coli and the purification and optimization of its thermostability and activity are described in detail, involving salt concentration, glycerol concentration, and pH.

      This reviewer was surprised by the fact that the purification protocol in the eLife paper differs from those in the mBio and Biophys. J. papers by the absence of the detergent lauryl maltose neopentyl glycol (LMNG). LMNG is only present in the activity assay at a low concentration (0.003%; molar data should be given; by my calculation, this corresponds to 30 μM).
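      The reviewer's conversion can be checked as follows, assuming that 0.003% is weight/volume and taking the molar mass of LMNG (C47H88O22) as about 1005.2 g/mol:

      $$
      c_{\mathrm{LMNG}} = \frac{0.003\ \mathrm{g}/100\ \mathrm{mL}}{M_{\mathrm{LMNG}}} = \frac{0.03\ \mathrm{g\,L^{-1}}}{1005.2\ \mathrm{g\,mol^{-1}}} \approx 2.98\times10^{-5}\ \mathrm{M} \approx 30\ \mu\mathrm{M}
      $$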

      We regret this misunderstanding: our description was not clear enough. As the referee points out, in previous papers we purified the full-length SpNOX with the detergent LMNG. In the current paper, we described only the protocol for the SpNOX DH domain variant, a soluble cytoplasmic domain. We have now modified the text to clarify the difference between the purification of full-length SpNOX variants, which was performed with detergent as described in Vermot et al 2020, and the purification of DH domains, which are soluble and thus did not require detergent.

      In light of the presence of lipids in cryo-EM-solved structures of DUOX and NOX2, it is surprising that the authors did not use reconstitution of the purified SpNOX in phospholipid (nanodisk?). The issue is made more complicated by the statement on p. 18 of "structures solved in detergent like ours" when no use of detergent in the solubilization and purification of SpNOX is mentioned in the Methods section (p. 21-22).

      As stated above, detergent was used to purify the full-length version of SpNOX. We did in fact perform some preliminary tests of reconstitution in nanodiscs. Several negative-staining trials showed heterogeneous sizes of SpNOX in nanodiscs, and the initial images were not promising. In parallel, we obtained positive crystallography results relatively quickly with protein in detergent. We therefore focused on refining the crystals, a lengthy and resource-intensive task; we decided to allocate time and resources to this promising avenue and did not further pursue nanodiscs.

      We did not pursue cryo-EM because the small size of the protein was initially believed to be a significant barrier to success. Perhaps we could have pursued this avenue: while our manuscript was under submission to eLife, another group deposited a preprint in bioRxiv using cryo-EM to solve the structure of SpNOX (see comment below). This structure was also solved in detergent, so even this cryo-EM structure provides no information on the potential roles of lipids raised by the referee.

      In this revised version, we have added a comment, in the last paragraph, in reference to the additional data available today thanks to the other structures generated by this other group (Murphy's group).

      Can the authors provide information on whether E. coli BL21 is sufficiently equipped for the heme synthesis required for the expression of the TM domain of SpNOX? Was supplementation with δ-aminolevulinic acid used?

      His-SpNOX was produced in E. coli C41(DE3) without any δ-aminolevulinic acid supplementation. Supplementation was tested, but no change in heme content was observed (UV/visible spectra), so we settled on the purification described by Vermot et al 2020. Initially, for the mBio paper (Hajjar et al 2017), we performed heme titrations that gave a stoichiometry of 1.35 to 1.5 heme/protein, indicating 2 hemes (these data were not shown). In the present work, we observed two hemes in the crystal structure, confirming that E. coli, at least for this protein, did not need supplementation with δ-aminolevulinic acid.

      The 3 papers on SpNOX present more than convincing evidence that SpNOX is a legitimate Nox that can serve as a legitimate model for eukaryotic Noxes (cyanide resistance, inhibition by DPI, absolute FAD dependence, and NADPH/NADH as the donor of electrons to FAD). It is also understood that the physiological role of SpNOX in S. pneumoniae is unknown and that the fact that it can reduce molecular oxygen may be an experimental situation that does not occur in vivo.

      I am, however, linguistically confused by the statement that "SpNOX requires "supplemental" FAD". Noxes have FAD bound non-covalently and this is the reason that, starting from the key finding of Babior on NOX2 back in 1977 to the present, FAD has to be added to in vitro systems to compensate for the loss of FAD in the course of the purification of the enzyme from natural sources or expression in a bacterial host. I wonder whether this makes FAD more of a cosubstrate than a prosthetic group unless what the authors intend to state is that SpNOX is not a genuine flavoprotein.

      We believe there is some confusion between SpNOX, the full-length transmembrane protein, and SpNOXDH, the cytosolic domain only. The sentence pinpointed by the referee was in fact “The strict requirement of FAD addition for SpNOXDH activity suggests that the flavin behaves as a cosubstrate”. This statement was about the isolated cytosolic domain, which does not contain the TM part of the protein.

      We agree that in WT NOX enzymes (including SpNOX) FAD is held within the enzyme structure and thus can be considered, by definition, as a prosthetic group. This is supported by the nanomolar affinity for FAD of SpNOX. We did not intend to say that NOX and SpNOX are not genuine flavoproteins.

      On the other hand, when isolated, the affinity of the DH domain for flavins drops to the µM level. This µM affinity does not allow stable retention of the flavin in the active site, as illustrated by the spectra of Figure 3. It is instead the typical affinity of a substrate or co-substrate (similar to that of the substrate NADPH), which can exchange and diffuse in and out of the active site. The DH domain recognizes and reduces flavins but, as a consequence of its lower affinity, releases free reduced flavins into its environment. Thus the isolated DH domain behaves as a flavin reductase that uses flavin as a substrate. Such enzymes have already been well described (some of them belong to the FNR family). These enzymes typically have affinities for flavin in the µM range and share with SpNOX DH binding properties that rely on the isoalloxazine ring only.

      We understand that switching in the text between SpNOX and SpNOX DH, and between FAD as a prosthetic group and FAD as a diffusible co-substrate, can be confusing. To make this clearer, we modified the following sentences and added references to the characterization of some flavin reductases that may provide support for the reader.

      “The strict requirement of FAD addition for SpNOXDH activity and its µM level of affinity suggests that the flavin behaves as a co-substrate rather than a prosthetic group. As an isolated domain, SpNOXDH may work as a flavin reductase enzyme (Gaudu et al, 1994; Fieschi et al 1995; Nivière et al 1996), ..”

      We hope that it will help.

      I am also puzzled by the statement that SpNOX "does not require the addition of Cyt c to sustain superoxide production". Researchers with a Cartesian background should differentiate between cause and effect. Cyt c serves merely as an electron acceptor from superoxide made by SpNOX but superoxide production and NADPH oxidation occur independently of the presence of added Cyt c.

      We thank the referee for pointing out this poor wording. We agree and have amended the text to clarify what we originally meant. It now reads:

      “SpNOXDH requires supplemental FAD to sustain both superoxide production, which can be observed in the presence of Cyt c (Figure 2A), and NADPH oxidation, which can be observed in the absence of Cyt c (Figure 2B).”

      The ability of the DH domain of SpNOX (SpNOXDH) to produce superoxide is surprising to this reviewer. The result is based on the inhibition of Cyt c reduction by added superoxide dismutase (SOD) by 40%. In all eukaryotic Noxes superoxide is produced by the one-electron reduction of molecular oxygen by electrons originating from the distal heme, having passed from reduced FAD via two hemes. The proposal that superoxide is generated by direct transfer of electrons from FAD to oxygen deserves a more in-depth discussion and relies too heavily on the inhibitory effect of SOD. A control experiment with inactivated SOD should have been done (SOD is notoriously heat resistant and inactivation might require autoclaving).

      The initial reports of a NOX DH-domain-only construct (that of human Nox4) producing superoxide are cited in the text. Moreover, natural flavin reductases are known to produce superoxide due to the release of free reduced flavin in the medium.

      As explained above, FAD in full-length SpNOX is a relay for the electrons from NADPH to the heme; it is internal to the protein and thus dedicated to this specific task.

      In the case of SpNOX DH, its flavin reductase behavior leads to the release in the medium of free reduced flavin as a nonspecific diffusible electron carrier. It has been already demonstrated that such free reduced flavin can efficiently reduce soluble O2 and be a source of superoxide.

      This has been particularly well documented by Gaudu et al (1994, J. Biol. Chem.). We have added this reference to the text (see the modified sentence in our reply two comments above).

      Furthermore, we would like to point out to the referee that the link between flavin and superoxide production here is not based only on inhibition by SOD. When we added the flavin inhibitor DPI, superoxide production from the DH domain was abolished (Figure 2C). This supports the role of free reduced flavin both in the production of superoxide and in part of the direct Cyt c reduction observed.

      An unasked and unanswered question is that, since under aerobic conditions, both direct Cyt c reduction (60%) and superoxide production (40%) occur, what are the electron paths responsible for the two phenomena occurring simultaneously?

      We thank the referee for their dedication to a clear understanding of the mechanism used by the SpNOXDH construct, which pushed us to develop a clear description of the mechanism at work for the readers. Please find below a proposed mechanism describing electron transfer from NAD(P)H to free flavin, which, as a diffusible species, can then non-specifically reduce either O2 or Cyt c.

      Author response image 1.

      However, it is important to remember that this situation is not physiological; it results from using a DH domain isolated from the TM domain of SpNOX. Nonetheless, it shows that the DH domain is fully functional for NAD(P)H binding as well as for hydride transfer.
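      A simplified reaction set consistent with this description is sketched below. It is a textual summary under the assumption that free reduced flavin (written as FADH₂) is the diffusible carrier; it omits the further reactions of the flavin semiquinone and is not a reproduction of Author response image 1.

      $$
      \begin{aligned}
      \mathrm{NAD(P)H} + \mathrm{H^{+}} + \mathrm{FAD} &\xrightarrow{\ \mathrm{SpNOX_{DH}}\ } \mathrm{NAD(P)^{+}} + \mathrm{FADH_2} \\
      \mathrm{FADH_2} + 2\,\mathrm{Cyt\,c(Fe^{3+})} &\rightarrow \mathrm{FAD} + 2\,\mathrm{Cyt\,c(Fe^{2+})} + 2\,\mathrm{H^{+}} \\
      \mathrm{FADH_2} + \mathrm{O_2} &\rightarrow \mathrm{FADH^{\bullet}} + \mathrm{O_2^{\bullet-}} + \mathrm{H^{+}} \\
      \mathrm{O_2^{\bullet-}} + \mathrm{Cyt\,c(Fe^{3+})} &\rightarrow \mathrm{O_2} + \mathrm{Cyt\,c(Fe^{2+})}
      \end{aligned}
      $$

      In this sketch, the second reaction corresponds to direct, SOD-insensitive Cyt c reduction, while the last two form the superoxide-mediated, SOD-sensitive route; blocking flavin reduction in the first step (e.g. with DPI) would suppress both.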

      This reviewer had difficulty in following the argument that the fact that the kcat of SpNOX and SpNOXDH are similar supports the thesis that the rate of enzyme activation is dependent on hydride transfer from nicotinamide to FAD.

      We have amended the text to clarify this point. If the reaction rate is not affected by the presence or absence of the hemes in the TM domain, this inevitably implies that the rate is NOT limited by the electron transfer to the heme, and ultimately to O2, from the FAD, and thus the hydride transfer step that oxidizes the FAD must be the rate limiting step.
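      The argument can be sketched with a standard steady-state relation, assuming for illustration only that turnover proceeds through a linear sequence of effectively irreversible first-order steps. Here k_H denotes hydride transfer from NAD(P)H to FAD, while k_ET (FAD to hemes to O2 in full-length SpNOX) and k_FR (release and reoxidation of free reduced flavin in SpNOXDH) are hypothetical lumped rate constants:

      $$
      \frac{1}{k_{\mathrm{cat}}} = \sum_i \frac{1}{k_i}, \qquad
      \frac{1}{k_{\mathrm{cat}}^{\mathrm{SpNOX}}} = \frac{1}{k_{\mathrm{H}}} + \frac{1}{k_{\mathrm{ET}}}, \qquad
      \frac{1}{k_{\mathrm{cat}}^{\mathrm{DH}}} = \frac{1}{k_{\mathrm{H}}} + \frac{1}{k_{\mathrm{FR}}}
      $$

      If the two measured kcat values are similar, the simplest reading is that the construct-specific terms are small compared with 1/k_H, so that kcat ≈ k_H; strictly, similar kcat values would also be compatible with two slow but coincidentally equal downstream steps.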

      The section dealing with mutating F397 is a key part of the paper. There is a proper reference to the work of the Karplus group on plant FNRs (Deng et al). However, later work, addressing comparison with NOX2, should be cited (Kean et al., FEBS J., 284, 3302-3319, 2017). Also, work from the Dinauer group on the minimal effect of mutating or deleting the C-terminal F570 in NOX2 on superoxide production should be cited (Zhen et al., J. Biol. Chem. 273, 6575-6581, 1998).

      We thank the reviewer for pointing out our unintended omission of these important works; we have amended the text and added the citations.

      It is not clear why mutating F397 to W (both residues having aromatic side chains) would stabilize FAD binding.

      In brief, the bicyclic indole side chain of Trp can establish more extensive and stronger van der Waals contacts with the isoalloxazine ring than the Phe side chain. Our discussion of this point is extensive in the structural section, where we compare the structures with F and W at this position. At this time we do not think it is necessary to add anything to the text.

      Also, what is meant by "locking the two subdomains of the DH domain"? What subdomains are meant?

      The two subdomains are the NADPH-binding domain and the FAD-binding domain, which we define on p 11 (“SpNOXDH presents a typical fold of the FNR superfamily of reductase domain containing two sub-domains, the FAD-binding domain (FBD) and an NADPH-binding domain (NBD) “) and which are labeled in Fig. 4. By “locking” we meant to convey immobilizing them into a specific conformation; we have amended the text to clarify this point.

      Methodological details on crystallization (p. 11) should be delegated to the Methodology section. How many readers are aware that SAD means "Single Wavelength Anomalous Diffraction" or know what is the role of sodium bromide?

      We have amended the text to emphasize the intended point, which is the different origins of the two DH structures: the de novo structure was made possible by co-crystallization with bromide, and the molecular replacement structure used the de novo structure as a model.

      The data on the structure of SpNOX are supportive of a model of Nox activation that is "dissident" relative to the models offered for DUOX and NOX2 activation. These latter models suggested that the movement of the DH domain versus the TM domain was related to conversion from the resting to the activated state. The findings reported in this paper show that, unexpectedly, the domain orientation in SpNOX (constitutively active!) is much closer to that of resting NOX2. One of the criteria associated with the activated state in Noxes was the reduction of the distance between FAD and the proximal heme. The authors report that, paradoxically, this distance is larger in the constitutively active SpNOX (9.2 Å) than that in resting state NOX2 (7.6 Å) and the distance in Ca2+-activated DUOX is even larger (10.2 Å).

      A point made by the authors is the questioning of the paradigm that activation of Noxes requires DH domain motion.

      Instead, the authors introduce the term "tensing", within the DH domain, from a "relaxed" to a more rigid conformation. I believe that this proposal requires a somewhat clearer elaboration.

      It is clear that the distance between the FAD and NADPH shown in the Duox and Nox2 structures is too large for the chemical reaction of hydride transfer. Wu et al used the terms ‘tense’ and ‘relaxed’ to describe conformations of the DH domain corresponding to ‘short distance’ and ‘longer distance’, respectively, between the two ligand binding sites. We quoted this terminology and have amended the text to clarify that we envision a motion of the NBD relative to the FBD, as distinct from a larger motion of the whole DH domain relative to the TM domain.

      The statement on p. 18, in connection to the phospholipid environment of Noxes, that the structure of SpNOX was "solved in detergent" is puzzling since the method of SpNOX preparation and purification does not mention the use of a detergent. As mentioned before, this absence of detergent in the present report was surprising because LMNG was used in the methods described in the mBio and Biophys. J. papers. The only mention of LMNG in the present paper was as an addition at a concentration of 0.003% in the activity assay buffers.

      Please see our response to similar points above. Detergent was present for the solubilization of the full-length SpNOX.

      The Conclusions section contains a proposal for the mechanism of conversion of NOX2 from the resting to the activated state. The inclusion of this discussion is welcome but the structural information on the constitutively active SpNOX can, unfortunately, contribute little to solving this important problem. The work of the Lambeth group, back in 1999 (cited as Nisimoto et al.), on the role of p67-phox in regulating hydride transfer from NADPH to FAD in NOX2 may indeed turn out to have been prophetic. However, only solving the structure of the assembled NOX2 complex will provide the much-awaited answer. The heterodimerization of NOX2 with p22-phox, the regulation of NOX2 by four cytosolic components, and the still present uncertainty about whether p67-phox is indeed the final distal component that converts NOX2 to the activated state make this a formidable task.

      The work of the Fieschi group on SpNOX is important and relevant but the absence of external regulation, the absence of p22-phox, and the uncertainty about the target molecule make it a rather questionable model for eukaryotic Noxes. The information on the role of the C-terminal Phe is of special value although its extension to the mechanism of eukaryotic Nox activation proved, so far, to be elusive.

      We sincerely thank the referee for the positive comments on our work and the deep interest shown by this careful evaluation.

      We understand the arguments of the referee regarding the relevance of our work here to eukaryotic NOX, but we do not share the reservations expressed. While human NOXes need interactions with other proteins or have EF-hand or other domains that control them, SpNOX corresponds exactly to the minimal core common to any NOX isoform. In fact, because SpNOX has only this conserved core, it is unique in that it can work as a constitutively active NOX without protein-protein interactions or regulatory domains. Thus the fundamentals of electron transfer mechanisms of NOX enzyme are present in SpNOX.

      There might be some differences in the internal organization from isoform to isoform (as regarding the relative DH domain vs TM domain orientation) but considering the similarity between NOX2 and SpNOX topology we are rather confident that the SpNOX structure will turn out to be a reasonable model of the activated NOX2 structure. History will tell.

      In any case, this work on SpNOX allowed us to highlight hydride transfer as the limiting step and also to highlight some structural differences that could be at the source of the regulation in eukaryotic NOX. In itself, we think this is a significant contribution to the field.

      We warmly thank both referees for their constructive remarks and their help in the improvement of this manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • The manuscript states that the flavin "behaves" like a co-substrate and thereby reports on the Km for the flavins. I feel that this terminology might be confusing. The flavin is unchanged after the reaction, and what matters is the enzyme's affinity for the flavin and the flavin concentration needed to saturate the enzyme (to have it in the fully holo form).

      See above: in answering many questions from Reviewer #2, we have extensively commented on this point (substrate, cofactor, affinity, etc.) and made some adjustments in the text to clarify it. We hope it is now satisfactory.

      • I could not find the methodological description of the experiments performed to measure the Km for the flavins, and the legend of Figure S4 does not help in this regard. I think that the data (left panels of S4) should be interpreted as binding curves with associated Kd values.

      We have changed the text to clarify the method used to measure Km for flavins.

      • A related point is that the manuscript refers to Km as an "affinity". This is inappropriate and should be avoided, as the Km is not the Kd.

      We agree with the referee that the Km is not the Kd. However, under the appropriate conditions, to which our experiments conform, Km is accepted as a relevant approximation of affinity (Srinivasan, FEBS Journal, v 289 pp 6086-6098 2022). We have added a sentence to clarify this point and cite this reference in the text.
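      For readers less familiar with this distinction, a short sketch assuming the simplest one-substrate Michaelis–Menten scheme shows when Km approximates Kd; k_1, k_-1 and k_cat are generic symbols, not values measured in this work:

      $$
      \mathrm{E} + \mathrm{S} \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \mathrm{ES} \xrightarrow{\ k_{\mathrm{cat}}\ } \mathrm{E} + \mathrm{P}, \qquad
      K_m = \frac{k_{-1} + k_{\mathrm{cat}}}{k_{1}} = K_d + \frac{k_{\mathrm{cat}}}{k_{1}}, \qquad
      K_d = \frac{k_{-1}}{k_{1}}
      $$

      Under this assumption, Km exceeds Kd by k_cat/k_1, so Km ≈ Kd only when k_cat ≪ k_-1, that is, when product formation is slow relative to dissociation of the flavin.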

      • The environment around the putative oxygen site should be shown. The text indicates that "the residues characteristic of the O2 reducing center in eukaryotic FRD domains of NOX and DUOX enzymes are not conserved in SpNOX." How does the site look? This point relates to the more general comment above on the oxidizing substrate used by this bacterial NOX.

      This is a really interesting point, with many potential biological developments for future studies of this prokaryotic family of NOX enzymes. While we were submitting this work to eLife for evaluation, another group (Murphy's lab) deposited a preprint in bioRxiv in which they also solved the structure of SpNOX, this time by cryo-EM, with an unexpected level of resolution for such a small protein (their paper is not yet published but is probably under peer review). In their work, they made a special effort to identify the O2 reducing center (alignment of bacterial NOX sequences, mutational studies, …), but they were not able to localize such a site with accuracy. There are also other complementary data between their work and ours. We will therefore add a paragraph at the end of the discussion to comment on this parallel work and to emphasize the complementarity of the two studies and what it brings to the overall understanding of this enzyme.

      • The section "A Close-up View of NOX's NAD(P)H Binding Domains vs the FNR Gold Standard" should be clarified.

      I found it difficult to understand. Is the different conformation of Phe397 creating the crevice? Could NADPH be modeled in NOX2 and DUOX in the same conformation observed in FNR and modeled in the bacterial NOX? Or would there be clashes, implying the necessity of larger conformational changes to bring the nicotinamide closer to the FAD?

      Please see our responses above on this point; we have amended the text to clarify. In brief, we propose that activation of the eukaryotic enzymes would entail a movement of the NBD subdomain (containing the NADPH site) towards the FBD subdomain (containing FAD) through an internal motion within the DH domain. In doing so, they would approach the DH domain topology of SpNOX, which models an active state.

      Reviewer #2 (Recommendations For The Authors):

      On p. 6, second line, it should be (Figure 1C and 1D). Space is missing between C and "and".

      On p. 9, in Figure 3, the labeling A and B are missing. Also, the legend of part B does not correspond to the actual graph colors. Thus, the tracing of F397W is red and not grey as indicated in the legend.

      Corrected. Thank you

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage, and it identified a number of biological and experimental factors that influence which genes are selected for investigation. The study is a valuable contribution to this branch of meta-research, and while the evidence in support of the findings is solid, the interpretation and presentation of the results (especially the figures) needs to be improved.

      We thank the editor and reviewers for their detailed and thoughtful assessment of our work. Below, we present detailed responses to reviewers’ comments and suggestions. We are also submitting a version edited for clarity of presentation and precision of interpretation.

      Following the eLife assessment, we also tried to identify further statements where results could be presented in a more precise way.

      First, in the section Subsequent reception by other scientists does not penalize studies on understudied genes, we now state “This result again opposes the hypothesis that less-investigated genes will yield articles with lower impact.”

      Second, in section Identification of biological and experimental factors associated with selection of highlighted genes, we now state:

      “We cautiously hypothesize that this might reflect on many different research groups producing reagents surrounding the genes that they actively study. The most informative continuous factor is the number of research articles about a gene (Figure 1B).”, removing claims of causality.

      Finally, for improved readability, we have moved all supplemental tables into separate .xlsx files.

      Reviewer #1 (Public Review):

      Summary and strengths

      The authors tried to address why only a subset of genes are highlighted in many publications. Is it because these highlighted genes are more important than others? Or is it because there are non-genetic reasons? This is a critical question because in the effort to discover new genes for drug targets and clinical benefit, we need to expand a pool of genes for deep analyses. So I appreciate the authors' efforts in this study, as it is timely and important. They also provided a framework called FMUG (short for Find My Understudied Gene) to evaluate genes for a number of features for subsequent analyses.

      We thank the reviewer for their insightful comments and are pleased that the reviewer shares our appreciation for the gravity of these questions. As the reviewer emphasizes, it is critical to understand whether the choice of genes reflects their importance or non-genetic reasons. Previously we and others demonstrated that this choice does not reflect biological importance, when the latter is assessed through unbiased genome-wide data (e.g.: Haynes et al., 2018; Stoeger et al. 2018). Now we contribute to this critical question by systematically evaluating individual non-genetic reasons. We address the reviewer’s comments below.

      Weaknesses

      Many of the figures are hard to comprehend, and the figure legends do not sufficiently explain them.

      For example, what was plotted in Fig 1b? The number of articles increased from results -> write-ups -> follow-ups in all four categories with different degrees. But it does not seem to match what the authors meant to deliver.

      We apologize for the lack of clarity. We identified two interrelated elements that we have now fixed: i) for each genomics approach, the prior figure legend provided a number of articles (n), such as “GWAS (n=450 articles)”; ii) the prior y-axis was labelled “Number of articles”.

      Addressing the first element, we have rephrased the legend for clarity:

      “b, We identified articles reporting on genome-wide CRISPR screens (CRISPR, 15 focus articles and 18 citing articles), transcriptomics (T-omics, 148 focus articles and 1,678 citing articles), affinity purification–mass spectrometry (AP-MS, 296 focus articles and 1,320 citing articles), and GWAS (450 focus articles and 3,524 citing articles). Focusing only on protein-coding genes (white box plot), we retrieved data uploaded to repositories describing which genes came up as “hits” in each experiment (first colored box plot). We then retrieved the hits mentioned in the titles and abstracts of those articles (second colored box plot) and hits mentioned in the titles and abstracts of articles citing those articles (third colored box plot). Unique hit genes are only counted once.”

      The number of genes in each box plot is now reported in the x-axis labels for each step. For example, the results for CRISPR were obtained from 15 focus studies (original research) and 18 subsequent studies (papers citing focus articles). Those 15 studies identified 9,268 genes where loss-of-function changed phenotypes but, in their titles and abstracts, mentioned only 18 of those 9,268 genes. While the 9,268 hit genes have received research attention similar to that of all protein-coding genes, the 18 hit genes mentioned in the title or abstract are significantly better studied. The articles citing the focus articles likewise mentioned only 19 highly studied hit genes in their titles and abstracts.

      Addressing the second element, we updated the axis label to “Number of articles about gene”, to distinguish it from the number of articles mentioned in the legend and to convey that this is the number of articles about each gene that were published independently of the genomics assays we inspect. To further underscore this point, we now label the “20% highest-studied genes” that we mention in the main text and have reworded the figure caption to better capture where the critical increase occurs: “A shift in focus towards well-studied genes occurs during the summarization and write-up of results and remains in subsequent studies.”.

      Fig 4 is also confusing. It appears that the genes were clustered by many features that the authors developed. But does it have any relationship with genes being under- or over-studied?

      We again apologize for the lack of clarity. As described in the main text, while the results of Figs. 1-2 suggest that gene popularity may predict the highlighting of a differentially expressed gene in the title or abstract, we wanted to conduct a systematic analysis of the factors that correlate with such a decision. We thus built a set of 45 factors that have been discussed as explanations for why some genes receive increased research attention.

      The data in Fig. 4 shows that those 45 factors are not independent but that some are highly correlated. Because of those correlations, we are able to select a smaller number as representative of the full set. Those are the default factors shown to users of FMUG. While users can choose all factors that are significantly correlated with the highlighting in title or abstract, the default of presenting factors representing different clusters of factors enabled us to limit the number of factors that are initially displayed.

      Please note that following the suggestion of Reviewer 3, we have now moved this Figure to the supplemental material, as Figure S11.

      Reviewer #2 (Public Review):

      Summary and strengths

      In this manuscript the authors analyse the trajectory of understudied genes (UGs) from experiment to publication and study the reasons for why UGs remain underrepresented in the scientific literature. They show that UGs are not underrepresented in experimental datasets, but in the titles and abstracts of the manuscripts reporting experimental data as well as subsequent studies referring to those large-scale studies. They also develop an app that allows researchers to find UGs and their annotation state. Overall, this is a timely article that makes an important contribution to the field. It could help to boost the future investigation of understudied genes, a fundamental challenge in the life sciences. It is concise and overall well-written, and I very much enjoyed reading it. However, there are a few points that I think the authors should address.

      We thank the reviewer for their kind assessment.

      Weaknesses

      The authors conclude that many UGs "are lost" from genome-wide assays at the manuscript writing stage. If I understand correctly, this is based on gene names not being reported in the title or abstract of these manuscripts. However, for genome-wide experiments, it would be quite difficult for authors to mention large numbers of understudied genes in the abstract. In contrast, one might highlight the expected behaviour of a well-studied protein simply to show that the genome-wide study provides credible results.

      We agree that it is not reasonable to expect a title or abstract to highlight hundreds or even thousands of differentially expressed genes. We’ve now extended our Study Limitations section to address this:

      “we take a gene being mentioned in the title or abstract of an article as a proxy for a gene receiving attention by the article’s authors. The title and abstract are space-limited and thus cannot accommodate discussion of large numbers of genes.”

      We also agree that highlighting the expected behavior of a well-studied protein may provide credibility to a study and increase confidence in other results. The soundness of such a strategy was studied quantitatively by Uzzi et al. (Science 2013), whose work we now include in the section on study limitations as:

      “authors beginning manuscripts with something familiar before introducing something new”.

      To convey the practical limitation of abstracts needing to be concise, we added the following sentence to our discussion section, when suggesting controlled trials that add genes to abstracts:

      “This intervention would need to be carefully designed since abstracts are limited in their size.”

      To avoid over-interpretation we have in the discussion also extended the sentence on “lost in a leaky pipeline” to “lost to titles and abstracts of research articles in a leaky pipeline”.

      Our focus on titles and abstracts has been equally motivated by their availability (full text still is often behind paywalls and/or not accessible for bulk-download and text-mining) and by abstracts being the most visible and most read parts of research articles (e.g.: bioRxiv estimates that for the preprint for the present manuscript, the abstract was read ~10 times more frequently than full-text HTML and 4 times more frequently than the pdf).

      Could this bias the authors' conclusions and, if so, how could this be addressed? For example, would it be worth normalising studies based on the total number of genes they cover?

      We previously described that, in line with the reviewer’s expectations, unstudied genes are preferentially added to the title or abstract of articles that feature more genes in the title or abstract (Stoeger et al., Plos Biology, 2022; Fig. 2B). Normalizing by the total number of genes should thus preserve the pronounced division between well-studied genes and unstudied genes shown in Figure 1B. In line with these predictions, when we randomly selected one gene per title/abstract, we found that the effect remains (see new Figure S7).

      Author response image 1.

      Figure 1B is confusing in its present form. I think the plot and/or the legend need revising. For example, what "numbers to the right of each box plot" are the authors referring to? Also, I assume that the filled boxes are understudied genes and the empty/white box is "all genes", but that's not explained in the legend. In the main text, the figure is referred to with the sentence "we found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature ". I cannot follow how the figure shows this. My interpretation is that the y-axis is not showing the number of articles, but represents the percentage of articles mentioning a gene in the title/abstract, displayed on a log scale. If so, perhaps a better axis labels and legend text could be sufficient. But then one would also need to somehow connect this to the statement in the main text about the 20% highest-studied genes (a dashed line?). Alternatively, the authors could consider other ways of plotting these data, e.g. simply plotting the "% of publication in which a gene appears" from 0-100% or so.

      Reviewer 1 raised a similar point on overall figure clarity. We identified two interrelated elements that contribute to overall confusion and have now fixed them (see response to Reviewer 1 beginning on page 2 of this document).

      We attempted an alternative plotting of Fig 1B according to the reviewer’s suggestion. In the version below, the y-axis instead shows the percent of gene-related articles that are about each gene. We chose to keep the original y-axis (showing number of articles about each gene) as it additionally conveys the absolute scale of scholarship on individual genes.

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary and strengths

      The manuscript investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage and identified biological and experimental factors associated with selection of highlighted genes.

      It is very important for the research community to recognize the systematic bias in research of human genes and take precautions when designing experiments and interpreting results. The authors have tried to profile this issue comprehensively and promoted more awareness and investigation of understudied genes.

      We thank the reviewer for their kind assessment of our work.

      Weaknesses

      Regarding result section 1 "Understudied genes are abandoned at synthesis/writing stage", the figures are not clear and do not convey the messages written in the main text. For example, in Figure 1B, figure S5 and S6,

      • There is no "numbers to the right of each box plot".

      The “numbers to the right” statement in the caption was an erroneous inclusion from an earlier version of the figure. We apologize for our error and have now removed this statement.

      • Do these box plots only show understudied genes? How many genes are there in each box plot? The definition and numbers of understudied genes are not clear.

      The x-axis describes genes featured in each stage of the publication process (from all protein-coding genes to genes found as hits in genome-wide screen to genes found in the title/abstract to genes found in the title/abstract of citing articles) and the y-axis describes the number of articles annotated to those genes. We have also now added the number of genes in each box plot to the figure. This information is also in Materials and Methods under each technology’s heading (see also response to Reviewer 1 beginning on page 2 of this document).

      Author response image 3.

      • "We found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature (Figure 1B)". This is not clear from the figure.

      We have revised Figure 1B and its caption to better communicate the main point of the figure: that genes which make it to the title/abstract of the reporting article tend to be more popular than genes which are hits in genome-wide experiments from those articles. We have added a horizontal line that shows the cutoff for the top 20% most popular genes.

      Regarding result section 2 "Subsequent reception by other scientists does not penalize studies on understudied genes", the authors showed in figure 2 that there is a negative correlation between articles per gene before 2015 and median citations to articles published in 2015. Another explanation could be that for popular genes, there are more low-quality articles that didn't get citations, not necessarily that less popular genes attract more citations.

      We believe that the two explanations for the observed phenomenon are not mutually exclusive. Previously, we focused on the median of citations to articles about a gene to capture the typical effect. In a new analysis, we also find support for the possibility outlined by the reviewer and believe that adding this to our manuscript complements and balances our analysis of citations. Specifically, in the new Figure S8B we find that articles about the most popular genes are slightly more likely to be among the least cited papers (and, in Figure S8A, that articles about the least studied genes have been much more likely to be among the most cited papers). In the text, we state:

      “Further, since 1990, articles about the least popular genes have at times been 3 to 4 times more likely to be among the most cited articles than articles on the most popular genes, whereas articles on the most popular genes have been slightly less likely to be highly cited than lowly cited (Figure S8)”.

      We thank the reviewer for their suggestion, which strengthens our manuscript. The figure caption reads:

      “Figure S8: Likelihoods of being highly cited (top 5% of citations among all articles about genes, panel a) or lowly cited (bottom 5% of citations among all articles about genes, panel b) for articles about the most popular genes (top 5% accumulated articles) versus articles about the least popular genes (bottom 5% accumulated articles) by year of publication. Only articles with a single gene in the title/abstract are considered. Shaded regions show ±1 standard error of the proportion."

      Author response image 4.
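      For readers who want to reproduce shaded bands like those in Figure S8, the standard error of a sample proportion p estimated from n articles is sqrt(p(1 - p)/n). The Python sketch below computes it; the function name and the counts are hypothetical and only illustrate the formula, not the pipeline used for the manuscript.

```python
import math


def proportion_with_se(successes: int, total: int) -> tuple[float, float]:
    """Return the sample proportion p = successes / total and its standard
    error sqrt(p * (1 - p) / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se


# Hypothetical counts for one publication year: 30 of 400 articles about the
# least popular genes fall in the top 5% of citations.
p, se = proportion_with_se(30, 400)
lower, upper = p - se, p + se  # the band a "±1 standard error" shading would span
print(f"p = {p:.3f}, band = [{lower:.3f}, {upper:.3f}]")
```

      Computing one such band per publication year, for each gene-popularity group, gives the kind of shaded region described in the caption.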

      Regarding result section 3 "Identification of biological and experimental factors associated with selection of highlighted genes", in Figure 3 and table s2, the author stated that "hits with a compound known to affect gene activity are 5.114 times as likely to be mentioned in the title/abstract in an article using transcriptomics", The number 5.144 comes out of nowhere both in the figure and the table. In addition, figure 4 is not informative enough to be included as a main figure.

      This is the result of both a typo and imprecise terminology. The number should read 4.262 (the likelihood ratio of being mentioned in the title/abstract between genes with and without a compound), which corresponds to an odds ratio of 4.331. We have clarified this in the table caption, stating:

      “e.g. hits with a compound known to affect gene activity are 4.262 times as likely to be mentioned in the title/abstract in an article using transcriptomics, corresponding to an odds ratio of 4.331".
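      The distinction between the two ratios can be illustrated with a 2×2 table of hit genes (with or without a known compound, versus mentioned or not mentioned in the title/abstract). In the Python sketch below, `risk_ratio` computes what the text calls the likelihood ratio (the ratio of mention probabilities, often called a risk ratio or relative risk), and `odds_ratio` computes the ratio of odds. All counts are hypothetical, not the manuscript's data; they show that the two ratios nearly coincide when being mentioned is rare, as would be expected for hits in genome-wide screens, and diverge when the outcome is common.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Ratio of mention probabilities: P(mentioned | compound) / P(mentioned | no compound).

    Hypothetical 2x2 layout:
      a = hits with a compound, mentioned     b = hits with a compound, not mentioned
      c = hits without a compound, mentioned  d = hits without a compound, not mentioned
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Ratio of the odds of being mentioned, (a / b) / (c / d), for the same table."""
    return (a / b) / (c / d)


# Rare outcome (hypothetical counts): the two ratios are close.
print(risk_ratio(20, 980, 5, 995), odds_ratio(20, 980, 5, 995))
# Common outcome (hypothetical counts): the two ratios diverge.
print(risk_ratio(500, 500, 250, 750), odds_ratio(500, 500, 250, 750))
```

      With the first set of counts the ratio of probabilities is 4.0 and the odds ratio is about 4.06; with the second, they are 2.0 and 3.0.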

      We have removed Figure 4 as a main-text figure and added a version, with a color scheme revised in response to comments of Reviewer 1, as Figure S11. We added to the figure caption “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      • Fig 2a shows that papers highlighting understudied genes are actually cited more. I wonder why authors only looked at data before 2015. Fig 2b shows an increased correlation since 2015. Please consider redrawing Fig 2a to include data from 2015-2020?

      We highlight data from 2015 because, in the version of iCite we used (v32, released July 2022, covering citations made through most of 2021), papers published in 2015 have had about 6 years to accumulate citations. With fewer years to accumulate citations, insufficient signal may cause the correlation to converge toward zero. Below, we repeat the analysis in Figure 2 but consider only citations made within a year of an article’s publication, which substantially reduces the correlation (although it remains significant).

      Author response image 5.

      We added a note to the figure caption:

      “We forgo depicting more recent years than 2015 to allow for citations to accumulate over multiple years, providing a more sensitive and robust readout of long-term impact.”

      For Figure 2B, we add:

      “For more recent years, where articles have had less time to accumulate citations, insufficient signal may cause correlation to converge toward zero.”

      • Can FMUG be posted on the web for easy access by researchers with non-computational backgrounds?

      Regretfully, we presently do not have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #2 (Recommendations for the authors):

      • Related to the first weakness in my public review: The observed disparity between CRISPR and GWAS studies in terms of which genes they promote to the abstract is interesting. I wonder if this has to do with the application of these techniques. GWAS studies will often highlight that they retrieve known associations between a gene and a phenotype, to show that a screen is working. I guess often the point is to subsequently identify more genes associated with a particular phenotype, but often it is unclear how to validate/verify newly found associations. In contrast, CRISPR screens might be more focussed on functionally/mechanistically understanding unknown processes, e.g. observing a phenotype that appears/disappears in response to a gene deletion. In such studies, the follow-up of a previously unknown gene could be more straightforward and relevant to the outcome. Does that mean CRISPR screens are better than GWAS studies for addressing the UG problem? Perhaps the authors could briefly discuss this issue.

      The number of studies we included featuring CRISPR screens is relatively small (n = 15 compared to n = 450 for GWAS). Thus, it is not possible to conclude in a statistically sound manner whether authors of CRISPR screens are truly more likely to highlight understudied genes.

      However, the reviewer raises compelling reasons for why this might be the case, and we now include in the discussion the broader point that some techniques might be more powerful than others for studying understudied genes.

      The discussion now includes:

      “Further, the observed discrepancy between the popularity of hits highlighted by GWAS versus other technologies suggests that some -omics technologies may be more powerful than others for characterizing understudied genes. This possibility merits further research and researchers participating in unknomics should consider the relative strengths of each technology towards providing tractable results for follow-up.”

      • Affinity capture mass spectrometry (Aff-MS): Perhaps I misunderstood this, but typically this is referred to as affinity purification MS (AP-MS)

      Thank you for the clarification. We have changed ‘Aff-MS’ to ‘AP-MS’ throughout the manuscript.

      • Page 3, line 96. The sentence "The first possibility is that seemingly understudied genes are, in fact, not understudied as they would rarely be identified through experiments.". Would they not still be understudied, just not intentionally?

      We have rephrased this sentence to:

      “The first possibility is that some genes are less studied because they are rarely identified as hits in experiments.”

      • Fig 4 is very interesting, but I also found it a bit confusing. First, the choice of colour scheme, where blue shows the absence and white shows the presence of something, seems counterintuitive, especially on a white background. Second, I find it confusing that only some of the experiments are labelled in the heatmap. Could the authors not simply use Fig S9 as Fig 4? Or alternatively, only include the 8 labelled factors in the simplified figure.

      In line with this feedback and that of Reviewers #1 and #3, we have removed Figure 4 as a main-text figure and instead include it as Supplementary Figure S11. We have reversed the color scheme so that purple indicates one and white indicates zero. We also now label all factors; previously we had labeled only the default features of FMUG. We have also updated the figure legend to convey how the clustering informed the choice of default factors in FMUG. It reads:

      “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3)”.

      • The FMUG app is fantastic and sounds exactly like something that is required to boost the visibility of understudied genes and overcome the understudied gene bias. However, I did not understand the choice of reporting this in the Discussion section.

      We thank the reviewer for their enthusiasm, and have now moved FMUG into the results section.

      • To further increase usability of the FMUG app, is there a way it could be deployed online? I appreciate this could require a major amount of coding work, which would not be reasonable to demand. So please consider this a suggestion, potentially for a future implementation.

      Regretfully, we presently do not have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #3 (Recommendations for the authors):

      Table s2 and s3: p values are indicated by star signs. However, with so many hypothesis tests, the p values should be corrected for multiple tests.

      We have now applied Benjamini-Hochberg multiple hypothesis correction to these tables, correcting p-values within each of the four technologies. We updated our significance calling to read:

      “We identified 45 factors that relate to genes and found 33 (12 out of 23 binary factors and 21 out of 22 continuous factors) associated with selection in at least one assay type at Benjamini-Hochberg FDR < 0.001.”
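      A minimal Python sketch of the Benjamini-Hochberg step-up adjustment, applied separately within each technology as described above. The function name, the example p-values, and the grouping dictionary are hypothetical; in practice an established implementation such as `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` would typically be used instead.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    # Step up from the largest p-value (rank m) to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for a few factors, grouped by technology so that the
# correction is applied within each group rather than across all tests.
pvalues_by_technology = {
    "CRISPR": [0.01, 0.04, 0.03, 0.005],
    "GWAS": [1e-6, 2e-4, 0.2],
}
significant = {
    tech: [q < 0.001 for q in benjamini_hochberg(ps)]
    for tech, ps in pvalues_by_technology.items()
}
print(significant)
```

      Correcting within each technology, rather than pooling all tests, keeps the number of hypotheses per correction equal to the number of factors tested for that assay type.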

      Figure S1 - S4

      These figures contain too many noninformative boxes. In all the figures, only the last three boxes are informative (reports assessed for eligibility, reports excluded, and studies included in review). The remaining boxes convey little information and should be simplified.

      We have simplified these diagrams, removing boxes which contained no information.

      Figure S6: what does it mean by "prior to the publication of the first article represented in this sample"? What is "this sample"?

      “This sample” refers to the collection of 450 GWAS articles, 296 articles using AP-MS, 148 transcriptomics articles, and 15 genome-wide CRISPR screen articles. We have rephrased this sentence to make this clear. It now reads:

      “Variant of Figure 1B only considering articles published in 2002 or before, prior to the publication of any of the articles featuring -omics experiments which we considered for this analysis.”

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study by Lee et al. is a direct follow-up to their previous study, which described the evolutionary conservation among placental mammals of two motifs (a transmembrane motif and a juxtamembrane palmitoylation site) in CD4, an antigen co-receptor, and showed their relevance for T-cell antigen signaling. In this study, they describe the contribution of these two motifs to CD4-mediated antigen signaling in the absence of CD4-LCK binding. Their approach was to compare antigen-induced proximal TCR signaling and distal IL-2 production in 58-/- T-cell hybridomas expressing an exogenous truncated version of CD4 that cannot interact with LCK (called T1) with hybridomas expressing T1 versions carrying mutations in either or both of the conserved motifs. They show that T1 CD4 can support signaling to an extent similar to WT CD4, but that mutation of the conserved motifs substantially reduces signaling. The authors conclude that the role of these motifs is independent of LCK binding.

      Strengths:

      The authors convincingly show that T1 CD4, which lacks the interaction with LCK, supports TCR signaling, and that the two studied motifs contribute significantly to it.

      Weaknesses:

      The study has several weaknesses.

      (1) The whole study is based on a single experimental system, the genetically modified 58-/- hybridoma. It is currently unclear how the molecular motifs studied here contribute to signaling in a real T cell. The evolutionary conservation suggests that these motifs are important for T cell biology. However, the LCK-binding motif is conserved as well (perhaps even more so), and it plays a very minor role in their model. Without verifying their results in primary cells, the quantitative, and even qualitative, importance of these motifs for T-cell signaling and biology is unclear. Although the authors discuss this issue in the Discussion, it should be noted in all important parts of the manuscript where conclusions are made (abstract, end of introduction, perhaps also in the title) that the results come from hybridoma cells.

      We appreciate the Reviewer’s thoughtful comments and suggestion. We now state in the abstract and introduction that wet-lab experiments were performed with T cell hybridomas. We have also better highlighted work from Killeen and Littman (PMID: 8355789) showing that C-terminally truncated CD4, which lacked the motifs that mediate CD4-Lck interactions, can drive CD4+ T cell development, proliferation, and T-helper function, because we now provide mechanistic data that help explain those in vivo results. Also, as noted by the reviewer, we discuss how the sum of our data provides justification for the investment in and use of mouse models to interrogate how the functionally important residues/motifs identified and studied here influence T cell biology.

      We take this opportunity to reiterate that, while the wet-lab work is based on a well-characterized, albeit single, experimental system, the study as a whole is based on two lines of investigation. The other line was a systems biology computational approach that analyzes data from real-world experiments in a variety of jawed vertebrate species over evolution. Specifically, we computationally reconstructed the evolutionary history of CD4 by performing multiple analyses of CD4 from 99 jawed vertebrates spanning ~435 million years of evolution. This analysis allowed us to identify residues, and networks of evolutionarily coupled residues, that are predicted to be functionally important in vivo. Like other systems biology approaches, this allowed us to look at the larger picture by evaluating data points that have emerged from constant testing and adjustment of CD4 function in vivo through selection on an evolutionary timescale, in more jawed vertebrate species and under more real-world conditions than can be tested in the laboratory. Our structure-function analysis provided a second, reductionist wet-lab experimental system to cross-validate that the residues identified by our evolutionary analysis are functionally significant. This experimental validation is critical and elevates the relevance of our studies above ad hoc observations. Our work also provides mechanistic insights into why the residues studied here are functionally significant (i.e., they are key determinants of pMHCII-specific signaling initiation). In short, using both systems allowed us to cross-validate the functional significance of the residues within the GGXXG and (C/F)CV+C motifs studied here by two independent methods.

      (2) Many of the experiments lack negative controls. I believe that two types of negative controls should be included in all experiments. First, hybridoma cells without CD4 (or with a CD4 mutant unable to bind MHCII). Second, a no-peptide control, i.e., activation of the hybridoma cells with APCs not loaded with the cognate peptide. These controls are required to distinguish basal phosphorylation and CD4-independent antigen-induced phosphorylation, and thus to quantify the contribution of the particular motifs to CD4-mediated support. Although these controls are included in some of the experiments, they are missing in others. The binding mutant appears in some FC results as a horizontal bar (without any error bar/variability), showing that CD4 does not give a huge advantage in these readouts. Why don't the authors show no-peptide controls here as well? Why are the primary FC data (histograms) not shown? Why is neither of these two controls shown for the % of responders plots? Although the IL-2 production is a very robust and convincing readout, the phosphoflow is much less sensitive. It seems that the signaling is elevated only marginally. Without the mentioned controls and the raw data, precise interpretation is not possible.

      These comments, and those in point #3, concern our flow cytometry-based analysis of early intracellular signaling events where we asked: how do the motifs under investigation impact phosphorylation of CD3ζ, ZAP-70, and PLCγ1 in response to agonist pMHCII? Thank you for pointing out areas of confusion regarding these analyses. We will try to clarify here and have worked to clarify the text.

      Our approach was to mutate constituent residues within the motifs that our evolutionary analysis predicted to be functionally significant, compare the performance of the mutants to that of controls bearing WT motifs, and then infer the function of the motifs based on the differential phenotype of the mutants relative to their controls. In most cases, the C-terminally truncated CD4-T1 mutant served as the appropriate CD4 control backbone against which to evaluate the phenotypes of the GGXXG and (C/F)CV+C motif mutants. This is a conventional structure-function strategy.

      All experiments included APCs expressing null pMHCII (Hb:I-Ek) as negative controls. These were a necessary component of the data analysis, explained further below, which involved subtracting the signal from control or mutant T cell hybridomas bound to these negative control APCs from the signal of those bound to the agonist pMHCII (MCC:I-Ek). Doing so allowed us to establish a true signal over background for calculating percent responders and signaling intensity. These negative controls served the same purpose as the APCs expressing I-Ek not loaded with cognate peptide that the reviewer requested. It is important to note that we previously published that TCR-CD3-pMHCII interactions reciprocally increase CD4-pMHCII dwell time, and vice versa, such that dwell times of the 5c.c7 TCR and CD4 on the null Hb:I-Ek are both basal in this system relative to antagonist, weak agonist, and agonist pMHCII (PMID 29386113). A recent study using different techniques also concluded that TCR-CD3 and CD4 cooperatively enhance signaling to pMHCII (PMID 36396644). The use of the null pMHCII, Hb:I-Ek, in each experiment thus serves as a well-characterized negative control for both TCR and CD4 engagement in this experimental system with regard to assembly of the TCR-CD3 and CD4 around pMHCII to drive signaling. In our view, it is the most important negative control for interpreting our results, and it is present in each experiment. In Fig 1B and related supplemental figures, we compare the C-terminally truncated CD4-T1 mutant to full-length WT CD4 to evaluate the contributions of the intracellular domains to early signaling events. We found no significant differences in pCD3ζ, pZAP-70, and pPLCγ1 levels, demonstrating that, in our system, CD4 WT and T1 are statistically indistinguishable in these readouts.

      In Fig 1C we asked: what is the contribution of CD4-pMHCII interactions made by CD4 T1, which lacks the intracellular domain? We addressed this using our CD4 T1Dbind mutant. Fig 2C and Table 3 show that pCD3z levels for T1Dbind were ~54% of those for T1, meaning that CD4 binding to pMHCII roughly doubles pCD3z levels (even without the intracellular domain). We also showed in Fig 2C that the percent responders did not differ between the CD4 T1 and T1Dbind mutants. The impact on ZAP-70 and PLCg1 is shown in Figure 2—figure supplement 4. These differences, including the magnitude of the decrease, were observed reproducibly (p<0.001) in three independently generated sets of lines. We believe that this analysis satisfies the reviewer's request for an analysis of the contributions of CD4 binding to pMHCII.

      We did not include this mutant as a negative control in experiments evaluating the contributions of the GGXXG and (C/F)CV+C motifs to CD4 T1 signaling. The question in those experiments was how the motifs impact signaling in the absence of the intracellular domain, that is, within the CD4 T1 backbone, which makes CD4 T1 the proper comparator. We did show the average normalized intensity for the T1Dbind mutant, relative to T1, as a dotted line in those figures. This lower bound represents signaling mediated by TCR-CD3 alone. It gives readers a reference point for evaluating how the mutants we generated affected the overall contribution of CD4 to these early signaling events. The T1Dbind mutants were not always measured in the same experiment as the other mutants, because the cell lines were not all made at the same time, so we did not think it appropriate to graph the results together.
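      As a worked check of the "roughly doubles" statement, we assume the ~54% figure is the T1Dbind/T1 ratio of pCD3z levels reported for Fig 2C and Table 3. On that assumption, the fold increase attributable to CD4-pMHCII binding is:

```latex
\frac{\mathrm{pCD3z}_{\mathrm{T1}}}{\mathrm{pCD3z}_{\mathrm{T1Dbind}}} \approx \frac{1}{0.54} \approx 1.85
```

      This is slightly less than a full doubling, which is consistent with the approximate wording.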

      We do not know how to interpret the comment “Although the IL-2 production is a very robust and convincing readout, the phosphoflow is much less sensitive. It seems that the signaling is elevated only marginally.” Our perspective is that we do not know how to equate the sensitivity of the phosphoflow with that of the IL-2 readout. Because IL-2 is a signaling output, it results from signal amplification from the membrane to the nucleus. It is widely believed that CD3z phosphorylation is the initiating event for a signaling cascade that leads to IL-2 gene transcription and transduction. If so, our data strongly suggest that the ~2-fold difference in pCD3z levels between CD4 T1 and T1Dbind (Fig 2C/Table 3 data) contributes to the difference between no IL-2 output for T1Dbind and IL-2 output by T1 in this experimental system.

      CD4 WT and T1 have significantly different levels of IL-2 output, yet they show no significant differences in pCD3z, pZAP-70, or pPLCg1 levels. There are therefore likely to be other differences, via other pathways that intersect at the nucleus, that we did not measure. At many levels, biology works on gradients, such that small differences can tip a system in one direction or another. The kinetic discrimination model (PMID 8643643) is thought to be a reasonable description of the relationship between pMHC engagement and signaling outcomes. It suggests that very small differences in molecular interactions at the earliest stages of a response can lead to big differences in signaling outcome. We therefore have no basis at this juncture to think that ~2-fold differences in pCD3z levels could not account for bigger differences in signaling output such as IL-2.
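      The sketch below makes the point about small upstream differences concrete. It uses a generic multistep kinetic-proofreading scheme, chosen here only as an illustrative stand-in. It is not the kinetic discrimination model cited above, and it is not fitted to these data. The rate constants and step counts are arbitrary assumptions.

      In this scheme, each binding event must complete `n_steps` sequential steps, each at rate `k_p`, before dissociating at rate `k_off`. The fraction of events that completes is therefore (k_p / (k_p + k_off))^n_steps.

```python
"""Illustrative multistep kinetic-proofreading amplification (assumed parameters)."""


def fraction_completing(k_p: float, k_off: float, n_steps: int) -> float:
    """Fraction of binding events that finish n_steps proofreading steps,
    each occurring at rate k_p, before dissociation at rate k_off."""
    per_step = k_p / (k_p + k_off)  # chance of advancing one step before unbinding
    return per_step ** n_steps


def fold_difference(k_off_fast: float, k_off_slow: float, k_p: float, n_steps: int) -> float:
    """Output of the slower-dissociating interaction relative to the faster one."""
    slow = fraction_completing(k_p, k_off_slow, n_steps)
    fast = fraction_completing(k_p, k_off_fast, n_steps)
    return slow / fast


# Hypothetical values: a 2-fold difference in k_off, with k_p = 1 (arbitrary units).
for n in range(1, 7):
    print(n, round(fold_difference(k_off_fast=2.0, k_off_slow=1.0, k_p=1.0, n_steps=n), 3))
```

      With these parameters, the per-step ratio is 0.5 / (1/3) = 1.5. A single step therefore turns a 2-fold difference in k_off into only a 1.5-fold difference in output, while five steps turn it into 1.5^5, or about 7.6-fold. Whether early TCR signaling in this system behaves this way is not something the sketch establishes.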

      (3) The processing of the data is not clear. Some of the figures seem to be overprocessed. For instance, I am not sure what "Normalized % responders of pCD3zeta" means (e.g., Fig. 1C and elsewhere). Why do the authors not show the actual % of pCD3zeta+ cells, including the gating strategy? Why do the authors subtract the two histograms in Fig. 2-Fig. S3? It is very unusual.

      We did develop and implement a novel strategy for measuring the impact of our mutations on CD3z, ZAP-70, and PLCg1 phosphorylation. This was explained in more detail in our prior study, and the instructions to authors indicated that we should not repeat methods in the current manuscript. However, we will go through the approach here and explain why we did not show primary flow cytometry histograms for all of the experiments described above. First, we think that a brief explanation of what motivated us to develop our approach will aid understanding:

      (1) For experimental and statistical rigor, our goal was to perform both experimental and biological replicates. We did this by measuring and comparing the averages of at least three independently generated sets of paired WT/T1 control vs. mutant cell lines, generated at different times, to determine the statistical significance of any difference between the averages of the control and mutant lines.

      (2) Our questions required that we measure signals generated naturally by the cooperative engagement of cognate pMHCII on APCs by TCR-CD3 and CD4, rather than through anti-CD3/anti-CD4 crosslinking.

      (3) We chose to use flow cytometry rather than bulk cell analysis by Western blotting to analyze signaling in cells engaged with agonist APCs. This avoided dilution of that signal by cells that are not engaged with APCs and not signaling.

      (4) For each experiment, we wanted to subtract background signals from cells bound to APCs expressing a null pMHCII (Hb:I-Ek) from signals generated by cells bound to APCs expressing agonist pMHCII (MCC:I-Ek). Doing so allowed us to identify cells that are signaling (responders) to agonist over null pMHCII. The goal was to quantitate the level of signaling objectively, with a method that can be applied uniformly to all samples. The alternative, setting a flow cytometry gate on positive cells (e.g., pCD3z), is subjective and can vary from experiment to experiment. To put it another way, as detailed below, we used our subtraction method to identify signaling responders rather than setting a signaling gate on the positive population.

      Regarding gating schemes, controls, and data processing:

      Figure 2—figure supplement 3 of the current study and Figure 6—figure supplement 1 of our prior study are designed to walk the reader through our experimental design, gating, data processing, and thinking. Here we provide a detailed explanation to complement the figure legend as well as the methods provided in our prior manuscript (see point 4 below).

      We will refer to Figure 2—figure supplement 3 here:

      Panel A. The dot plots show our approach to identifying 5c.c7+ CD4+ 58a-b- T cell hybridomas (y-axis, GFP positive) coupled to M12 cells (x-axis, TagIt Violet) expressing the null pMHCII Hb:I-Ek (left) or the agonist pMHCII MCC:I-Ek (right). The gating shows the frequency of GFP+ T cell hybridomas that are bound to TagIt Violet-positive APCs (i.e., cell couples). The histogram on the right then shows the staining intensity for pCD3z (x-axis) for the 10,000 coupled events collected. In these couples, the APCs express either the null pMHCII (filled cyan) or the agonist pMHCII (black line).

      Panel B. The data presented here is the same as in Panel A, but for CD4 T1 cells.

      Panel C. The data presented here walk through how we identify 5c.c7+ CD4+ 58a-b- T cell hybridomas responding (i.e., signaling) to agonist pMHCII. They also show how we obtain the mean signaling intensity of the responding population, in a gating-independent manner after background subtraction.

      For the left graph, we exported the data for the histograms shown in Panel A from FlowJo 10 software and plotted them here using Prism 9 as smoothed lines (500 nearest neighbors). The cyan line is therefore a replicate of the Panel A flow cytometry histogram for pCD3z intensity from 5c.c7+ CD4+ 58a-b- T cell hybridomas coupled to M12 cells expressing the null pMHCII (Hb:I-Ek). The black line is a replicate of the pCD3z intensity histogram for the same hybridomas coupled to M12 cells expressing the agonist pMHCII (MCC:I-Ek).

      Next, to identify the responding population in a gating-independent manner, we used Excel to subtract the pCD3z intensity for the null pMHCII (cyan) negative control population from the pCD3z intensity for the agonist pMHCII (black) responding population, on a bin-by-bin basis. We then transferred the background-subtracted values to Prism 9 for smoothing and plotting (grey line: MCC:I-Ek minus Hb:I-Ek). The middle graph shows the same data processing for the data from Panel B for the CD4 T1 cells.

      Please note that the background-subtracted grey line has both negative and positive values. The negative values represent intensity bins where signaling in response to agonist pMHCII leaves fewer cells per bin than in the non-signaling null pMHCII population. The positive values represent intensity bins where signaling cells outnumber non-signaling cells. The right graph in this panel shows the populations after background subtraction for intensity bins that had more cells with pCD3z signal in the agonist pMHCII population than in the null pMHCII population (grey = WT full-length CD4; blue = T1).

      In short, the right graph shows the identification of those cells that are signaling in response to agonist pMHCII. This approach obviated the need for subjective gating in FlowJo to identify signaling (i.e., pCD3z-positive) cells. It also allowed for background subtraction, which could not be done in FlowJo. We used this approach for all analyses of pCD3z, pZAP-70, and pPLCg1 in this study.

      The number of cells in these background-subtracted populations was divided by 10,000 (the number of events collected and analyzed) to calculate the percent of responding 5c.c7+ CD4+ 58a-b- T cell hybridomas. The mean fluorescence intensity of the cells within these populations represents the signaling intensity.
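      The bin-by-bin subtraction, percent-responder, and MFI calculations described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Excel/Prism workflow used for the figures. It assumes that both histograms share identical intensity bins and were each built from 10,000 coupled events. It omits the 500-nearest-neighbor smoothing and takes each bin's center as its intensity. The function name and the toy counts are hypothetical.

```python
import numpy as np


def background_subtract(agonist_counts, null_counts, bin_centers, n_events=10_000):
    """Gating-independent responder analysis on two histograms with identical bins.

    Returns the signed per-bin difference, the percent responders, and the mean
    fluorescence intensity (MFI) of the responding (positive-difference) cells.
    """
    agonist = np.asarray(agonist_counts, dtype=float)
    null = np.asarray(null_counts, dtype=float)
    centers = np.asarray(bin_centers, dtype=float)
    if not (agonist.shape == null.shape == centers.shape):
        raise ValueError("histograms and bin centers must have the same shape")

    diff = agonist - null                  # negative where agonist bins hold fewer cells
    responders = np.clip(diff, 0.0, None)  # keep only bins where agonist exceeds null
    n_responders = responders.sum()

    percent_responders = 100.0 * n_responders / n_events
    if n_responders > 0:
        mfi = float(np.average(centers, weights=responders))
    else:
        mfi = float("nan")
    return diff, percent_responders, mfi


# Toy example: four intensity bins, 10,000 coupled events per condition.
centers = [10, 100, 1_000, 10_000]
null = [6_000, 3_000, 900, 100]         # null pMHCII (Hb:I-Ek)
agonist = [4_000, 2_500, 2_500, 1_000]  # agonist pMHCII (MCC:I-Ek)
diff, pct, mfi = background_subtract(agonist, null, centers)
print(diff, pct, mfi)
```

      For these toy counts, 2,500 cells fall in the positive-difference bins, giving 25% responders and an MFI of 4,240. Clipping negative bins to zero mirrors the choice to keep only bins where agonist-bound cells outnumber null-bound cells.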

      Panel D. The graph on the left shows the mean fluorescence intensity (MFI) ± SEM for the positive signaling population from the right graph of Panel C. This example compares a WT and a T1 cell line generated at the same time from the same parental 58a-b- T cell hybridoma population, and the T1 MFI is significantly greater than the WT MFI. These intensity values are one of the pairs used in main Fig 2B (left graph), where we show the paired MFI analysis of responding populations from 5 independently generated sets of cell lines.

      Please note that these single MFI values are derived directly from the flow cytometry histograms after background subtraction. Figure 2B, and similar figures, are therefore a distillation of all of the histograms for the populations tested. We consider this form easier to digest than either overlaying all histograms or showing multiple panels individually, and it also conserves space. This is why we showed only representative flow cytometry histograms rather than all of them.

      The graph on the right shows the % responders for the positive signaling population from the right graph of Panel C. Specifically, the total number of cells determined to be signaling in response to agonist pMHCII was divided by 10,000 (the number of coupled cells collected by flow cytometry) to determine the percent responders. These values are one of five sets used to determine the average normalized percent responders (all normalized to WT). There was no significant difference between these two populations in terms of percent responders.

      Regarding graphing normalized values for the MFI (signaling intensity) or the percent responders: in our first manuscript, we presented the individual MFI values for matched pairs of cells as well as the actual percent responders per group. Colleagues told us this presentation was confusing, distracting, and otherwise hard to digest. Multiple individuals suggested that normalized values would be preferable because they are easier and faster to understand.

      Upon reflection, we agreed with this feedback. The normalized presentation with statistics allows the two key questions to be evaluated quickly: 1. Are the mutants different from the control? 2. By how much? We have left both the raw and the normalized intensity values in the version of record. Given the reviewer’s comments, we have now graphed the average % responders instead of normalized values in the figures, and left the normalized values in Table 3.
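      One way to compute normalized values of this kind is to divide each mutant value by the control value from the same independently generated set of lines, then average the ratios. The sketch below assumes that pairing and reports the mean ± SEM of the ratios. The function name and numbers are hypothetical, and the statistical test used in the manuscript is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_paired_control(control, mutant):
    """Normalize each mutant value to the control from the same set of lines.

    Returns (mean ratio, SEM of the ratios). Requires at least two pairs.
    """
    if len(control) != len(mutant):
        raise ValueError("control and mutant must be paired one-to-one")
    if len(control) < 2:
        raise ValueError("need at least two paired sets to estimate the SEM")
    ratios = [m / c for c, m in zip(control, mutant)]
    return mean(ratios), stdev(ratios) / sqrt(len(ratios))


# Hypothetical % responders from three independently generated sets of lines.
control_pct = [20.0, 25.0, 40.0]
mutant_pct = [10.0, 15.0, 20.0]
print(normalize_to_paired_control(control_pct, mutant_pct))
```

      Normalizing within each set before averaging removes set-to-set differences in absolute signal. This is the reason for pairing each mutant with a control generated at the same time.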

      (4) The manuscript lacks Materials and Methods. It only refers to the previous paper, which is very unusual. Although most of the methods are the same, they still should be mentioned here. Moreover, some of the mutants presented here were not generated in the previous study, as far as I understand. Perhaps the authors plan to include Materials and Methods during the revision...

      Because we submitted this as a Research Advance article, we followed the journal instructions to reference the Materials and Methods in our prior publication, upon which this work builds, as the methods used are the same. They are detailed in that study. We have now included a copy of the Materials and Methods so that the eLife staff can determine how best to link it with this manuscript. We have also included the gene sequences for the novel constructs used in this study. Thank you for pointing out the omission.

      (5) Membrane rafts are a very controversial topic. I recommend the authors stick to the more consensual term "detergent resistant microdomains" in all cases/occurrences.

      We agree this is a controversial topic with a variety of viewpoints. Because we are not experts in the field of membrane composition, we turned to the literature to inform our view of how best to refer to these membrane subdomains. In our reading, we found a 2006 meeting report from a Keystone symposium on lipid rafts and cell function authored by Linda Pike (PMID 16645198). At this meeting, a central focus was reaching a consensus on how best to refer to these domains. The consensus term agreed upon by this group was “membrane rafts”. Specifically, we will quote from this report published in the Journal of Lipid Research, ‘Together, the discussions permitted the generation of a definition for “lipid rafts” in an ad hoc session on the final day of the meeting. All participants were invited to contribute to this effort, and the work product reflects the consensus of this broad-based group…… First and foremost, the term “lipid raft” was discarded in favor of the term “membrane raft.”’ We chose to use the term “membrane raft” based on this consensus opinion.

      (6) Last, but not least, the mechanistic explanation (beyond the independence of LCK binding) of the role of these motifs is very unclear at the moment.

      We agree with this comment. One goal in making these results, and those in our prior study, available to the field at large is to provide evidence for our view that the dominant paradigm used to explain the earliest events in T cell signaling needs re-evaluating. How T cell signaling is initiated in response to pMHCII is clearly more complex than is currently thought. In particular, our data are inconsistent with the dominant paradigm in which CD4 recruits Lck to TCR-CD3 to phosphorylate ITAMs and initiate signaling.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kuhn and colleagues follows up on a 2022 eLife paper in which they identified residues in CD4 constrained by evolutionary purifying selection in placental mammals and then performed functional analyses of these conserved sequences. They showed that sequences distinct from the CXC "clasp" involved in recruitment of Lck have critical roles in TCR signaling. These include a glycine-rich motif in the transmembrane (TM) domain and the Cys-containing juxtamembrane (JM) motif that undergoes palmitoylation, both of which promote TCR signaling. They also include a cytoplasmic domain helical motif, also involved in Lck binding, that constrains signaling. Mutations in the transmembrane and juxtamembrane sequences led to reduced proximal signaling and IL-2 production in a hybridoma's response to antigen presentation. This occurred despite retention of abundant CD4 association with Lck in the detergent-soluble membrane fraction, presumably mislocalized outside of lipid rafts and distal to the TCR. A major conclusion of that study was that CD4 sequences required for Lck association, including the CXC "clasp" motif, are not as consequential for CD4 co-receptor function in TCR signaling as the conserved TM and JM motifs. However, the experiments did not determine whether the functions of the TM and JM motifs depend on the Lck-binding properties of CD4. The mutations in those motifs could result in free Lck redistributing to associate with CD4 in signaling-incompetent membrane domains, or the motifs could function independently of CD4-Lck association. The current study addresses this specific question.

      Using the same model system as in the earlier eLife paper (the entire methods section is a citation to the earlier paper), the authors show that truncation of the Lck-binding intracellular domain resulted in a moderate reduction in IL-2 response, as previously shown, but there was no apparent effect on proximal phosphorylation events (CD3z, Lck, ZAP70, PLCg1). They then evaluated a series of TM and JM motif mutations in the context of the truncated Lck-nonbinding molecule, and showed that these had substantially impaired co-receptor function in the IL-2 assay and reduced proximal signaling. The proximal signaling could be observed at high ligand density even with a MHC non-binding mutation in CD4, although there was still impaired IL-2 production. This result additionally illustrates that phosphorylation of the proximal signaling molecules is not sufficient to activate IL-2 expression in the context of antigen presentation.

      Strengths:

      The strength of the paper is the further clear demonstration that the classical model of CD4 co-receptor function (MHCII-binding CD4 bringing Lck to the TCR complex, for phosphorylation of the CD3 chain ITAMs and of the ZAP70 kinase) is not sufficient to explain TCR activation. The data, combined with the earlier eLife paper, further implicate the gly-rich TM sequence and the palmitoylation targets in the JM region as having critical roles in productive co-receptor-dependent TCR activation.

      Weaknesses:

      The major weakness of the paper is the lack of mechanistic insight into how the TM and JM motifs function. The new results are largely incremental in light of the earlier paper from this group as well as other literature, cited by the authors, that implicates "free" Lck, not associated with co-receptors, as having the major role in TCR activation. It is clear that the two motifs are important for CD4 function at low pMHCII ligand density. The proposal that they modulate interactions of TCR complex with cholesterol or other membrane lipids is an interesting one, and it would be worth further exploring by employing approaches that alter membrane lipid composition. The JM sequence presumably dictates localization within the membrane, by way of palmitoylation, which may be critical to regulate avidity of the TCR:CD4 complex for pMHCII or TCR complex allosteric effects that influence the activation threshold. Experiments that explore the basis of the mutant phenotype could substantially enhance the impact of this study.

      We appreciate these thoughtful comments and suggestions. We will restate what we wrote in our preliminary response to the reviews to explain the scope of the current study:

      To address comments about the limited scope of this study and the referencing of the Methods section to our prior study, we would like to note that we submitted the current study via the Research Advance mechanism. Our goal was to build upon the conclusions of our 2022 eLife publication (PMID: 35861317) and address an unresolved question from that study (as nicely summarized by Reviewer #2). In the current manuscript we present data from reductionist experiments that were designed specifically for this purpose and, as noted by the reviewers, we provide answers to the question being asked.

      We think that the Research Advance mechanism is an ideal opportunity to make these results available to the field, given the stated purpose of such articles (for reference: “A Research Advance might use a new technique or a different experimental design to generate results that build upon the conclusions of the original research by, for example, providing new mechanistic insights or extend the pathway under investigation…”). We have now provided evidence that CD4 does not recruit Lck to phosphorylate TCR-CD3 ITAMs in our system. We have also shown that the GGXXG and (C/F)CV+C motifs do not play a role in enabling CD4 to regulate Lck proximity to TCR-CD3. Given this, we agree that it is important to form and test alternative hypotheses for how TCR-CD3 signaling is initiated.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study addresses the fundamentally unresolved question of why many thousands of small-effect loci contribute more to the heritability of a trait than the large-effect lead variants. The authors explore resource competition within the transcriptional machinery as one possible explanation with a simple theoretical model, concluding that the effects of resource competition would be too small to explain the heritability effects. The topic and approximation of the problem are very timely and offer an intuitive way to think about polygenic variation, but the analysis of the simple model appears to be incomplete, leaving the main claims only partially supported.

      We thank eLife for recognizing the importance of our work. We hope the revised manuscript addresses the reviewers’ reservations.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study explores whether the extreme polygenicity of common traits can be explained in part by competition among genes for limiting molecular resources (such as RNA polymerases) involved in gene regulation. The authors hypothesise that such competition would cause the expression levels of all genes that utilise the same molecular resource to be correlated and could thus, in principle, partly explain weak trans-regulatory effects and the observation of highly polygenic architectures of gene expression. They study this hypothesis under a very simple model where the same molecule binds to regulatory elements of a large number m of genes, and conclude that this gives rise to trans-regulatory effects that scale as 1/m, and which may thus be negligible for large m.

      We thank the reviewer for their thorough and thoughtful review of our manuscript.
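      The 1/m scaling summarized by the reviewer can be seen directly in the extreme-competition limit g_tot[G] >> 1. In that limit, x_j ≈ g_j/g_tot, with g_tot = Σ_k g_k. The following is a first-order sketch under that assumption, not a reproduction of the manuscript's own derivation. For a gene j ≠ i, the relative sensitivity of x_j to a relative change in g_i is:

```latex
\frac{\partial \ln x_j}{\partial \ln g_i}
  = \frac{g_i}{x_j}\,\frac{\partial x_j}{\partial g_i}
  = \frac{g_i\, g_{\mathrm{tot}}}{g_j}\left(-\frac{g_j}{g_{\mathrm{tot}}^{2}}\right)
  = -\frac{g_i}{g_{\mathrm{tot}}}
  \approx -\frac{1}{m}\quad\text{when the } g_k \text{ are comparable.}
```

      A fractional change δ in g_i therefore shifts x_j by roughly -δ/m, which becomes negligible for large m.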

      The main limitation of this study lies in the details of the mathematical analysis, which does not adequately account for various small effects, whose magnitude scales inversely with the number m of genes that compete for the limiting molecular resource. In particular, the fraction of "free" molecule (which is unbound to any of the genes) also scales as 1/m, but is not accounted for in the analysis, making it difficult to assess whether the quantitative conclusions are indeed correct.

      It is explicitly accounted for in the supplement.

      Second, the questions raised in this study are better analysed in the framework of a sensitivity or perturbation analysis, i.e., by asking how changes in expression level or binding affinity at one gene (rather than the total expression level or total binding affinity) affect expression level at other genes.

      In the context of complex traits, where an increase in gene expression can either increase or decrease the trait, we believe the most important quantity of interest is variation in expression and, therefore, trait variation. Nevertheless, our results do show that the relative change in expression due to competition is also small.

      Thus, while the qualitative conclusion that resource competition in itself is unlikely to mediate trans-regulatory effects and explain highly polygenic architectures of gene expression traits probably holds, the mathematical reasoning used to arrive at this conclusion requires more care.

      In my opinion, the potential impact of this kind of analysis rests at least partly on the plausibility of the initial hypothesis, namely whether most molecular resources involved in gene regulation are indeed "limiting resources". This is not obvious, and may require a careful assessment of existing evidence, e.g., what is the concentration of bound vs. unbound molecular species (such as RNA polymerases) in various cell types?

      We intentionally looked at the most extreme case of resource limitation. Since extreme resource limitation produces only a small effect, we conclude that the same would be true of weak resource limitation, where unbound molecules play an important role. We put more emphasis on this point in our revised text.

      Reviewer #1 (Recommendations For The Authors):

      While the main conclusion that resource competition in itself is unlikely to mediate trans effects and explain high levels of polygenicity may well be correct, I am not convinced that the mathematical reasoning presented in support of this conclusion is entirely correct. I will attempt to outline my concerns mainly in the context of section 2, since the arguments in sections 3 and 4 build upon this.

      (a) The key assumption underlying the approximations in equations 3, 4, and 5 is that there is very little free polymerase; in other words, [P]/[P]_0 is a small quantity. However, the second and third terms that emerge in equation 7 are also small quantities and (as far as I can see) of the same order as [P]/[P]_0. Thus, one cannot simply use equation 4 or 5 as a starting point to derive eq. 7. One should instead use the exact x_i = (g_i [G])/(1 + g_tot [G]), to make sure that all (and not just some) terms of similar order of magnitude are accounted for in the analysis.

      The concentration of free polymerase is denoted [P], and we explicitly assume (just before eq. 2) that [P] << [P]_0, with [P]_0 being the overall concentration of polymerase. This is a conservative assumption. We consider extreme resource competition with little free polymerase, and since we see only a small effect in this extreme scenario, we assume the effect would also be small in less extreme scenarios. We put more emphasis on this point in our revised text.

      More concretely, the difference between the exact x_i = (g_i [G])/(1 + g_tot [G]) and the approximate x_i = g_i/g_tot is precisely 1/m (for large m) in the example considered from line 246 onwards. Thus, I suspect that the conclusion that Var[x_i] = (1-1/m)Var[g_i] in that example is just an artefact of starting with eqs. 4 and 5.

      As a sanity check, it may be useful to simulate resource competition explicitly (perhaps using a deterministic simulation) under the explicit model [PG]_i = g_i [P][G] and [P]_0 = [P] + Sum[[PG]_i, i=1,m], without any further approximations. This would show whether perturbations in g_i actually produce Order[1/m] effects in the variance of x_i for the example considered from line 246 onwards. It would require simulating with a few different m and, for example, plotting Var[x_i] vs. m.

      The exact equation the reviewer is alluding to describes a scenario of non-extreme resource competition. If g_tot [G]>>1, i.e. if most polymerase is bound to a gene then x_i is equal to g_i/g_tot and this is the scenario we are considering of extreme competition. If g_tot [G]<<1, then x_i=g_i [G] and competition has no effect. While the intermediate case is interesting, we see no reason for the effects to be larger than in the extreme competition case. We have added the results of simulations in the supplement to validate our arguments.
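      A minimal deterministic version of the check discussed above can be sketched with NumPy using the exact occupancy x_i = g_i[G]/(1 + g_tot[G]). That expression follows if the binding relation is taken to be [PG]_i = g_i[P][G] with [P]_0 = [P] + Σ_i [PG]_i; this reading of the notation is an assumption.

      This is not the simulation reported in the supplement. The equal affinities g_i = 1, the 10% perturbation, and the two [G] values standing in for strong and weak competition are illustrative assumptions.

```python
import numpy as np


def occupancy(g, G):
    """Exact fraction of total polymerase bound at each gene.

    Assumes [PG]_i = g_i [P][G] and [P]_0 = [P] + sum_i [PG]_i, so that
    x_i = [PG]_i / [P]_0 = g_i [G] / (1 + g_tot [G]).
    """
    g = np.asarray(g, dtype=float)
    return g * G / (1.0 + g.sum() * G)


def relative_trans_effect(m, G, delta=0.1):
    """Relative change in x_1 when only g_0 is raised by a fraction delta (all g_i = 1)."""
    g = np.ones(m)
    base = occupancy(g, G)[1]
    g_perturbed = g.copy()
    g_perturbed[0] *= 1.0 + delta
    return occupancy(g_perturbed, G)[1] / base - 1.0


# Limits of the exact formula for hypothetical affinities.
g = np.array([1.0, 2.0, 3.0])
print(np.allclose(occupancy(g, 1e6), g / g.sum(), rtol=1e-5))          # g_tot[G] >> 1: x_i ≈ g_i/g_tot
print(np.allclose(occupancy(g, 1e-6), g * 1e-6, rtol=1e-4, atol=0.0))  # g_tot[G] << 1: x_i ≈ g_i[G]

for m in (10, 100, 1000):
    strong = relative_trans_effect(m, G=1e6)
    weak = relative_trans_effect(m, G=1e-6)
    print(f"m={m:5d}  strong={strong:+.5f}  m*strong={m * strong:+.4f}  weak={weak:+.2e}")
```

      Under strong competition, m times the relative change stays close to -0.1. In other words, a 10% change at one gene moves another gene's occupancy by about 10%/m. Under weak competition the effect is essentially zero.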

      Lines 231-239: Because of the concerns highlighted above and questions about the validity of equation 7, I am not convinced that the interpretations given here and also in section 4 are correct.

      (b) Lines 219-230 (including equations 6 and 7): I think to address the question of whether genetic changes in cis-regulatory elements for a given gene have an effect on other genes (under this model of resource competition), it is better to spell out the argument in terms of Var[ dx_i ] rather than Var[x_i], where dx_i is the change in expression level at gene i due to changes at all m genes, dg_i is the change in gene activity due to (genetic) changes in the relevant regulatory elements associated with gene i etc. Var[ dx_i ] can then be expressed as a sum of Var[dg_i], Var[dg_tot] and Cov[d g_i, dg_tot]. However, I suspect that to do this correctly, one should not start with the approximate x_i=g_i/g_tot : see previous comment.

      The variance of the deviation from the mean is mathematically identical to the overall variance, Var[ dx_i ]= Var[ x_i ]. Our analysis is therefore equivalent to the suggested analysis.
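      To compare the two formulations, one can expand x_i = g_i/g_tot to first order (delta method) about baseline values. This expansion is valid for small perturbations in the extreme-competition limit and expresses Var[dx_i] in the terms the reviewer lists. It is a sketch under those assumptions, with g_i and g_tot held at their baseline values; note that dg_tot = Σ_k dg_k includes dg_i.

```latex
dx_i \approx \frac{dg_i}{g_{\mathrm{tot}}} - \frac{g_i\,dg_{\mathrm{tot}}}{g_{\mathrm{tot}}^{2}},
\qquad
\mathrm{Var}[dx_i] \approx \frac{\mathrm{Var}[dg_i]}{g_{\mathrm{tot}}^{2}}
  + \frac{g_i^{2}\,\mathrm{Var}[dg_{\mathrm{tot}}]}{g_{\mathrm{tot}}^{4}}
  - \frac{2\,g_i\,\mathrm{Cov}[dg_i, dg_{\mathrm{tot}}]}{g_{\mathrm{tot}}^{3}}
```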

      Somewhere in all of this, there is also an implicit assumption that E[dg_i] is zero, i.e., mutations are as likely to increase as to decrease binding affinities, so that one needs to consider only Var[dx_i] and not E[dx_i]; this assumption should be spelled out.

      Our results concern the variation around trait means and therefore we have not included a possible mean effect of mutation, which would not affect the results but just shift the mean.

      Some minor comments (mostly related to the introduction and general context):

      • I think it would be worth connecting more with the literature on molecular competition and gene regulation (see e.g., How Molecular Competition Influences Fluxes in Gene Expression Networks, De Vos et al, Plos One 2011). Even though this literature does not frame questions in terms of "polygenicity of traits", these analyses address the same basic questions: to what extent do perturbations in gene expression at one gene affect other genes, or to what extent is there crosstalk between different genes or pathways?

      We have expanded our introduction to refer to De Vos et al, as well as a few other papers we have recently become aware of. (e.g., Jie Lin & Ariel Amir Nature Communications volume 9, Article number: 4496 (2018))

      • Lines 88-89: "supports the network component of the model" is a vague phrase that does not convey much. It would be useful to clarify and make this more precise.

      We have clarified this phrasing in the text.

      • Lines 113-114: In the context of "selective constraint", it may also be worth discussing previous work by one of the authors: "A population genetic interpretation of GWAS findings for human quantitative traits". What implications would stabilizing selection on multiple traits (as opposed to simple purifying selection) have for the distribution of variances across trait loci and the extent to which trait architectures appear to be polygenic?

      While most definitely of great interest to some of the authors, the distribution of variance across loci does not affect our results.

      References: Barton and Etheridge 2018 in line 54 is not the correct reference; it should be Barton et al 2017 (paper with Amandine Veber). Fisher 1919 in line 52 is actually Fisher 1918. The formatting of references in the next paragraph (and in various other places in the paper) is also a bit unusual, with some authors referred to by their full names and others only by their last. I believe that it may be useful to crosscheck references throughout the paper.

      We have crosschecked the references in the paper.

      Line 164: Some word appears to be missing here. Maybe bound -> bound to ?

      Fixed

      Reviewer #2 (Public Review):

      The question the authors pose is very simple and yet very important. Does the fact that many genes compete for Pol II to be transcribed explain why so many trans-eQTL contribute to the heritability of complex traits? That is, if a gene uses up a proportion of Pol II, does that in turn affect the transcriptional output of other genes, relevant or even irrelevant for the trait, in a way that their effect will be captured in a genome-wide association study? If yes, then the large number of genetic effects associated with variation in complex traits could be explained by such trans-propagating effects on the transcriptional output of many genes.

      This is a very timely question given that we still don't understand how, mechanistically, so many genes can be involved in complex traits variation. Their approach to this question is very simple and it is framed in classic enzyme-substrate equations. The authors show that the trans-propagating effect is too small to explain the ~70% of heritability of complex traits that are associated with trans-effects. Their conclusion relies on the comparison of the order of magnitude of a) the quantifiable transcriptional effects due to Pol II competition, and b) the observed percentage of variance explained by trans effects (data coming from Liu et al 2019, from the same lab).

      The results shown in this manuscript rule out that competition for limited resources in the cell (not restricted to Pol II, but applicable to any other cellular resource like ribosomes, etc) could explain the heritability of complex traits.

      We thank the Reviewer for his resounding support of our paper!

      Reviewer #2 (Recommendations For The Authors):

      The authors rely on simulated data, and although the conclusions hold in a biologically-realistic scenario given the big difference in effect sizes, I wonder if the authors could provide data from the literature (if available) that give the reader a point of reference for the steady state of cells in terms of free/occupied Pol II molecules and/or free/occupied transcription binding sites. This information won't change the conclusion of the manuscript, but it will put it in the context of real biological data.

      We have scoured the literature, but have not found readily available data with which to validate our results (beyond that which is already referenced).

      Reviewer #3 (Public Review):

      Human complex traits including common diseases are highly polygenic (influenced by thousands of loci). This observation is in need of an explanation. The authors of this manuscript propose a model that competition for a single global resource (such as RNA polymerase II) may lead to a highly polygenic architecture of traits. Following an analytical examination, the authors reject their hypothesis. This work is of clear interest to the field. It remains to be seen if the model covers the variety of possible competition models.

      We thank the Reviewer for his assessment, support and comments.

      Reviewer #3 (Recommendations For The Authors):

      This manuscript provides a straightforward and elegant quantitative argument that the competition for the RNA polymerase is not a significant source of trans-eQTLs and, more generally, of genetic variance of complex polygenic phenotypes. This is an unusual manuscript because the authors propose a hypothesis that they confidently reject based on a calculation. This negative result is intuitive. Still, the manuscript is of interest. Progress in understanding the highly polygenic architecture of complex traits is welcome, and the resource competition hypothesis is quite natural. I have three specific comments/concerns listed below.

      (1) The manuscript states that V(x_i)=V(g_i/g_tot). Unless I am missing something, this seems to result from a very strong implicit assumption that all genetic variance is due to variation in the binding of RNA polymerase, while x_i_max is a constant. I would expect that x_i_max may also be genetically variable due to many effects unrelated to Pol II binding (e.g. transcription rate, bursting, presence of R-loops etc.). I guess that the assumption made by the authors is conservative.

      Indeed. We made conservative assumptions throughout, aiming to consider the most extreme scenario in which resource competition may affect trait variation. Our logic is that if resource competition has a small effect even under the most extreme scenario, then it has a small effect in all scenarios. We put more emphasis on this point in our revised text.
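      A minimal sketch of the relation quoted in comment (1), assuming each gene's expression is simply its share x_i = g_i/g_tot of a single, fully limiting resource, with g_tot the sum of g_j over all m genes. The function names and the equal-affinity example are hypothetical illustrations, not the manuscript's code; the sketch only shows why a cis change in one gene shifts every other gene by a relative amount of order 1/m.

```python
def expression_shares(g):
    """Assumed model: gene i receives x_i = g_i / g_tot of the limiting resource."""
    g_tot = sum(g)
    return [g_i / g_tot for g_i in g]


def relative_trans_effect(g, j, i, rel_change):
    """Relative change in x_i (i != j) when gene j's affinity g_j is scaled by (1 + rel_change)."""
    before = expression_shares(g)[i]
    g_new = list(g)
    g_new[j] *= 1 + rel_change
    after = expression_shares(g_new)[i]
    return after / before - 1


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        g = [1.0] * m  # hypothetical: all m genes have equal affinity
        # A 50% cis increase in gene 0 lowers gene 1's share by 0.5 / (m + 0.5), i.e. roughly 1/m.
        effect = relative_trans_effect(g, j=0, i=1, rel_change=0.5)
        print(f"m = {m:>5}: relative trans effect = {effect:.2e}")
```

      With equal affinities the trans effect is exactly -0.5/(m + 0.5), so it shrinks in proportion to 1/m as the number of competing genes grows.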

      (2) The manuscript focuses on the competition for RNA polymerase but suggests that the lesson learned is highly generalizable. However, it is an example of a single global limiting resource resulting in first-order kinetics. What happens in a realistic scenario of competition for multiple resources associated with transcription and with downstream processes (free ribonucleotides, spliceosome, polyadenylation machinery, ribosome, post-translational modifications)? It is possible that in most cases a single resource is a limiting factor, but an investigation (or even a brief discussion) of this question would support the claim that the results are generalizable.

      We expect competition for multiple resources to result in similarly weak effects. Since there are not a great number of such resources, we do not expect this to change our qualitative result. We added language to that effect in the main text.

      (3) Alternatively, what happens in a scenario of competition for multiple local resources shared by a few genes (co-factors, substrates, chaperones, micro-RNAs, post-translational modification factors such as kinases, degradation factors, scaffolding proteins)? In this case, each gene would compete for resources with a few other genes increasing polygenicity without a global competition with all other genes. Intuitively, a large set of such local competitions may lead to a highly polygenic architecture.

      This is indeed a scenario in which competition may have a large effect, which we mention in our discussion: “the conclusions may differ in contexts where a very small number of genes compete for a highly limited resource, such as access to a particular molecular transporter”.

    2. Reviewer #1 (Public Review):

      This study explores whether the extreme polygenicity of common traits (the fact that variation in such traits is explained by a very large number of genetic variants) could be explained in part by competition among genes for limiting molecular resources involved in gene regulation, which would cause the expression of most genes to be correlated. While the hypothesis is interesting, I still have some concerns about the analysis and interpretation.

      As the authors say in their rebuttal, assuming extreme resource limitation, i.e., going from equation 2 to 5, essentially assumes that 1/(g_tot [G]) << 1 and that terms of order 1/(g_tot [G]) can be neglected. However, the authors then derive so-called resource competition terms of order 1/m, where m is the number of genes, so that g_tot is proportional to m. My main criticism (which I am not sure was addressed) is thus: can we reliably derive small order-(1/m) effects while neglecting order-[1/(g_tot [G])] terms, when both are presumably similar in order of magnitude? Is this mathematically sound?

      I do not think the supplement that the authors have added actually gets to this. For example, section 7.1 just gives the textbook derivation of Michaelis-Menten kinetics, and does not address my earlier criticism that the terms neglected in going from eq. 16 to eq. 17 (or from eq. 2 to 3) may be similar in magnitude to the terms being derived and interpreted in eqs. 6 and 7.

      Similarly, it is unclear from section 7.2 how the authors are doing the simulations. Are these true Michaelis-Menten simulations involving equation 2? If yes, then what are the values of [G] and [P_0] in the simulations? If these are not true Michaelis-Menten simulations, but instead something that already uses equation 5, then this still does not address my earlier criticism.
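      One way to make this concern concrete is a toy model, which is an assumption for illustration and not the manuscript's equation 2: take x_i = g_i[G]/(1 + g_tot[G]) with the reviewer's [G] held fixed. This reduces to x_i = g_i/g_tot when g_tot[G] >> 1 and differs from that limit by terms of order 1/(g_tot[G]). The sketch below (function names hypothetical) compares the order-1/m trans effect computed with and without those neglected terms. In this toy model the neglected terms only rescale the trans effect by the factor 1/(1 + 1/(g_tot'[G])), where g_tot' is the perturbed total, rather than swamping it; a model in which [G] itself depends on the g_j (for example through conservation of total Pol II) could behave differently, so the sketch does not settle the question for the manuscript's actual equations.

```python
def shares_limit(g):
    """Extreme-limitation shares: x_i = g_i / g_tot."""
    g_tot = sum(g)
    return [g_i / g_tot for g_i in g]


def shares_toy(g, G):
    """Hypothetical toy model: x_i = g_i*G / (1 + g_tot*G), with G held fixed."""
    g_tot = sum(g)
    return [g_i * G / (1 + g_tot * G) for g_i in g]


def trans_effect(share_fn, g, j, i, rel_change):
    """Relative change in x_i (i != j) when g_j is scaled by (1 + rel_change)."""
    before = share_fn(g)[i]
    g_new = list(g)
    g_new[j] *= 1 + rel_change
    return share_fn(g_new)[i] / before - 1


if __name__ == "__main__":
    m = 1000
    g = [1.0] * m  # hypothetical equal affinities, so g_tot = m
    approx = trans_effect(shares_limit, g, 0, 1, 0.5)
    for G in (0.01, 0.1, 1.0):  # g_tot*G = 10, 100, 1000
        exact = trans_effect(lambda gg: shares_toy(gg, G), g, 0, 1, 0.5)
        print(f"g_tot*G = {m * G:g}: toy {exact:.3e}, limit {approx:.3e}, ratio {exact / approx:.3f}")
```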

    1. Author Response

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their reading of the manuscript, and their suggestions. We have extensively addressed all these concerns in the text, and also included several new datasets and figures in the revised version of the manuscript. We hope that our response and the new experimental data fully address the concerns raised by the reviewers. We include a detailed, point-by-point response to each of the reviewer concerns, pointing to new data and specific changes made in the main manuscript.

      Note: these new data have resulted in a new figure (Figure 6), a new supplementary figure (Figure 2-figure supplement 2), and an increase in the number of panels in each figure, as well as in the supplementary figures.

      General response comments, highlighting a few aspects missed by the reviewers

      This manuscript has an enormous amount of data in it. This is understandable, since in part we are proposing an entirely new hypothesis, and a new way to think about mitochondrial repression, built around substantial circumstantial evidence from diverse literature sources. But to keep the narrative readable and the main idea understandable, a lot of information could only be mentioned briefly in the text, and is therefore included as supplemental information. Because of that, it may not always be apparent that this study has set several technical benchmarks. These experiments are extremely challenging to perform, took many iterations to standardize, and are in themselves a first in the field. Yeast cells have the highest known rate of glycolytic flux of any organism. Measuring this glycolytic rate using the formation of intermediates is hard, and all current estimates have been obtained in vitro, using a stopped-flow-type setup. In this study, we optimized and directly measured the glycolytic flux using isotope-labelled glucose (13C-glucose), which has never been reported before in highly glycolytic cells such as yeast. This is because of the very rapid label saturation (within seconds) after a 13C-glucose pulse (now shown in Figure 2-figure supplement 1). For brevity, this is summarized in this study with sufficient information to reproduce the method, but we will put out a more detailed, associated methodology paper describing the challenges, infrastructure requirements, and resources needed to carry out these types of experiments in yeast. An added highlight of these experiments with WT and Ubp3 deletion strains is the most direct experimental demonstration to date that glycolytic flux in yeast in high glucose follows zero-order kinetics, and depends entirely on the amounts of the glycolytic enzymes (presumably operating at maximal activity). This nicely complements the recent study by Grigatis 2022 (cited in the discussion), which suggests this possibility.

      Separately, this study required the estimation of total inorganic phosphate, as well as mitochondrial pools of phosphate. To date, no studies have estimated mitochondrial pools of phosphate (for a variety of reasons). In this study, we also experimentally determined the changes in mitochondrial phosphate pools. For this, we had to establish and standardize a rapid mitochondrial isolation method in yeast. Thus, this study provides the first quantitative estimates of mitochondrial Pi amounts (in the context of measured mitochondrial outputs), as now shown in Figure 4. This mitochondrial isolation approach for assessing metabolites in yeast may also be explored in future as a methods paper.

      Specific responses to the Reviews:

      Reviewer #1 (Public Review):

      The study by Vengayil et al. presented a role for Ubp3 in mediating inorganic phosphate (Pi) compartmentalization in the cytosol and mitochondria, which regulates metabolic flux between cytosolic glycolysis and mitochondrial processes. Although the exact function of increased Pi in mitochondria is not investigated, the findings have valuable implications for understanding the metabolic interplay between glycolysis and respiration under glucose-rich conditions. They showed that UBP3 KO cells decrease glycolytic flux by reducing the abundance of key Pi-dependent glycolytic enzymes, consequently increasing Pi compartmentalization to mitochondria. Increased mitochondrial Pi increases oxygen consumption and mitochondrial membrane potential, indicative of increased oxidative phosphorylation. In conclusion, the authors reported that Pi utilization by cytosolic glycolytic enzymes is a key process for mitochondrial repression under glucose conditions.

      (1) However, the main claims are only partially supported, owing to the low number of repeats and the use of only one strain background, which decreased the overall rigor of the study. The full power of the yeast model could be used to test the findings in different backgrounds, with increased biological repeats in many of the assays described in this study. In the yeast model, it has been well established that many phenotypes are genotype/strain dependent (Liti 2019, Gallone 2016, Boekhout 2021, etc.), with some strains utilizing mitochondrial respiration even under high glucose conditions (Kaya 2021). It would be conclusive to test whether wild strains with increased respiration under high glucose conditions would also be characterized by increased mitochondrial Pi.

      “However, the main claims are only partially supported by the low number of repeats and utilizing only one strain background, which decreased the overall rigor of the study. The full-power yeast model could be utilized with testing findings in different backgrounds with increased biological repeats in many assays described in this study.”

      Thank you for the suggestion. We agree that a larger, universal statement cannot be made with data from a single strain, since yeasts do have substantial diversity. In this study, we had originally used a robust, prototrophic industrial strain (CEN.PK background). We have now utilized multiple, diverse strains of S. cerevisiae to test our findings. This includes strains from the common laboratory backgrounds – W303 and BY4742 – which have different auxotrophies, as well as another robust, highly flocculent strain from a prototrophic Σ1278 background. Using all these strains, we now comprehensively find that the role of altered Pi budgeting as a constraint for mitochondrial respiration, and the role of Ubp3 as a regulator of mitochondrial repression is very well conserved. In all tested strains of S. cerevisiae the loss of Ubp3 increases mitochondrial activity (as shown by increased mitochondrial membrane potential and increased Cox2 levels in Figure 6A, B). These data now expand the generality of our findings, and strengthen the manuscript. These results are included in the revised manuscript as a new figure- Figure 6 and the associated text.

      Some of the included data in the revised manuscript are shown below:

      Author response image 1.

      Mitochondrial activity and Cox2 levels in ubp3Δ in different genetic backgrounds

      We also used the W303 strain to assess Pi levels, and its role in increasing mitochondrial respiration. We find that the loss of Ubp3 in this genetic background also increases Pi levels and that the increased Pi is necessary for increasing mitochondrial respiration (Figure 6C, D).

      Author response image 2.

      Basal OCR in WT vs ubp3Δ (W303 strain background) in normal vs low Pi

      These experiments collectively have strengthened our findings on the critical role of intracellular Pi budgeting as a general constraint for mitochondrial respiration in high glucose.

      “It would be conclusive to test whether wild strains with increased respiration under high glucose conditions would also be characterized by increased mitochondrial Pi.”

      Addressed partially above. At present, the relative basal respiration in glucose across different strains is not well known. We measured MitoTracker fluorescence in high glucose in multiple WT strains of S. cerevisiae (W303, Σ1278, S288C and BY4742, compared to the CEN.PK strain). These strains largely had similar mitochondrial membrane potential, except for a slight increase in the Σ1278 strain. We further characterized this using Cox2 protein levels as well as basal OCR, and found that these do not increase. These data are shown below, and are not included in the main text since they do not add any new component to the study.

      Author response image 3.

      Mitochondrial respiration in different WT strains

      We did find this suggestion very interesting though, and are exploring directions for future research based on this suggestion. Since we have now identified a role for intracellular Pi allocation in regulating the Crabtree effect, an interesting direction can be to understand the glucose dependent mitochondrial Pi transport in Crabtree negative yeast strains. We will have to bring in a range of new tools and strains for this, so these experiments are beyond the focus of this current study.

      We hope that these new experiments in different genetic backgrounds increase the breadth and generality of our findings, and stimulate new lines of thinking about how important Pi budgeting might be as a constraint on mitochondrial repression in high glucose.

      (2) It is not described whether the drop in glycolytic flux also affects TCA cycle flux. Are there any changes in the pyruvate level? If the TCA cycle is also impaired, what drives increased mitochondrial respiration?

      Thank you for pointing this out; we agree this should be included. We have addressed these concerns in the revised version of the manuscript.

      Since glucose derived pyruvate must enter the mitochondrial TCA cycle, one possibility is that a decrease in glycolytic rate could decrease the TCA flux. An alternate possibility is that the cells coincidently increase the pyruvate transport to mitochondria, to thereby maintain the TCA cycle flux comparable to that of WT cells. To test both these possibilities, we first measured the steady state levels of pyruvate and TCA cycle intermediates in WT vs ubp3Δ cells. We do not observe any significant change in the levels of pyruvate, or TCA cycle intermediates (except malate, which showed a significant decrease in ubp3Δ cells). This data is now included in the revised manuscript as Figure 2 – figure supplement 1D and figure supplement 2 A, along with associated text.

      Author response image 4.

      Pyruvate levels in WT vs ubp3Δ

      Author response image 5.

      Steady state TCA cycle intermediate levels

      Next, in order to address whether the TCA cycle flux is impaired in ubp3Δ cells, we also measured the TCA cycle flux in WT vs ubp3Δ cells by pulsing the cells with 13C glucose and tracking 13C label incorporation from glucose into TCA cycle intermediates. This experiment first required substantial standardization of the time of cell collection and quenching after 13C glucose addition, by measuring the kinetics of 13C incorporation into TCA cycle intermediates at different time points after 13C glucose addition. The standardization of this method is now included in the revised manuscript as Figure 2-figure supplement 2C, along with associated text, and is shown below for reference.

      Author response image 6.

      Kinetics of 13C labelling in TCA cycle intermediates

      Actual TCA cycle flux results: For measuring the TCA cycle flux, cells were treated with 1% 13C glucose and quenched, and samples were collected 7 min after glucose addition, which is within the linear range of 13C label incorporation (Figure 2-figure supplement 2C).

      Result: We did not observe any significant changes in the relative 13C label incorporation into TCA cycle intermediates. These data are included in the revised manuscript as Figure 2-figure supplement 2D, along with associated text, and are shown below for reference.

      Author response image 7.

      TCA cycle flux

      What these data show is that the TCA cycle flux itself is not altered in ubp3Δ. A likely interpretation is that this is due to increased pyruvate transport into mitochondria in ubp3Δ cells, as indicated by the ~10-fold increase in Mpc3 (mitochondrial pyruvate carrier) protein levels (shown in Figure 5-figure supplement 5H), allowing the same net amount of pyruvate into the mitochondria. This increased mitochondrial pyruvate transport could maintain the TCA flux in ubp3Δ cells and support the increased respiration. Putting a hierarchy together, the increased respiration in ubp3Δ cells could therefore be primarily due to increased Pi transport, followed by a consequent increase in ETC proteins. We leave it to the readers of this study to draw this conclusion.

      We hope that we have addressed all concerns that the reviewer has with respect to TCA cycle flux in ubp3Δ cells.

      (3) In addition, some of the important literature was also missed in citation and discussion. For example, in a recent study (Ouyang et al., 2022), it was reported that phosphate starvation increases mitochondrial membrane potential independent of respiration in yeast and mammalian cells, and some of the conflicting results were presented in this study.

      We are very aware of the recent study by Ouyang et al, which reports that Pi starvation increases mitochondrial membrane potential independent of respiration. However, this study is distinct from the context of our case due to the reasons listed below.

      (a) The reviewer may have misinterpreted our low Pi condition as Pi starvation. There is no Pi ‘starvation’ in this study. Here, we cultured ubp3Δ and tdh2Δtdh3Δ cells in a low Pi medium with 1 mM Pi concentration in order to bring down the intracellular free Pi to that of WT levels. These cells are therefore not Pi-starved, but have been manipulated to have the same intracellular Pi levels as that of WT cells, as shown in Figure 4-figure supplement 1D. The Pi concentration in the medium is still in the millimolar range, and the cells are grown in this medium for a short time (~4 hrs) till they reach OD600 ~ 0.8. This is entirely different from the conditions used in Ouyang et al., 2022, where the cells were grown in a Pi-starvation condition with 1-100 micromolar Pi in the medium for a time duration of 6-8 hrs. Since cells respond differentially to changes in Pi concentrations over time (Vardi et al., 2014), the response to low Pi vs Pi starvation will be completely different.

      (b) In our study, mitochondrial membrane potential is used as only one of the readouts for mitochondrial activity. Our estimates of mitochondrial respiration also include other measurements, such as Cox2 protein levels (as an indicator of the ETC) and basal OCR (measuring respiration), each of which provides distinct information. Because mitochondrial membrane potential can be regulated independently of the mitochondrial respiration state (Liu et al., 2021), using membrane potential alone as a readout of mitochondrial respiration is limited in the information it provides. As indicated earlier, mitochondrial membrane potential can change independently of mitochondrial respiration (Ouyang et al., 2022) and ATP synthesis (Liu et al., 2021). Since the focus of our study is mitochondrial respiration, and not just the change in membrane potential, conclusions based on potential alone would be ambiguous. Most studies in the field have in fact not used the comprehensive array of distinct estimates that we use in this study, and we believe the standards set in this study should become a norm for the field.

      (c) The only mutant that is similar to the Ouyang et al study is the Mir1 deletion mutant, which results in acute Pi starvation in mitochondria. In this strain, we find an increase in mitochondrial membrane potential. The data is not included in the manuscript but is shown below.

      Author response image 8.

      Mitochondrial potential in WT vs mir1Δ

      As clear from this data, mitochondrial membrane potential is significantly high in mir1Δ cells. However, the basal OCR and Cox2 protein levels clearly show decreased mitochondrial respiration which is expected in this mutant (Figure 5 A,B). This in fact highlights the limitations of solely relying on mitochondrial membrane potential measurements to draw conclusions, as doing so will lead to a misinterpretation of the actual mitochondrial activity in these cells. We do not wish to highlight limitations in other studies, but hope we make our point clear.

      (4) An additional experiment with strains lacking mitochondrial DNA under phosphate-rich and restricted conditions would further strengthen the result.

      Strains lacking mitochondrial DNA (Rho0 cells) cannot express the mitochondrially encoded ETC subunit proteins. These strains are therefore incapable of performing mitochondrial respiration. Since Rho0 cells are known to utilize alternate mechanisms to maintain their mitochondrial membrane potential (Liu et al., 2021), using MitoTracker fluorescence as a readout of mitochondrial respiration in these strains under different Pi conditions would be inconclusive and misleading, for the reasons mentioned in points 3(b) and 3(c). However, since this was a concern raised by the reviewer, we have now measured basal OCR in WT and Rho0 strains with Ubp3 deletion under normal vs low Pi medium. As expected, Rho0 cells show extremely low basal OCR values, an entire order of magnitude lower than WT cells. At these very low (barely detectable) levels, neither the deletion of Ubp3 nor a change in Pi concentration in the medium changes basal OCR, since these strains are not capable of respiration. We have included these data as Figure 4-figure supplement 1G.

      Author response image 9.

      Basal OCR in Rho0 cells

      (5) Western blot control panels should include entire membrane exposure, and non-cut western blots should be submitted as supplementary.

      The non-cut western blot images and the loading controls are now included in the revised manuscript as a supplementary file 2.

      (6) In Figure 4, it is shown that Pi addition decreases basal OCR to the WT level. However, the Cox2 level remains significantly higher. This data is confusing as to whether mitochondrial Pi directly regulates respiration or not.

      As described in the previous point, the Cox2 levels and the OCR provide distinct pieces of information. In figure 4, we show that culturing ubp3Δ in low Pi significantly decreases both Cox2 protein levels and basal OCR. Since Cox2 protein levels and basal OCR are different readouts for mitochondrial activity, there could be differences in the extent by which Pi availability controls each of these factors. Basal OCR is a direct readout for mitochondrial respiration, and is regulated by multiple factors including ETC protein levels, rate of ATP synthesis, rate of Pi transport etc. In figure 4, we find that culturing ubp3Δ in low Pi decreases basal OCR to WT level. This strongly suggests that high Pi levels are necessary to increase basal OCR in ubp3Δ.

      (7) Representative images of Ubp3 KO and wild-type strains stained with CMXRos are missing.

      Thank you for noticing this. This data is now included in the revised manuscript as Figure 1-figure supplement 1C.

      Author response image 10.

      (8) Overall, mitochondrial copy number and mtDNA copy number should be analyzed in WT and Ubp3 KO cells, as well as in Pi-treated and untreated cells, and basal OCR data should be normalized accordingly. The reported normalization against OD is not appropriate.

      This is a valid concern raised by the reviewer, and something we had extensively considered during the study. To normalize for total mitochondrial amounts in each strain, we always measure the protein levels of the mitochondrial outer membrane protein Tom70. While we had described this in the methods, it may not have been obvious in the text, but this information is included in Figure 1-figure supplement 1G. We did not observe any significant change in Tom70 levels, suggesting that the total mitochondrial amount does not change in ubp3Δ, and we have noted this in the manuscript (results section relevant to Figure 1). As an additional control, to directly measure the mitochondrial amount in these conditions, we have now measured the mitochondrial volume in ubp3Δ cells and in WT cells treated with Pi. For this, we used a strain expressing a mitochondria-targeted mNeonGreen protein (described in Dua et al., JCB, 2023), which allows total mitochondrial amount to be assessed independently. We do not observe any changes in mitochondrial volume or amounts in ubp3Δ cells or WT+Pi, compared to WT cells. Therefore, the changes in mitochondrial respiration upon Ubp3 deletion and Pi addition are not due to changes in the total amount of mitochondria in these conditions. Given all this, normalizing basal OCR to total cell number is the most appropriate approach, and it is conventionally used for basal OCR normalization in multiple studies.

      We have now included these additional data on mitochondrial volumes and amounts in the revised manuscript as Figure 1-figure supplement 1F and Figure 5-figure supplement 1D, with associated text; they are shown below.

      Author response image 11.

      Mitochondrial volume in WT vs ubp3Δ cells

      Author response image 12.

      Mitochondrial volume in WT and WT+Pi

      These data collectively address the reviewer’s concerns regarding changes in mitochondrial amounts in all the conditions and strains used in this study.

      Reviewer #2 (Public Review):

      Summary:

      Cells cultured in high glucose tend to repress mitochondrial biogenesis and activity, a prevailing phenotype called the Crabtree effect that is observed in different cell types and in cancer. Many signaling pathways have been put forward to explain this effect. Vengayil et al. proposed a new mechanism involving Ubp3/Ubp10 and phosphate that controls the glucose repression of mitochondria. The central hypothesis is that ∆ubp3 shifts glycolysis toward trehalose synthesis, leading to increased Pi availability in the cytosol; mitochondria then receive more Pi, and glucose repression is therefore reduced.

      Strengths:

      The strength is that the authors used an array of different assays to test their hypothesis. Most assays were well-designed and controlled.

      Weaknesses:

      I think the main conclusions are not strongly supported by the current dataset.

      (1) Although the authors discovered that ∆ubp3 cells have higher Pi and mitochondrial activity than WT in high glucose, it is not known whether WT cells cultured in different glucose concentrations also change their Pi levels in a way that correlates with mitochondrial activity. The focus of the research on ∆ubp3 is somewhat artificial because ∆ubp3 not only affects glycolysis and mitochondria, but many other cellular pathways are also changed. There is no idea whether culturing cells in low glucose, which derepresses mitochondrial activity, involves Ubp3 or not. Similarly, the shift of glycolysis to trehalose synthesis is also not relevant to WT cells cultured in a low-glucose situation.

      “The focus of the research on ∆ubp3 is somewhat artificial because ∆ubp3 not only affects glycolysis and mitochondria, but many other cellular pathways are also changed. There is no idea whether culturing cells in low glucose, which de-repress the mitochondrial activity, involves Ubp3 or not.”

      We would like to clarify that the focus of this research is not on Ubp3, or to address mechanistic aspects of how Ubp3 regulates mitochondrial activity, or to identify the targets of Ubp3. That would be an entirely distinct study, with a very different approach.

      In this study, while carrying out a screen, we serendipitously found that ubp3Δ cells showed an increase in mitochondrial activity in high glucose. Subsequently, we used this observation, bolstered by diverse orthogonal approaches, to identify a general, systems-level principle that governs mitochondrial repression in high glucose. Through this, we identify a role of phosphate budgeting as a controller of mitochondrial repression in high glucose. In this study, our entire focus has been to use orthogonal approaches, as well as parsimonious interpretations, to establish this new hypothesis as a possibility. We hope this idea, supported by these data, will now enable researchers to pursue other experiments to establish the generality of this phenomenon.

      We have not focused our effort on identifying the role of Ubp3, or its regulation upon changes in glucose concentration, in this context. That is a very specific and separate effort, and misses the general point we address here. It is entirely possible that Ubp3 might also regulate mitochondrial activity by mechanisms other than mitochondrial Pi availability (for example, via the reduction of key glycolytic enzymes at nodes of glycolysis, resulting in reduced glycolytic flux and rerouted glucose metabolism). Had the goal been to identify Ubp3 substrates, it is very likely that we would not have found the role of Pi homeostasis in controlling mitochondrial respiration. This is particularly because the loss of Ubp3 does not result in an acute disruption of glycolysis, unlike, say, a glycolytic enzyme mutant, which would have severe effects on growth and overall metabolic state. Such a mutant would have made it difficult to dissect finer details of the metabolic principles that regulate mitochondrial respiration.

      To further corroborate our findings, we used the glycolysis-defective mutant tdh2Δtdh3Δ, where we find a similar change in Pi balance. This complements the key observations made using ubp3Δ cells. Separately, we used the glycolytic inhibitor 2DG to independently assess the role of mitochondrial Pi transport in regulating respiration. Together, in this study we do not rely solely on genetic mutants, but combine the Ubp3 deletion strain with a reduced GAPDH activity strain and pharmacological inhibition of glycolysis. In addition, we find that mitochondrial Pi transporter levels are repressed under high glucose (Figure 5C, Figure 5-figure supplement 1B). Further, we find that mitochondrial Pi transport is important for increasing mitochondrial respiration upon a shift to low glucose and upon glycolytic inhibition by 2DG. Therefore, we collectively uncover a systems-level principle that regulates glucose-mediated mitochondrial repression, as opposed to carrying out a mechanistic study of Ubp3 targets.

      Of course, given the conservation of Ubp3, we are very excited to pursue a mechanistic study of Ubp3 targets in the future. This is a general challenge for deubiquitinase enzymes, and to date very few bona fide substrates are known for any deubiquitinase enzyme in any cellular system (due to challenges in the field that we discuss separately, and have included in the discussion section of this manuscript).

      “Similarly, the shift of glycolysis to trehalose synthesis is also not relevant to the WT cells cultured in a low-glucose situation”

      The reviewer is correct in pointing out that in low glucose, the shift to trehalose synthesis might not be as relevant. We observe that the glycolysis-defective tdh2Δtdh3Δ cells do not show an increase in trehalose synthesis (Figure 3-figure supplement 1E). However, in this context, the decrease in the rate of the GAPDH-catalysed reaction alone appears to be sufficient to increase Pi levels (Figure 3F), even without an increase in trehalose. Therefore, the relative contributions of these two arms towards Pi balance might differ depending on whether the cause is low glucose in the environment or a mutant, such as ubp3Δ, that modulates glycolytic flux. In ubp3Δ cells, both a low rate of the GAPDH-catalysed reaction and high trehalose occur (based on how glycolytic flux is modulated), whereas only the low rate of the GAPDH-catalysed reaction occurs in tdh2Δtdh3Δ cells. As an end point, Pi increases in both cases, but through slightly different routes. It should also be noted that, in terms of free Pi sources, a low-glucose condition (with a low glycolytic rate) is very different from a no-glucose, respiratory condition (where cells perform very high gluconeogenesis). In high-respiration conditions such as ethanol, cells switch to high gluconeogenesis, with a large increase in trehalose synthesis by default (e.g., see Varahan et al., 2019). In this condition, trehalose synthesis could be a major source of Pi (e.g., see Gupta 2021), and could support the increased mitochondrial respiration. In an ethanol medium, the direction of the GAPDH reaction is also reversed, so this reaction becomes an added source of Pi instead of a consumer of Pi (see illustration in Figure 3G). Therefore, a reasonable interpretation is that a combination of increased trehalose synthesis and increased 1,3-BPG to G3P conversion can be a major Pi source for increasing mitochondrial respiration in a non-glucose, respiratory medium.

      “it is not known if WT cultured in different glucose concentration also change Pi that correlate with the mitochondrial activity”

      This is a valid point raised by the reviewer. We have already found that the protein levels of the mitochondrial Pi transporter are increased in a non-glucose respiratory (ethanol) medium and in a low (0.1%) glucose medium (see Figure 5C, Figure 5-figure supplement 1B). In addition, we have tried measuring mitochondrial Pi levels in cells grown in a high-glucose medium vs a respiratory, ethanol medium. The results are shown below for the reviewer’s reference.

      Author response image 13. Mitochondrial Pi levels in ethanol vs glucose.

      We observe a clear trend where mitochondrial Pi levels are higher in cells grown in ethanol medium than in cells grown in high glucose. However, estimating Pi, and normalising Pi levels in isolated mitochondria, is extremely difficult in this condition (note that this has never been done before). This is likely due to the rapid rate of conversion of ADP and Pi to ATP in ethanol, which increases the variation in estimates of steady-state Pi levels, and to the large amount of mitochondria in ethanol-grown cells. Since the data show high variation, we have not included them in the manuscript, but we are happy to include them here in the response.

      Indeed, this study opens up the exciting question of how intracellular Pi allocation is regulated under different glucose conditions. This can be further extended to Crabtree-negative yeasts such as K. lactis, which do not show mitochondrial repression in high glucose. All of these are rich future research programs.

      (2) The central hypothesis that Pi is the key constraint behind the glucose repression of mitochondrial biogenesis/activity is supported by the data that limiting Pi will suppress mitochondrial activity increase in these conditions (e.g., ∆ubp3). However, increasing the Pi supply failed to increase mitochondrial activity. The explanation put forward by the authors is that increased Pi supply will increase glycolysis activity, and somehow even reduce the mitochondrial Pi. I cannot understand why only the increased Pi supply in ∆ubp3, but not the increased Pi by medium supplement, can increase mitochondrial activity. The authors said "...that ubp3Δ do not increase mitochondrial Pi by merely increasing the Pi transporters, but rather by increasing available Pi pools". They showed that ∆ubp3 mitochondria had higher Pi but WT cells with medium Pi supplement showed lower Pi, it is hard to understand why the same Pi increase in the cytosol had a different outcome in mitochondrial Pi. Later on, they showed that the isolated mito exposed to higher Pi showed increased activity, so why can't increased Pi in intact cells increase mito activity? Moreover, they first showed that ∆ubp3 had a Mir1 increase in Fig3A, then showed no changes in FigS4G. It is very confusing.

      “I cannot understand why only the increased Pi supply in ∆ubp3, but not the increased Pi by medium supplement, can increase mitochondrial activity.”

      This is an interesting point, that requires a nuanced explanation, which we try to provide below.

      For mitochondrial respiration to increase in the presence of high Pi, the cytosolic Pi has to be transported to the mitochondria sufficiently. In ubp3Δ the increased free Pi (as a consequence of rewired glycolysis) is transported to the mitochondria (Figure 4). This increased mitochondrial Pi can therefore increase mitochondrial respiration in ubp3Δ.

      In case of WT+Pi, the externally supplemented Pi cannot further enter mitochondria (as shown in Figure 5-Figure supplement 1C) and is most likely restricted to the cytosol. Because of this inability of the Pi to access mitochondria, the mitochondrial respiration does not increase in WT+Pi (Figure 5-Figure supplement 1E).

      The likely reason for this difference in mitochondrial Pi transport in ubp3Δ vs WT+Pi is the relative difference in their glycolytic rate. The glycolytic rate is inherently decreased in ubp3Δ, but not in WT+Pi. To dissect this possibility of glycolytic rate itself contributing to the Pi availability in the mitochondria, we inhibited glycolysis in WT cells (using 2DG), and then supplemented Pi. Compared to cells in the same glucose condition (with 2DG, but without supplementing excess Pi), now the WT+Pi (+2DG) has higher mitochondrial respiration (Figure 5-Figure supplement 1F). This suggests that a combination of low glycolysis and high Pi is required for increasing mitochondrial respiration (as elaborated in the discussion section of the manuscript).

      An obvious question that arises out of this observation is how does the change in glycolytic rate regulate mitochondrial Pi transport. One consequence of altering the glycolytic rate is a change in cytosolic pH. This itself will bear on the extent of Pi transport into mitochondria, as discussed in detail below.

      In mitochondria, Pi is co-transported along with protons. Therefore, changes in cytosolic pH (which change the proton gradient) can control mitochondrial Pi transport (Hamel et al., 2004). Glycolytic rate is a major factor that controls cytosolic pH. The cytosolic pH in highly glycolytic cells is ~7, and decreasing glycolysis results in cytosolic acidification (Orij et al., 2011). Therefore, under conditions of decreased glycolysis (such as loss of Ubp3), the cytosol becomes acidic. Since mitochondrial Pi transport depends on the proton gradient, a low cytosolic pH would favour mitochondrial Pi transport. Therefore, under conditions of decreased glycolysis (2DG treatment, or loss of Ubp3), where cytosolic pH would be acidic, increasing cytosolic Pi might indirectly increase mitochondrial Pi transport, thereby leading to increased respiration.

      To explain this and integrate all these points, we have extended the discussion section of the manuscript. We include this section below:

      “Supplementing Pi under conditions of low glycolysis (where mitochondrial Pi transport is enhanced), as well as directly supplementing Pi to isolated mitochondria, increases respiration (Figure 5, Figure 5-figure supplement 1). Therefore, in order to derepress mitochondria, a combination of increased Pi along with decreased glycolysis is required. An additional systems-level phenomenon that might regulate Pi transport to the mitochondria is the decrease in cytosolic pH upon decreased glycolysis (60, 61). The cytosolic pH in highly glycolytic cells is ~7, and decreasing glycolysis results in cytosolic acidification (60, 61). Therefore, under conditions of decreased glycolysis (2DG treatment, deletion of Ubp3, and decreased GAPDH activity), cytosolic pH becomes acidic. Since mitochondrial Pi transport itself is dependent on the proton gradient, a low cytosolic pH would favour mitochondrial Pi transport (62). Therefore, under conditions of decreased glycolysis (2DG treatment, or loss of Ubp3, or decreased GAPDH activity), where cytosolic pH would be acidic, increasing cytosolic Pi might indirectly increase mitochondrial Pi transport, thereby leading to increased respiration. Alternatively, increasing mitochondrial Pi transporter amounts can achieve the same result, as seen by overexpressing Mir1 (Figure 5).”

      The possibility that changes in cytosolic pH regulate mitochondrial Pi transport, and thereby respiration, is an interesting future research question, and an idea that has not been explored to date. This can stimulate new lines of thinking towards finding conserved biochemical principles that control mitochondrial repression in high glucose.

      “Moreover, they first showed that ∆ubp3 had a Mir1 increase in Fig3A, then showed no changes in FigS4G. It is very confusing”

      The increase in Mir1 in ubp3Δ shown in Figure 3A comes from our analysis of the proteomics dataset from a previous study (Isasa et al., 2015). Subsequently, we assessed Mir1 levels directly and more systematically, and did not observe an increase in Mir1 (Figure 4-figure supplement 1H in the revised manuscript). It is entirely possible that in a large-scale study (as in Isasa et al., 2015), some specific proteomic targets might not fully reproduce when tested individually (as described in Handler et al., 2018 and Mehta et al., 2022). We do clearly indicate this in the text, but given the density of information in this study, it is understandable that this point was missed by the reviewer.

      (3) Given that there is no degradation difference for these glycolytic enzymes in ∆ubp3, and the authors found transcriptional level changes, suggests an alternative possibility where ∆ubp3 may signal through unknown mechanisms to parallelly regulate both mitochondrial biogenesis and glycolytic enzyme expression. The increase of trehalose synthesis usually happens in cells under proteostasis stress, so it is important to rule out whether ∆ubp3 signals these metabolic changes via proteostasis dysregulation. This echoes my first point that it is unknown whether wild-type cells use a similar mechanism as ∆ubp3 cells to regulate the glucose repression of mitochondria.

      We appreciate this point raised by the reviewer, but it again requires some clarification (as noted earlier). The goal of this study was to identify systems-level principles that explain mitochondrial repression in high glucose. Although we started by performing a screen to identify proteostatic regulators of mitochondrial activity in high glucose, and identified Ubp3 as a mediator of mitochondrial activity, our approach was to use ubp3Δ cells as a model to understand the metabolic principles that regulate mitochondrial repression. This has been stated repeatedly in the manuscript, for example in lines 123-124, “We therefore decided to use ubp3Δ cells to start delineating requirements for glucose-mediated mitochondrial repression.”, and again in the discussion section (lines 442-460), where we discuss some unique advantages of using ubp3Δ cells to understand a general basis of mitochondrial regulation. To test this hypothesis, we also used orthogonal approaches, as well as other mutants and conditions with defective glycolysis, such as tdh2Δtdh3Δ cells and 2DG treatment. Only with these multiple, converging lines of evidence do we infer that the change in Pi balance (due to changes in glycolytic rate) might play a role in regulating mitochondrial activity.

      We certainly agree that there is great value in identifying the mechanistic details of how Ubp3 regulates mitochondria. But this requires very distinct approaches not pursued in this study, and it is not the question that we are addressing here. Separately, identifying targets of DUBs is one of the exceptional challenges in biology, since there are currently no straightforward chemical-biology approaches to do so for this class of proteins. Unlike for kinase/phosphatase systems, or even ubiquitin ligases, substrate-trapping mutants and similar approaches have proven to be abject failures in identifying direct targets of DUBs. A quantitative proteomics study might suggest some proteins or cellular processes regulated by Ubp3. This has been attempted for several DUBs, but direct substrates of DUBs have rarely ever been identified, in any system. A high-quality quantitative, descriptive proteome dataset of ubp3Δ cells is already available from a previous study (Isasa et al., 2015), which we cite extensively in this manuscript, and which was indeed invaluable for this study. We cannot improve on that outstanding dataset. Interestingly, the findings of that study actually help substantiate our idea of increased mitochondrial activity and altered Pi homeostasis in ubp3Δ cells. The Isasa et al. dataset finds that proteins involved in mitochondrial respiration are elevated in ubp3Δ cells, while glycolytic enzymes and PHO regulon proteins are reduced. In our study, using these data as a reference, we were able to conceptually piece together how changes in glycolytic flux can alter Pi balance.

      Apart from identifying changes in protein levels, a separate challenge in making sense of such quantitative proteomics data is the difficulty of pinpointing any target of Ubp3 that specifically regulates these processes. A single DUB can have multiple substrates, which could regulate the cellular metabolic state in a combinatorial manner. This is the essence of how signaling regulators function, and it is therefore important to understand their systems-level regulation of cell states (separately from their individual substrates). Identifying the specific target of Ubp3 responsible for this metabolic rewiring can therefore be very challenging, and such experiments are well beyond the scope or interest of the current manuscript.

      If we had pursued that road in this study, we would not have made any general findings related to Pi balance, nor would this more general hypothesis have emerged.

      (4) Other major concerns:

      (a) The authors selectively showed a few proteins in their manuscript to support their conclusion. For example, only Cox2 and Tom70 were used to illustrate mitochondrial biogenesis difference in line 97. Later on, they re-analyzed the previous MS dataset from Isasa et al 2015 and showed a few proteins in Fig3A to support their conclusion that ∆ubp3 increases mitochondrial OXPHOS proteins. However, I checked that MS dataset myself and saw that many key OXPHOS proteins do not change, for example, both ATP1 and ATP2 do not change, which encode the alpha and beta subunits of F1 ATPase. They selectively reported the proteins' change in the direction along with their hypothesis.

      To clarify, we observe an increase in Cox2 protein levels but not in Tom70 levels, which suggests that there is no increase in mitochondrial biogenesis. The increase is specific to some respiration-related mitochondrial proteins such as Cox2 (Figure 1E, Figure 3A). We have clearly pointed this out in the manuscript. We used Cox2 protein levels as an additional readout for ETC activity, to validate our observations from the potentiometric MitoTracker readouts and basal oxygen consumption rate (OCR) measurements. We chose Cox2 for three reasons. First, Cox2 is a mitochondrial genome-encoded subunit of complex IV (cytochrome c oxidase) in the ETC, and has a redox centre critical for cytochrome c oxidase activity. Second, the biogenesis and assembly of complex IV subunits have been studied with respect to multiple conditions such as glucose availability and hypoxia, and the expression and stability of the mitochondrially encoded complex IV subunits are exceptionally well correlated with changes in mitochondrial respiration (Fontanesi et al., 2006). Third, Cox2 is very well characterised in S. cerevisiae, and the commercially available Cox2 antibodies are outstanding, which makes estimating Cox2 levels by western blotting unambiguous and reproducible.

      We re-analyzed the proteomic dataset from Isasa et al. to obtain additional information regarding the key nodes that are differentially regulated in ubp3Δ. We have not claimed at any point in the manuscript that all OXPHOS-related proteins are upregulated in ubp3Δ, nor is there any need for that to be so. We identified Ubp3 from our screen, and observed an increase in mitochondrial potential, basal OCR, and Cox2 levels. We later found that the proteomic dataset for ubp3Δ also supports our observation that mitochondrial respiration is upregulated in ubp3Δ. The reviewer points out that we “showed a few proteins in Fig3A to support their conclusion that ∆ubp3 increases mitochondrial OXPHOS proteins”. Our conclusion is that the deletion of Ubp3 increases mitochondrial respiration. The combined readouts we used to reach this conclusion (OCR, mitochondrial potential, mitochondrial ATP production, Cox2 levels) are far more direct, comprehensive and conclusive than an increase in a few OXPHOS-related proteins, as also explained earlier in response to a separate reviewer query. Since different mitochondrial proteins are regulated by different mechanisms, we need not see an increase in all OXPHOS proteins in a mutant like ubp3Δ where mitochondrial respiration is high. An increase in some key proteins would be sufficient to increase respiration, as seen in our case.

      To summarise, the proteomic dataset supports our observation, but our conclusions are not dependent on the increase in OXPHOS proteins observed in the dataset.

      (b) The authors said they deleted ETC component Cox2 in line 111. I checked their method and table S1, I cannot figure out how they selectively deleted COX2 from mtDNA. This must be a mistake.

      Yes, we understand that for mitochondrially encoded proteins, a simple knock-out strategy has limitations. However, we first tried to generate the Cox2 deletion mutant by a standard PCR-mediated gene deletion strategy (Longtine et al., 1998), with the optimistic assumption that even if not all copies of the COX2 gene were lost, a substantial fraction would be lost via recombination. We selected transformants under strong antibiotic selection, and then measured Cox2 protein levels. Gratifyingly, we found that the mutant strain had substantially decreased Cox2 protein levels (but not a complete loss), and this was retained across generations. The data are shown below.

      Author response image 14. Cox2 levels in WT vs Cox2 mutants.

      Since the mutants have decreased Cox2 levels, we went ahead and performed growth assays using this strain, in a WT or Ubp3 deletion background. Deletion of Ubp3 in the Cox2 mutant resulted in a more severe growth defect.

      However, we fully agree that this strain is not a complete Cox2 knockout, and it is possible that the decrease in Cox2 is due to modifications in some other, unrelated gene. In the text, we should also not have named this strain cox2Δ. Since we are not sure of the exact genetic modification in this mutant, we have removed these data from the revised manuscript.

      Instead, we have now repeated all experiments using a fully characterised Cox2 mutant, cox2-62, described in (5), which has defective respiration. In this revised version, we find that deletion of Ubp3 in this strain retains the originally observed severe growth defect in glucose. This is consistent with our conclusion that functional mitochondria are required for proper growth of the ubp3Δ mutant. To separately validate this conclusion, we also used a Rho0 strain, which lacks mitochondrial DNA and therefore cannot perform mitochondrial respiration. We show that deletion of Ubp3 results in a more severe growth defect in a Rho0 strain. These results are included in the revised manuscript as Figure 1-figure supplement 1I.

      Author response image 15.

      We further confirmed, using a Seahorse assay, that the Rho0 and Rho0 ubp3Δ strains are incapable of respiration. These data are included in the revised manuscript as Figure 4-figure supplement 1G.

      Author response image 16. Basal OCR in Rho0 cells.

      We hope that these new data address the reviewer’s concerns about the Cox2 mutant.

      (c) They used sodium azide in a lot of assays to inhibit complex IV. However, this chemical is nonspecific and broadly affects many ATPases as well. Not sure why they do not use more specific inhibitors that are commonly used to assay OCR in seahorse.

      We have now performed growth assays for WT and ubp3Δ cells in the presence of the specific mitochondrial OXPHOS inhibitors oligomycin and FCCP. We observe a more severe growth defect in ubp3Δ cells compared to WT cells in the presence of oligomycin and FCCP, similar to the results observed with sodium azide. All these data are now included in the revised manuscript as Figure 1I and Figure 1-figure supplement 1H, along with associated text.

      Author response image 17. Growth rate in the presence of FCCP.

      Author response image 18. Growth rate in the presence of oligomycin (Figure 1-figure supplement 1H).

      We hope that these new data address the reviewer’s concerns.

      (d) The authors measured cellular Pi level by grinding the entire cells to release Pi. However, this will lead to a mix of cytosolic and vacuolar Pi. Related to this caveat, the cytosol has ~50mM Pi, while only 1-2mM of these glycolysis metabolites, I am not sure why the reduction of several glycolysis enzymes will cause significant changes in cytosolic Pi levels and make Pi the limiting factor for mitochondrial respiration. One possibility is that the observed cytosolic Pi level changes were caused by the measurement issue mentioned above.

      The Pi estimation shown in Figure 3C, E, F and G is a measure of total Pi in the cells. The vacuole is a major storehouse of phosphate in cells. However, unlike plant cells, where free phosphate is stored in vacuoles, yeast vacuoles store phosphate only in the form of polyphosphates (Yang et al., 2017; Hürlimann et al., 2007). The free Pi formed from the hydrolysis of polyphosphate is subsequently transported to the cytosol via the exporter Pho91 (Hürlimann et al., 2007). This makes the cytosol and mitochondria the major stores of usable free Pi in yeast. Since the malachite green assay that we use for phosphate estimation is specific to free Pi, and not polyphosphate, the Pi estimates shown in Figure 3 come from a combination of cytosolic and mitochondrial Pi. As explained earlier, in order to specifically measure mitochondrial Pi, we established methods to rapidly isolate mitochondria, and then estimated Pi in these isolated mitochondria (Figure 4B). Here we clearly see a large increase in mitochondrial Pi in the Ubp3 deletion cells. This allows us to estimate changes in Pi levels that are specific to mitochondria, without relying only on total Pi changes.

      “the cytosol has ~50mM Pi, while only 1-2mM of these glycolysis metabolites, I am not sure why the reduction of several glycolysis enzymes will cause significant changes in cytosolic Pi levels and make Pi the limiting factor for mitochondrial respiration”

      The reviewer has missed the fact that the glycolytic rate in yeast is the highest known for any cell. While the steady-state levels of glycolytic metabolites might be ~2 mM, glycolysis is not static but rapid and continuous. Glucose is continuously broken down and converted to pyruvate, with the consumption of Pi and generation of ATP. This is the reason for the rapid 13C label saturation (within seconds of 13C-glucose addition) in yeast cells (Figure 2-figure supplement 1F). This near-instantaneous label saturation makes accurate flux measurements arduous, which is why we had to optimize a method for measuring glycolytic flux in yeast cells (Figure 2D, Figure 2-figure supplement 1F). Indeed, for that reason, our measurements of glycolytic flux in yeast are the first to be reported in the field. This in itself is an enormously challenging experiment, and establishes a new benchmark.

      In highly glycolytic cells, most of the ATP is synthesized via glycolysis, and the rates of glycolysis and ATP synthesis are very high. In the reaction catalysed by GAPDH, Pi is incorporated into 1,3-bisphosphoglycerate, and in the subsequent phosphoglycerate kinase step this phosphate is transferred to ADP to form ATP. This ATP acts as a phosphate donor for most of the Pi-consuming reactions in the cell. Some of these processes, such as protein translation, use ATP but release Pi and ADP, and this Pi returns to the cellular Pi pool. Several other reactions, such as nucleotide biosynthesis, polyphosphate biosynthesis and protein phosphorylation, use ATP as a phosphate donor, and the phosphate is fixed in biomolecules. Increasing the rates of these ‘Pi sinks’ can therefore decrease Pi pools. This is a concept we have earlier tried to clarify in more detail (Gupta and Laxman, 2021). In fact, increased nucleotide biosynthesis and polyphosphate synthesis have earlier been suggested to decrease available free Pi (Austin and Mayer, 2020; Desfougères et al., 2016). When glycolytic flux is high, it is coupled to a correspondingly high consumption of Pi, due to increased ATP, nucleotide and polyphosphate synthesis. Pi levels rapidly decrease upon glucose addition, due to continuous Pi consumption during glycolysis (Hohmann et al., 1996; Van Heerden et al., 2014; Koobs et al., 1972). Therefore, changes in glycolytic rate caused by changes in glycolytic enzyme levels can result in significant changes in Pi levels, through changes in the rate of Pi consumption.

      Our results also show that, apart from Pi levels, the glycolytic state can regulate mitochondrial Pi transport as well. This is why mitochondrial Pi levels and basal OCR do not increase merely by adding Pi to cells. We show that basal OCR can be increased by adding Pi in the presence of 2DG. This regulation of mitochondrial Pi transport is a major limiting factor for mitochondrial respiration, and could be mediated partly by the regulation of Mir1 levels and partly by changes in cytosolic pH, which regulate the rate of mitochondrial Pi transport. We have discussed these points in the discussion section of our manuscript.

      We hope that this clarifies the reviewer’s concerns regarding how changes in glycolytic rate can regulate changes in cytosolic Pi levels.

      (e) The authors used ∆mir1 and MIR1 OE to show that Pi viability in the mitochondrial matrix is important for mitochondrial activity and biogenesis. This is not surprising as Pi is a key substrate required for OXPHOS activity. I doubt the approach of adding a control to determine whether Pi has a specific regulatory function, while other OXPHOS substrates, like ADP, O2 etc do not have the same effect.

      To clarify, we only used mir1Δ cells to understand the requirement for Pi transport from the cytosol to mitochondria in controlling respiration. The reviewer is correct in stating that deletion of Mir1 would reduce Pi import into mitochondria and thereby inhibit respiration. This is exactly the conclusion we draw from this experiment, as stated in the manuscript: “These data suggest that mitochondrial Pi transport (via Mir1) is critical for maintaining basal mitochondrial activity even in high glucose”. We have only used these experiments to support the idea that even though glycolysis and mitochondria are in different compartments, a change in Pi balance in one compartment (cytosol) can affect Pi levels in the other (mitochondria), since Pi is transported between these two compartments. Since mitochondria have their own polyphosphate reserves, in the absence of these experiments with mir1Δ cells it could be imagined that mitochondrial polyP is an additional source of Pi to support respiration, and therefore that changes in cytosolic Pi have only a minor effect on mitochondrial respiration. Our experiments with mir1Δ and Mir1-OE cells strongly suggest that Pi transport from the cytosol to mitochondria is important for respiration, and therefore that changes in cytosolic Pi levels (or maintaining cytosolic Pi at a lower level due to the rate of glycolysis) will have ripple effects on mitochondrial Pi availability. Further, these data suggest that, for example under glycolytic inhibition (low glucose, or 2DG), while all other factors (signalling, substrate availability etc.) favour respiration (and mitochondrial derepression), cells are unable to achieve this in the absence of ample Pi transport from the cytosol. This places Pi at centre stage in controlling mitochondrial respiration.

      We conclude that Pi is a major, but not the only, constraint on mitochondrial respiration. There could certainly be a role for ADP, oxygen availability, etc. in regulating respiration. However, these are beyond the scope of our study. We have discussed the potential role of ADP in regulating mitochondrial repression in the Discussion section: “An additional consideration is the possible contribution of changes in ADP in regulating mitochondrial activity, where the use of ADP in glycolysis might limit mitochondrial ADP. Therefore, when Pi changes as a consequence of glycolysis, it could be imagined that a change in ADP balance can coincidentally occur. However, prior studies show that even though cytosolic ADP decreases in the presence of glucose, this does not limit mitochondrial ADP uptake, or decrease respiration, due to the very high affinity of the mitochondrial ADP transporter.”

      We hope that this clarifies the reviewer’s concerns regarding the use of Mir1 OE and mir1Δ strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Some of the experiments should be repeated in other strain backgrounds for reproducibility and rigor.

      As discussed in the response to point number 1, we have now utilized multiple strains of S. cerevisiae to test our findings. We find that our discoveries regarding the role of altered Pi budgeting as a constraint on mitochondrial respiration, and the role of Ubp3 as a regulator of mitochondrial repression, are conserved across multiple genetic backgrounds of S. cerevisiae. These results are included in the revised manuscript as a new figure (Figure 6) and associated text. We used the W303, Σ1278 and BY4742 strains of S. cerevisiae to show that deletion of Ubp3 increases mitochondrial activity (as shown by increased mitochondrial membrane potential and increased Cox2 levels). Using the W303 strain, we show that deletion of Ubp3 increases Pi levels and that the increased Pi is necessary for increasing mitochondrial respiration (Figure 6C, D). These added experiments have substantially broadened the generality of our findings.

      The number of biological repeats needs to be increased in all experiments.

      We have increased the number of biological repeats in key experiments that show that the increased Pi levels are necessary for the increased mitochondrial respiration in ubp3Δ and tdh2Δtdh3Δ cells (revised Figure 4F). Apart from a few basal OCR measurements and MitoTracker data in the supplementary figures, all our experiments were performed with 3 biological repeats. For basal OCR measurements, yeast cells have to be aliquoted into poly-L-lysine-coated Seahorse plates and centrifuged to ensure that the cells are properly settled, because yeast cells are non-adherent. During the centrifugation step, the wells in the two end rows cannot be used, because uneven settling of cells in these wells affects the basal OCR readings. For several experiments that involve multiple samples, we were therefore limited to 2 biological replicates (performed independently), so that all samples could be accommodated in the plate.

      Full western blot images should be supplemented along with the other data.

      The complete western blot images are now included in the revised manuscript as supplementary file 2.

      TCA cycle flux should be analyzed and presented in the study to conclude some of the findings.

      As discussed in detail in the response to point number 2, we have performed steady-state and flux measurements for TCA cycle intermediates. These data are now included as a new supplementary figure (Figure 2-figure supplement 2).

      Reviewer #2 (Recommendations For The Authors):

      (1) In Fig. 2A, they should also include the gluconeogenic enzymes (fructose-1,6-bisphosphatase, PEP carboxykinase, and pyruvate carboxylase) to exclude the possibility that glycolytic intermediates are rerouted to gluconeogenesis.

      We measured the protein levels of Fbp1 (fructose-1,6-bisphosphatase) and Pck1 (PEP carboxykinase) and observed an increase in the protein levels of both enzymes in ubp3Δ. The data are shown below.

      Author response image 19.

      Fbp1 and Pck1 protein levels

      While we agree that this is an interesting observation which might help us in understanding the metabolic rewiring in ubp3Δ, we have not included this data in the current revised version of the manuscript due to two main reasons.

      (1) Since ubp3Δ cells have defective glycolysis and therefore defective glucose repression, the mRNA and protein levels of gluconeogenic enzymes, which are usually glucose-repressed, might increase. This might be a response at the level of transcription and translation of these enzymes and may or may not change the rate of gluconeogenesis in these cells, because multiple other factors, such as allostery and mass action, regulate gluconeogenic flux. Therefore, to avoid confounding our main points, and since we cannot draw a firm conclusion about gluconeogenic metabolism in these mutants, we have not included these data. The primary focus of our study is the mitochondrial repression component. Understanding the feedback controls that alter gluconeogenesis in these mutants is beyond the scope of this study and could be addressed in a separate future study.

      (2) As we highlight extensively in the response letter and in the manuscript, our aim is not to understand the specific mechanistic role of Ubp3. In this manuscript, we identify the conserved constraints that control mitochondrial repression without focusing just on the role of Ubp3 in regulating this. Whether Ubp3 regulates gluconeogenesis is a question that could be addressed in a future study that focuses on identifying the altered signalling mechanisms in ubp3Δ and the targets of Ubp3.

      (2) In line 292, page 10, there is a typo "dermine".

      We apologize for this mistake. Corrected.

      (3) In Figure 5A, is there a reason why they chose 0.1% glucose as the low-glucose condition? Also, is there a dose-dependent change in OCR or other mitochondrial functions according to the concentration of glucose?

      The glucose concentration of 0.1% was selected to decrease (but not completely remove) the available glucose. 0.1% glucose is considered a standard low-glucose condition in S. cerevisiae (Yin et al., 2003), and the effect of this glucose concentration on cellular processes has been extensively studied (Yin et al., 2003; Takeda et al., 2015, etc.). A glucose concentration below 0.2% is the critical threshold for activating respiratory metabolism (Takeda et al., 2015), and shifting cells to 0.1% glucose in our experiments activates respiration, as our data show. However, this is very different from completely removing glucose or using an alternate carbon source such as ethanol, which would result in full activation of gluconeogenesis. We further find that when cells are grown in ethanol, gluconeogenic activation also changes Pi homeostasis. This is in part a result of the fully reversed direction of the GAPDH-catalysed reaction (Figure 3G). Using such a condition could lead to misinterpretations and confound the conclusions that we draw from this set of experiments, where Pi homeostasis plays a major role. In 0.1% glucose, it has been shown that gluconeogenesis is still partly repressed (Yin et al., 2003), and the pathways utilizing alternate carbon sources remain repressed, although to a lower extent than in 2% glucose (Yin et al., 2003). We hope that this clarifies the concerns regarding the rationale behind using 0.1% glucose in our experiments.

      The extent of glucose repression depends on the concentration of glucose. Glucose concentrations >1% have been shown to activate degradation of mRNAs involved in alternate carbon utilization, and the different signaling pathways involved in growth on glucose and in glucose repression are regulated by glucose concentration. This is discussed in detail in Yin et al., 2002. We also observe a dose-dependent increase in mitochondrial membrane potential in the presence of 2DG (Figure 5-figure supplement 1A). This also suggests that the rate of glycolysis (which could also be modulated by changes in glucose concentration) can regulate the extent of mitochondrial derepression.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides important insights into the degradation of a host tRNA modification enzyme TRMT1 by SARS-CoV-2 protease nsp5. The data convincingly support the main conclusions of the paper. These results will be of interest to virologists interested in studying the alterations in tRNA modifications, host methyltransferases, and viral infections.

      Public Reviews:

      Response to Public Reviews

      We appreciate the reviewers’ assessment that our findings are well supported and provide important insight to the field. We also thank the reviewers for their comments and suggestions that have improved the quality of this manuscript. Through the requested edits and experiments, we provide additional results in this revision that further support and extend our original findings.

      We acknowledge the major questions that remain to be addressed, including the biological relevance of TRMT1 cleavage by Nsp5. We note that elucidating the biological role of host protein cleavage by viral proteases has been a long-standing challenge. For example, several endogenous proteins have been identified as cleavage targets of HIV protease, but the functional relevance for many of these cases took decades to resolve or remain unknown to this day. Nonetheless, we have added additional experiments that suggest a possible role for TRMT1 and TRMT1 cleavage in SARS-CoV-2 pathobiology.

      Key additions in the revised manuscript include:

      • Subcellular localization of full-length TRMT1 and TRMT1 fragments (Supplemental Figure 4).

      • Experiments demonstrating that TRMT1 levels are reduced to near background levels in SARS-CoV-2 infected human cells at higher MOI (Figure 6C and D).

      • Results showing that expression of the non-cleavable TRMT1 mutant can promote virion particle infectivity (Figure 8).

      • The addition of an “Ideas and Speculation” subsection that is now being offered to authors by eLife.

      Reviewer #1 (Public Review):

      Zhang et al. investigate the hypothesis that tRNA methyltransferase 1 (TRMT1) is cleaved by NSP5 (nonstructural protein 5 or MPro), the SARS-CoV-2 main protease, during SARS-CoV-2 infection. They provide solid evidence that TRMT1 is a substrate of Nsp5, revealing an Nsp5 target consensus sequence and evidence of TRMT1 cleavage in cells. Their conclusions are exceptionally strong given the co-submission by D'Oliveira et al showing cleavage of TRMT1 in vitro by Nsp5. Separately, the authors convincingly demonstrate widespread downregulation of RNA modifications during CoV-2 infection, including a requirement for TRMT1 in efficient viral replication. This finding is congruent with the authors' previous work defining the impact of TRMT1 and m2,2g on global translation, which is most likely necessary to support infection and virion production. What still remains unclear is the functional relevance of TRMT1 cleavage by Nsp5 during infection. Based on the data provided here, TRMT1 cleavage may be an act by CoV-2 to self-limit replication, as the expression of a non-cleavable TRMT1 (versus wild-type TRMT1) supports enhanced viral RNA expression at certain MOIs. Theoretically, TRMT1 cleavage should inactivate the modification activity of TRMT1, which the authors thoroughly and elegantly investigate with rigorous biochemical assays. However, only a minority of TRMT1 undergoes cleavage during infection in this study, and thus whether TRMT1 cleavage serves an important functional role during CoV-2 replication will be an important topic for future work. The authors fairly assess their work in this regard. This study pushes forward the idea that control of tRNA expression and functionality is an important and understudied area of host-pathogen interaction.

      We thank the reviewer for the thoughtful assessment of our study.

      We acknowledge that only a minority of TRMT1 undergoes cleavage during infection at the originally tested MOI. However, the ~40% reduction in TRMT1 levels after infection with SARS-CoV-2 is quite substantial, considering that TRMT1 in the nucleus and mitochondria is likely to be inaccessible to Nsp5. Moreover, we detected a reduction in m2,2G modification in the infected human cells, providing evidence for a functional impact on TRMT1 activity (Figure 1C).

      To further test the effects of SARS-CoV-2 infection on endogenous TRMT1, we infected 293T cells at a higher MOI and measured TRMT1 levels. At MOI=5, we found that SARS-CoV-2 infection led to near complete depletion of TRMT1 in human cells. This result suggests that SARS-CoV-2 infection could have a profound impact on TRMT1 levels during pathogenesis. We have added this new experiment as Figures 6C and D.

      Weaknesses noted:

      The detection of the N-terminal TRMT1 fragment by western blot is not robust. The polyclonal antibody used to detect TRMT1 in this work cross-reacts with a non-specific protein product. Unfortunately, this obstructs the visualization of the predicted N-terminal TRMT1 fragment. It is unclear how the authors were able to perform densitometry, given the interference of the nonspecific band. Additionally, the replicates in the source data make it clear that the appearance of the N-terminal fragment "wisp" under the non-specific band is not seen in every replicate. Though the disappearance of this wisp with mutant Nsp5 and uncleavable TRMT1 is reassuring, the detection of the N-terminal fragment with the TRMT1 antibody should be assessed critically. Considering this group has strong research interests in TRMT1, I assume that attempts to make other antibodies have proved unfruitful. Additionally, N-terminal tagging of TRMT1 is predicted to disrupt the mitochondrial targeting signal, eliminating the potential for using alternative antibodies to see the N-terminal fragment.

      We agree that the anti-TRMT1 antibody used here is sub-optimal for detection of the N-terminal TRMT1 fragment. However, as noted by the Reviewer, we provided multiple ways of corroborating that the lower-molecular weight band detected in human cells expressing Nsp5 corresponds to the N-terminal TRMT1 fragment. We have shown that the TRMT1 cleavage band is not detectable in human cells expressing GFP or inactive Nsp5. This indicates that the lower molecular weight TRMT1 band only arises when active Nsp5 protease is expressed. Moreover, the TRMT1 cleavage band is not detectable in TRMT1-KO cell lines, demonstrating that the band arises from TRMT1 cleavage rather than a non-specific protein. We have also detected the C-terminal fragment when TRMT1 is overexpressed with Nsp5. In addition, we have shown that mutation of the predicted Nsp5 cleavage site in TRMT1 abolishes the appearance of the N- and C-terminal cleavage fragments.

      Despite the drawbacks of this antibody, we identified gel running conditions that resolve the non-specific band from the N-terminal TRMT1 cleavage fragment. For quantification, we measured the total signal of both the cleavage band and the non-specific band in all lanes (Figure 3). After normalization to actin, the total signal from the cleavage band and the non-specific band in the control lane from cells expressing GFP was subtracted from that of the lanes with cells expressing Nsp5 to calculate the signal arising from the cleavage band. We have updated our Materials and Methods to provide details on how we quantified the TRMT1 cleavage band.
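      To make the arithmetic of this background subtraction explicit, the following is a minimal sketch, assuming band intensities have already been measured in arbitrary densitometry units. The function names are hypothetical and this is not the analysis code used in the study; it only restates the normalize-then-subtract steps described above.

```python
# Hypothetical sketch of the densitometry steps described above (not the study's code).
# Inputs are band intensities in arbitrary units, assumed to come from an
# image-analysis tool applied to the immunoblot.


def normalized(signal: float, actin: float) -> float:
    """Normalize a band (or combined band) intensity to the actin loading control."""
    if actin <= 0:
        raise ValueError("actin signal must be positive")
    return signal / actin


def cleavage_signal(nsp5_combined: float, nsp5_actin: float,
                    gfp_combined: float, gfp_actin: float) -> float:
    """Estimate the N-terminal TRMT1 cleavage signal in an Nsp5 lane.

    *_combined is the summed intensity of the cleavage band plus the
    co-migrating non-specific band in that lane. The GFP control lane has
    no cleavage, so its actin-normalized signal is treated as the
    non-specific background and subtracted.
    """
    return normalized(nsp5_combined, nsp5_actin) - normalized(gfp_combined, gfp_actin)


# Example with made-up numbers: 1500/500 - 1000/500 = 3.0 - 2.0 = 1.0
print(cleavage_signal(1500.0, 500.0, 1000.0, 500.0))
```

      Subtracting the GFP lane rather than trying to separate the two closely migrating bands means the estimate does not depend on how cleanly the bands are resolved, at the cost of assuming that the non-specific band is unchanged by Nsp5 expression.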

      While we did test other antibodies against TRMT1, none of them were sensitive enough to detect TRMT1 cleavage fragments at endogenous levels. For example, we included results with an antibody targeting the C-terminus of TRMT1 that could not detect TRMT1 cleavage products at endogenous levels (Supplemental Figure 3). However, the antibody could detect the C-terminal TRMT1 fragments if TRMT1 was overexpressed with Nsp5 (Supplemental Figure 3).

      These technical issues reiterate the fact that the functional significance of TRMT1 cleavage during CoV-2 infection remains unclear. However, this study demonstrates an important finding that the tRNA modification landscape is altered during CoV-2 infection and that TRMT1 is an important host factor supporting CoV-2 replication.

      We agree that the functional relevance of TRMT1 cleavage by Nsp5 remains an open question. Thus, we have added an experiment to test the functional impact of TRMT1 on virion particle production and infectivity (Figure 8). We find that TRMT1 expression is required for optimal virus production, consistent with our observation that TRMT1-deficient cells exhibit reduced viral RNA replication. In addition, we find that expression of the non-cleavable TRMT1 mutant can promote virion particle infectivity (Figure 8, TRMT1-Q530N). These results are consistent with the Reviewer’s conclusion that “TRMT1 cleavage may be an act by CoV-2 to self-limit replication, as the expression of a non-cleavable TRMT1 (versus wild-type TRMT1) supports enhanced viral RNA expression at certain MOIs”. We discuss the potential implications of this result and its functional relevance in the “Ideas and Speculation” subsection.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript titled 'Proteolytic cleavage and inactivation of the TRMT1 tRNA modification enzyme by SARS-CoV-2 main protease' from K. Zhang et al. demonstrates that several RNA modifications are downregulated during SARS-CoV-2 infection including the widespread m2,2G methylation, which potentially contributes to changes in host translation. To understand the molecular basis behind this global hypomodification of RNA during infection, the authors focused on the human methyltransferase TRMT1 that catalyzes the m2,2G modification. They reveal that TRMT1 not only interacts with the main SARS-CoV-2 protease (Nsp5) in human cells but is also cleaved by Nsp5. To establish if TRMT1 cleavage by Nsp5 contributes to the reduction in m2,2G levels, the authors show compelling evidence that the TRMT1 fragments are incapable of methylating the RNA substrates due to loss of RNA binding by the catalytic domain. They further determine that expression of full-length TRMT1 is required for optimal SARS-CoV-2 replication in 293T cells. Nevertheless, the cleavage of TRMT1 was dispensable for SARS-CoV-2 replication hinting at the possibility that TRMT1 could be an off-target or fortuitous substrate of Nsp5. Overall, this study will be of interest to virologists and biologists studying the role of RNA modification and RNA modifying enzymes in viral infection.

      We thank the reviewer for the thoughtful assessment of our study.

      We agree with the possibility that TRMT1 could be a fortuitous substrate of Nsp5 due to the coincidental presence of an Nsp5 cleavage site in TRMT1. As considered in our Discussion section, TRMT1 cleavage could be a collateral effect of SARS-CoV-2 infection. While TRMT1 could be an off-target substrate during viral infection, the subsequent effect on tRNA modification levels could have physiological consequences on downstream processes that affect cellular health. This information could still be useful for understanding the pathophysiological consequences of SARS-CoV-2 infection in tissues.

      Strengths:

      • The authors use a state-of-the-art mass spectrometry approach to quantify RNA modifications in human cells infected with SARS-CoV-2.

      • The authors go to great lengths to demonstrate that the SARS-CoV-2 main protease, Nsp5, interacts with and cleaves TRMT1 in cells, and they perform important controls when needed. They use a series of overexpression constructs with strategically placed tags on both TRMT1 and Nsp5 to strengthen their observations.

      • The use of an inactive Nsp5 mutant (C145A) strongly supports the claim of the authors that Nsp5 is solely responsible for TRMT1 cleavage in cells.

      • Although the direct cleavage was not experimentally determined, the authors convincingly show that TRMT1 Q530N is not cleaved by Nsp5 suggesting that the predicted cleavage site at this position is most likely the bona fide region processed by Nsp5 in cells.

      • To understand the impact of TRMT1 cleavage on its RNA methylation activity, the authors rigorously test four protein constructs for their capacity not only to bind RNA but also to introduce the m2,2G modification. They demonstrate that the fragments resulting from TRMT1 cleavage are inactive and cannot methylate RNA. They further establish that the C-terminal region of TRMT1 (containing a zinc-finger domain) is the main binding site for RNA.

      • While 293T cells are unlikely to be an ideal model system to study SARS-CoV-2 infection, the authors use two cell lines and well-designed rescue experiments to uncover that TRMT1 is required for optimal SARS-CoV-2 replication.

      Weaknesses:

      • Immunoblotting is extensively used to probe for TRMT1 degradation by Nsp5 in this study. Regrettably, the polyclonal antibody used by the authors shows strong non-specific binding to other epitopes. This complicates the data interpretation and quantification since the cleaved TRMT1 band migrates very closely to a main non-specific band detected by the antibody (for instance Fig 3A). While this reviewer is concerned about the cross-contamination during quantification of the N-TRMT1, the loss of this faint cleaved band with the TRMT1 Q530N mutant is reassuring. Nevertheless, the poor behavior of this antibody for TRMT1 detection was already reported, and the authors should have taken better precautions or designed a different strategy to circumvent the limitation of this antibody by relying on additional tags.

      We acknowledge the sub-optimal performance of the commercial anti-TRMT1 antibody used in our study. Nevertheless, we have provided multiple lines of evidence indicating that the lower molecular weight band detected using this antibody corresponds to the N-terminal TRMT1 fragment. As noted by the reviewer, we have shown that the lower molecular weight band disappears using the TRMT1-Q530N non-cleavable mutant. The lower molecular weight signal is also absent in TRMT1-KO cell lines expressing Nsp5. Moreover, we have shown that the TRMT1 cleavage band is undetectable in human cells expressing GFP or inactive Nsp5. We have also detected the C-terminal fragment when TRMT1 is over-expressed with Nsp5.

      As discussed in the response to Reviewer 1, we did consider alternative approaches for detecting the N-terminal fragment. We thought about tagging TRMT1 at the N-terminus so that we could detect the cleavage band using a different antibody. However, as noted by Reviewer 1, the tagging of TRMT1 at the N-terminus is likely to disrupt the mitochondrial targeting signal and alter the localization of TRMT1. In addition, we spent considerable time and effort testing alternative antibodies against TRMT1. However, none of them were effective at detecting the N- or C-terminal TRMT1 fragments. For example, we included results with a different antibody targeting the C-terminus of TRMT1 that could not detect TRMT1 cleavage products at endogenous levels but could detect them when TRMT1 was overexpressed with Nsp5 (Supplemental Figure 3).

      • While 293T cells are convenient to use, they are not a well-suited model system to study SARS-CoV-2 infection and replication. Therefore, some of the conclusions from this study might not apply to better-suited cell systems such as Vero E6 cells or might not be observed in patient-infected cells.

      We acknowledge the potential caveats associated with using 293T human embryonic cells as a system for testing SARS-CoV-2 replication. However, we note that 293T cells have been used as a physiological model for discovering and characterizing key aspects of SARS-CoV-2 biology, including viral replication. For example, SARS-CoV-2 has been shown to exhibit significant replication and virion production in 293T cells expressing ACE2 that can be inhibited by known SARS-CoV-2 antiviral compounds:

      https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(20)300045/fulltext

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9444585/

      https://www.science.org/doi/10.1126/sciadv.add3867

      https://www.pnas.org/doi/full/10.1073/pnas.2025866118

      293T cells have also been demonstrated to exhibit cytopathic effects upon SARS-CoV-2 infection that are dependent upon the ACE2 receptor and mirror that of infected lung cells in culture and in patient tissues:

      https://www.embopress.org/doi/full/10.15252/embj.2020106267

      https://journals.asm.org/doi/full/10.1128/jvi.00002-22

      https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1009715

      https://www.nature.com/articles/s41559-021-01407-1

      In addition to 293T cells, we have demonstrated that infection of MRC5 human pulmonary fibroblast cells with SARS-CoV-2 results in a decrease in TRMT1 levels and m2,2G modification (Figure 1). The reduction in TRMT1 levels in MRC5 cells after SARS-CoV-2 infection is similar to that observed in 293T cells.

      • The reduction of bulk TRMT1 levels during infection of MRC5 cells with SARS-CoV-2 is minor (Fig 1). This does not seem to agree with the more dramatic reduction in m2,2G modification levels. Cellular localization experiments of TRMT1 would help clarify this. While TRMT1 is found in the cytoplasm and nucleus, it is possible that TRMT1 is more dramatically degraded in the cytoplasm due to easier access by Nsp5.

      We agree that the processing of newly synthesized TRMT1 in the cytoplasm is likely to be the main cause for the reduction of TRMT1 levels in the infected MRC5 cells. Thus, we followed the Reviewer’s suggestion to conduct cellular localization experiments of TRMT1 (Supplemental Figure 4). Through these experiments, we show that full-length TRMT1 exhibits localization to the cytoplasm, mitochondria, and nucleus, consistent with prior findings from our group and others. This result supports the conclusion that cytoplasmic TRMT1 is the likely target of Nsp5 cleavage while TRMT1 in the nucleus and mitochondria are inaccessible to Nsp5. We also note that the decrease in cytoplasmic TRMT1 could account for the reduction in m2,2G modifications if the cytoplasmic pool of TRMT1 is responsible for modifying any exported tRNAs that were not modified in the nucleus.

      • In Fig 6, the authors show that TRMT1 is required for optimal SARS-CoV-2 replication. This can be rescued by expressing TRMT1 (Fig 7). Nevertheless, it is unknown if the methylation activity of TRMT1 is required. The authors could have expressed an inactive TRMT1 mutant (by disrupting the SAM binding site) to establish if the RNA modification by TRMT1 is important for SARS-CoV-2 replication or if it is the protein backbone that might contribute to other processes.

      We agree that it would be interesting to test if the methylation activity of TRMT1 is important for optimal SARS-CoV-2 replication. However, the present study focuses on the cleavage of TRMT1 by Nsp5 and the biological effects of this cleavage. Thus, we feel that generating another human cell line lies outside the scope of this paper and would be an excellent idea for future studies. We thank the reviewer for the proposed experiment.

      • Fig 7, the authors used the Q530N variant to rescue SARS-CoV-2 replication in TRMT1 KO cells. This is an important experiment and unexpectedly reveals that TRMT1 cleavage by Nsp5 is not required for viral replication. To strengthen the claim of the authors that TRMT1 is required to promote viral replication and that its cleavage inhibits RNA methylation, the authors could express the TRMT1 N-terminal construct in the TRMT1 KO cells to assess if viral replication is restored or not to similar levels as WT TRMT1. This will further validate the potential biological importance of TRMT1 cleavage by Nsp5.

      Indeed, we did not expect to find that human cells expressing the TRMT1-Q530N variant exhibit higher levels of viral replication. This suggests that cleavage of TRMT1 is inhibitory for viral replication. To provide further support for this observation, we analyzed the viral titer and infectivity of supernatants derived from human cells expressing wildtype TRMT1 or TRMT1-Q530N. Consistent with our finding that TRMT1-Q530N cells contain more viral RNA, the media supernatants from TRMT1-Q530N-expressing cells exhibit higher viral titer and infectivity compared to supernatants from TRMT1-KO cells expressing wildtype TRMT1. These results provide additional evidence that TRMT1 is required to promote viral replication. Moreover, these findings suggest that TRMT1 cleavage and reduced protein synthesis could self-limit viral replication. The additional results have been added as Figure 8.

      • Fig 7 shows that the TRMT1 Q530N variant rescues SARS-CoV-2 replication to greater levels than WT TRMT1. The authors should discuss this in greater detail and its possible implications with their proposed statement. For instance, are m2,2G levels higher in Q530N compared to WT? Does Q530N co-elute with Nsp5 or is the interaction disrupted in cells?

      These are excellent points brought up by the Reviewer. As noted above, we have added an additional experiment that tests the functional relevance of TRMT1 expression and cleavage on virion production and infectivity (Figure 8). Moreover, we have followed the Reviewer’s suggestion and discussed the potential implications of these findings in the “Ideas and Speculation” subsection.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors have used biochemical approaches to provide compelling evidence for the cleavage of TRMT1 by SARS-CoV-2 Nsp5 protease. This work is of wide interest to biochemists, cell biologists, and structural biologists in the coronavirus (CoV) field. Furthermore, it substantially advances the understanding of how CoVs interact with host factors during infection and modify cellular metabolism.

      We thank the reviewer for the thoughtful assessment of our study.

      Strengths:

      The authors provide multiple lines of biochemical evidence to report a TRMT1-Nsp5 interaction during SARS-CoV-2 infection. They show that the host enzyme TRMT1 is cleaved at a specific site and that it generates fragments that are incapable of functioning properly. This is an important result because TRMT1 is a critical player in host protein synthesis. This also advances our understanding of virus-host interactions during SARS-CoV-2 infections.

      Weaknesses:

      The major weakness is the lack of mechanistic insights into TRMT1-Nsp5 interactions. The authors have provided commendable biochemical data on proving the TRMT1-Nsp5 interaction but without clear mechanistic insights into when this interaction takes place in the context of SARS-CoV-2 propagation, what are the functional consequences of this interaction on host biology, and does this somehow benefit the infecting virus? I feel that the authors played it a bit safe despite having access to several reagents and an extremely promising research direction.

      We agree that our findings have prompted questions on the mechanistic and functional relevance of TRMT1 cleavage by Nsp5. To begin addressing the latter point, we have included a new experiment testing the impact of TRMT1 expression and cleavage on SARS-CoV-2 virus production and infectivity (Figure 8). We find that TRMT1-deficient cells infected with SARS-CoV-2 exhibit less virion production and the viruses produced are less infectious. Intriguingly, we find that expression of the non-cleavable TRMT1-Q530N variant in TRMT1-KO cells promotes an increase of viral titer as well as infectivity compared to expression of wildtype TRMT1. These results provide evidence for an unexpected role for TRMT1 expression in virus production and the generation of optimally infectious SARS-CoV-2 particles. We discuss the potential implications of this finding in the “Ideas and Speculation” subsection.

      We agree that understanding the timing and effects of the Nsp5-TRMT1 interaction will be an important area of investigation moving forward. We would like to include additional time points beyond 24 and 48 hours post-infection. However, we have found that the MRC5-ACE2 cells exhibited increased levels of cell death at 72 and 96 hours post-infection that could confound results (Raymonda et al., 2022). Moreover, we would like to know how the reduction in m2,2G modifications affects host tRNA biology and translation. However, these experiments involve large-scale methods such as tRNA sequencing and ribosome profiling, which are outside the scope of our current studies and will be the subject of future efforts.

      We acknowledge the Reviewer’s assessment that we “played it a bit safe” in discussing the functional consequences of Nsp5-TRMT1 interaction. We aimed for a circumspect interpretation of our results and their biological implications, but might have been too cautious in our conclusions. Thus, we have added an “Ideas and Speculation” subsection that discusses possible reasons for how TRMT1 cleavage and interaction with Nsp5 could benefit the virus. We thank the Reviewer for pointing out this issue in our initial manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Having reviewed an earlier version of this manuscript, I appreciated the recent progress made by the authors. I felt the entire body of work is quite solid and the interpretations are clear and not overstated. One piece of data I thought deserved a sentence or two of discussion was the complementation assay with Q530N TRMT1. This experiment suggests the possibility that cleavage of TRMT1 by Nsp5 may be an act to self-limit replication, although this result could also be due to the elevated levels of Q530N TRMT1 expression compared to WT. I still think it is worthy of discussion. Another thing I would recommend is to include the length of infection by SARS-CoV-2 in the figure legends.

      We thank the reviewer for their positive response and constructive comments.

      We have followed the Reviewer’s suggestion to further discuss how cleavage of TRMT1 may act to self-limit replication in the “Ideas and Speculation” subsection. We have also included the length of infection by SARS-CoV-2 in the figure legends.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the comments mentioned in the public review, this reviewer encourages the authors to address the following points:

      • Please clarify the rationale behind choosing 24 and 48 hours post-infection as time points for the analyses (Fig 1). One would expect even lower levels of TRMT1 and RNA modification after 72 and 96 hours post-infection.

      We chose the 24- and 48-hour time points because we have shown that MRC5 cells exhibit elevated accumulation of viral RNA at these time points (Raymonda et al., 2022). However, at 72 and 96 hours post-infection, we have found that the MRC5-ACE2 cells exhibited cytopathic effects indicative of cell death that could confound results. We have included the rationale for these time points in our revised manuscript.

      • In Supplementary Figure 3, please add in the legend the meaning of the asterisk symbol.

      The asterisks denote non-specific bands that are still detectable in the TRMT1-KO cell line. We have updated the Figure Legend and thank the Reviewer for catching this omission.

      • In Supplementary Figure 3B, there is an intermediate band in lane 3 with C145A when using the antibody 609-659. The authors should clarify what that band is.

      The intermediate band in lane 3 (and in lane 6) of Supplemental Figure 3B represents non-specific detection of the Nsp5-C145A variant that exhibits extremely high levels of expression since it cannot self-cleave. We have clarified the identity of the band in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      I have only minor comments:

      Although the authors have done a commendable job of providing compelling biochemical evidence of TRMT1 cleavage by Nsp5, it is not clear how this enhances viral infection. The discussion presents the experimental findings and prior publications as a series of correlated observations without clearly specifying the mechanistic benefits of TRMT1 hijacking towards CoV propagation, or even proposing a mechanistic hypothesis to this end.

      We agree with the Reviewer that providing a mechanistic hypothesis on how TRMT1 cleavage impacts virus biology will help inform future studies. We have followed the Reviewer’s suggestion and discuss potential mechanisms in the “Ideas and Speculation” subsection.

      How do these experiments inform us about the cell biology of SARS-CoV-2 infections? Does Nsp5-mediated degradation start early in infection? Is the loss of TRMT1 sustained over the course of the infection? Do Nsp5 concentrations or relative amounts correlate with TRMT1 loss during this period? For instance, is there only a modest increase in Nsp5 levels from 24h to 48h? I would suggest adding a few more data points than just 24h and 48h in the cell culture experiments. As the manuscript stands right now, it will be a bit difficult for readers to appreciate the relevance of this study.

      These are excellent questions raised by the Reviewer. The temporal effects of SARS-CoV-2 infection on TRMT1 levels will be an important area to dissect moving forward.

      As mentioned above, we would like to include additional time points beyond 24 and 48 hours post-infection. However, at 72 and 96 hours post-infection, we have found that the MRC5-ACE2 cells exhibited increased levels of cell death that could confound results.

      However, we do observe a correlation between the level of infection and the amount of TRMT1 depletion. In our newly added Figure 6C and 6D, we show that increasing the MOI leads to a concomitant increase in N-protein production that correlates with the amount of TRMT1 depletion. Moreover, we have added additional experiments to explore the biological relevance of our findings in terms of virion particle production and infectivity. We thank the reviewer for these insightful questions that have improved our manuscript and provide a foundation for future studies.

      Related to this previous comment: how do the authors rationalize their inference that TRMT1 is essential for SARS-CoV-2 infection, yet it is cleaved during the infection? What seems to be the advantage of this seemingly contradictory but possibly quite intriguing inference?

      We acknowledge the paradox that TRMT1 seems to be essential for SARS-CoV-2 replication but is cleaved during the infection. We propose several hypotheses to explain these findings:

      Hypothesis 1: TRMT1 could be a bystander target. The loss of TRMT1 expression leads to a decrease in modifications that impacts translation. This decrease in the translation capacity of the infected cells would lead to decreased production of viral proteins and reduced viral replication. This could explain why TRMT1-deficient cells exhibit less virus production. This could also account for why the TRMT1-Q530N mutant might produce more virus. In this case, the cleavage of TRMT1 and its biological effects on viral replication and virion production are coincidental. However, even if TRMT1 cleavage and inactivation do not impact viral replication or production, it would still be important to know the cellular impacts that contribute to disease pathogenesis.

      Hypothesis 2: The benefits of shutting down host responses that depend upon protein synthesis could outweigh the slight reduction in viral replication caused by host translation inhibition. The decrease in TRMT1-catalyzed tRNA modification caused by Nsp5 cleavage could severely inhibit host translation, while viral translation can still be maintained through a tRNA pool optimized for viral translation, albeit at a slightly lower rate than if TRMT1 were not cleaved.

      Hypothesis 3: The Nsp5-TRMT1 interaction could allow the virus to bind tRNAs that are packaged in viral particles, as suggested previously (Pena et al., 2022). The finding that expression of the non-cleavable TRMT1-Q530N variant enhances viral replication and infectivity supports the hypothesis that TRMT1 could facilitate tRNA uptake into viral particles. The packaging of specific tRNAs in viral particles could enhance viral translation in the subsequent round of infection, thereby enhancing infectivity and perhaps facilitating the species jump of SARS-CoV-2 towards hosts with incompatible codon bias.

      We have included these hypotheses in the new “Ideas and Speculation” subsection.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers:

      We would like to thank all the reviewers and the editors for their thorough and helpful feedback on our work. Before addressing specific questions and points, we would like to make a general comment on a mechanistic aspect of this study. The reviewers correctly pointed out that our study does not reveal the molecular mechanism that leads to centromeric histone depletion specifically from meiotic chromosomes. Identifying this mechanism requires a deep and thorough understanding of how centromeric histones are loaded and centromeres are established each cell cycle, and how they are maintained over time in different cell types. To our knowledge, these mechanisms have not been described in plants. To add a further layer of complexity, it appears that the mechanisms governing CENH3 maintenance may be (partially) different in plant mitotic and meiotic cells, and the mechanistic basis of this difference is unknown. Obviously, these are interesting but also complex questions and their resolution will require considerable resources and effort, which we believe is beyond the scope of this manuscript. Nevertheless, our finding that CENH3 maintenance and centromere function in meiotic cells are sensitive to heat stress is an unexpected discovery with profound implications for plant adaptation, which provides a strong incentive for further exploration of centromere maintenance mechanisms in plants.

      Furthermore, we apologize to the reviewers for the poor quality of the images in the original submission. Their resolution was reduced by conversion to PDF format during submission.

      eLife assessment

      This important study reports how heat stress affects centromere integrity by compromising the loading of the centromere protein CENH3 and by prolonging the spindle assembly checkpoint during male meiosis in Arabidopsis thaliana. The evidence supporting the claims by live cell imaging is convincing, although deeper mechanistic insight is lacking, making the study overall somewhat preliminary in nature. This work will be of interest to a broad audience of biologists working on how chromatin states are affected by stress conditions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Khaitova and co-workers present here an analysis of centromere composition and function during elevated temperatures in the plant Arabidopsis. The work relates to the ongoing climate change during which spikes in high temperatures will be found. Hence, the paper addresses a timely subject.

      The authors start by confirming earlier studies showing that high temperatures reduce the fertility of Arabidopsis plants. Interestingly, a hypomorphic mutant of the centromeric histone variant CENH3 (CENP-A), which was previously described by the authors, sensitizes plants to heat and results in a drop in viable pollen and silique length. The drop in fertility coincides with the formation of micronuclei in meiosis and an extension of meiotic progression, as revealed by live cell imaging. Based on this finding, the authors then show that at high temperatures, the fluorescence intensity of YFP:CENH3 declines in meiosis but, remarkably, not in the surrounding cells (tapetum cells). In addition, the amount of BMF1 (a Bub1 homolog and part of the spindle assembly checkpoint) also appears to decline on the kinetochores of meiocytes, as judged by a BMF1 reporter line. However, whether this is dependent on a decline of CENH3 or represents a separate pathway is not clear.

      We provide new data in Figure S6 showing that BMF1 loading on centromeres is substantially reduced in cenh3-4 mutants. Thus, efficient tethering of BMF1 to centromeres depends on CENH3.

      Finally, the authors measure the duration of the spindle checkpoint and find that it is extended under high temperatures from which they conclude that the attachment of spindle fibers to kinetochores is compromised under heat.

      Strengths:

      This is an interesting and important paper as it links centromere organization/function to heat stress in plants. A major conclusion of the authors is that weakened centromeres, presumably by heat, may be less effective in establishing productive interactions with spindle microtubules.

      Weaknesses:

      The paper does not explain the molecular reason why CENH3 levels in meiocytes are reduced or why the attachment of spindle fibers to kinetochores is less efficient at high versus low temperatures.

      While we cannot explain the molecular mechanism underlying temperature-dependent depletion of CENH3 in meiocytes, the less efficient attachment of microtubules to the kinetochores at higher temperatures is likely caused by reduced levels of CENH3, which result in smaller centromeres that are less effective in establishing productive microtubule-kinetochore attachments. Here (new Figure S6) and in our previous study (Capitao et al., 2021), we have shown that the amount of centromere/kinetochore proteins at centromeres is reduced in cenh3-4 mutants, and that these plants exhibit a prolonged SAC and slower chromosome biorientation.

      Reviewer #2 (Public Review):

      Summary:

      This work investigates how increased temperature affects pollen production and fertility of Arabidopsis thaliana plants grown at selected temperatures ranging from 16°C to 30°C. They report that pollen production and fertility decline with increasing temperature. To identify the cause of reduced pollen production and fertility, they use live cell imaging of male meiotic cells and find that the duration of meiosis increases with temperature. They also show that pollen sterility is associated with an increased presence of micronuclei, likely originating from heat stress-induced impairment of meiotic chromosome segregation. They correlate abnormal meiosis with weakened centromeres caused by meiosis-specific defective loading of the centromere-specific histone H3 variant (CENH3) onto meiotic centromeres. The same is true of the kinetochore-associated spindle assembly checkpoint (SAC) protein BMF1. Intriguingly, they observe the reverse trend, a strong CENH3 presence, in the somatic cells of the tapetum, in contrast to the reduced loading of CENH3 in male meiocytes with increasing temperature. In contrast to CENH3 and BMF1, the SAC protein BMF3 persists for longer periods than in the WT control, based on which the authors conclude that heat stress prolongs the duration of the SAC at metaphase I, which in turn extends the time of chromosome biorientation during meiosis I. The study provides preliminary insights into the processes that affect plant reproduction with increasing temperatures, which may be relevant to developing climate-resilient cultivars.

      Strengths:

      The authors have mastered live cell imaging of male meiocytes, a technically demanding approach, which they have successfully employed to examine the time course of meiosis in Arabidopsis thaliana plants exposed to different temperature conditions. In addition, they monitor the loading dynamics and residence time of fluorescently tagged centromere/kinetochore proteins and spindle assembly checkpoint proteins to measure precisely how long each protein is present and to study its dynamics and function in male meiosis.

      Weaknesses:

      Here the authors use only one representative centromere protein CENH3, one kinetochore-associated SAC protein BMF1, and the SAC protein BMF3 to conclude that heat stress impairs centromere function and prolongs SAC with increased temperatures. Centromere and its associated protein complex the kinetochores and the SAC contain a multitude of proteins, some of which are well characterized in Arabidopsis thaliana. Hence the authors could have used additional such tagged proteins to further strengthen their claim.

      Indeed, several other proteins have recently been characterized as centromere/kinetochore components and could have been included in the study to further validate the results presented. To strengthen our argument, we have added new experimental data (Figure S4) showing temperature-induced depletion of CENH3 in wild-type plants by immunocytology. Thus, we convincingly show that temperature stress reduces the amount of CENH3. This is likely to affect the loading of most kinetochore and centromeric proteins. Here (new Figure S6) and in our previous study (Capitao et al., 2021), we have shown that genetic depletion of CENH3 in cenh3-4 mutants results in reduced loading of CENPC, MIS12 and BMF1 at mitotic centromeres and reduced loading of BMF3 and BMF1 at meiotic centromeres. We also attempted to assess the levels of CENPC and MIS12 on meiotic chromosomes by immunocytology, but our antibodies, which work on mitotic spreads, did not stain meiotic chromosomes.

      Though the results presented here are interesting and solid, the study lacks a deeper mechanistic understanding of what causes the defective loading of CenH3 to the centromeres, and why the SAC protein BMF3 persists only at meiotic centromeres to prolong the spindle assembly checkpoint. Also, this observation should be interpreted in light of the fact that SAC is not that robust in plants as several null mutants of plant SAC components are known to grow as healthy as wild-type plants at normal growth conditions without any vegetative and reproductive defects.

      Thank you for raising this point. We believe that the SAC operates and is important in plants; we have added a citation to a preprint from the Schnittger lab (Lampou et al., 2023, bioRxiv) that was published while this manuscript was under review. We think this is the most comprehensive analysis of the plant SAC to date, clearly showing that the SAC delays progression to anaphase in the presence of spindle inhibitors, although adaptation eventually occurs and the cell cycle progresses. This is very similar to the situation in animals, which also undergo adaptation in similar situations. The difference between plants and animals may be due to subsequent events, as plants are better able to tolerate genome instability and resume cell division in the presence of abnormal chromosome numbers. Robustness and redundancy may be another reason why plant mutants deficient in the SAC do not show obvious growth retardation.

      One of the immediate responses to heat stress is the production of heat shock proteins (HSPs), which act as molecular chaperones to safeguard the proteome. It will be interesting to see whether the expression levels of known HSPs can be correlated with a role in stabilizing SAC proteins such as BMF1 and thereby prolonging their presence at meiotic kinetochores.

      Indeed, the heat stress response is likely to be involved in this process. We sought to investigate the role of this pathway by analyzing Arabidopsis mutants deficient in HEAT-SHOCK FACTOR BINDING PROTEIN (HSBP), which acts as a negative regulator of the heat shock response. This experiment was prompted by the observation that hsbp mutants have reduced fertility. We expected that an unrestricted heat stress response might affect meiosis and pollen formation. However, our initial experiments did not show altered pollen viability in response to heat stress in hsbp plants and we did not pursue this line of research further.

      Reviewer #3 (Public Review):

      Summary:

      Khaitova et al. report the formation of micronuclei during Arabidopsis meiosis under elevated temperatures. Micronuclei form when chromosomes are not correctly segregated to the cellular poles in dividing cells. This happens when whole chromosomes or fragments are not properly attached to the kinetochore microtubules. The incidence of micronuclei formation is shown to increase at elevated temperatures in wild-type and more so in the weak centromere histone mutant cenh3-4. The number of micronuclei formed at high temperatures in the recombination mutant spo11 is similar to that in wild-type, indicating that the increased sensitivity of cenh3-4 is not related to the putative role of CENH3 in recombination. The abundance of CENH3-GFP at the centromere declines with higher temperature and correlates with a decline in the spindle assembly checkpoint factor BMF1-GFP at the centromeres. The reduction in CENH3-GFP under heat is observed in meiocytes, whereas CENH3-GFP abundance increases in the tapetum, suggesting that there is differential regulation of centromere loading in these two cell types. These observations are in line with previous reports on haploidization mutants and their hypersensitivity to heat stress.

      Strengths:

      This paper is an important contribution to our insights into the impact of heat stress on sexual reproduction in plants.

      Weaknesses:

      While it is highly significant, I struggled to interpret the results because of the poor quality of the figures and the videos.

      We apologize for the poor quality of the figures. The figure resolution was drastically reduced during the conversion of the manuscript to PDF on the publisher's website.

      Reviewer #1 (Recommendations For The Authors):

      To complete the presented analysis, it would be great to analyze the signal strength of the here-presented BMF3 reporter at high temps, see below for further reasoning.

      Quantification of the BMF3 signal is difficult: it is only transiently associated with kinetochores and its level changes over time. Nevertheless, analysis of our movies taken under the same microscope settings indicates that the amount of BMF3 decreases with increasing temperature. This is illustrated in the new Figure S6C.

      Conversely, how is the BMF1 and BMF3 signal strength in cenh3-4 mutants?

      We performed an analysis of BMF1 and BMF3 signal in cenh3-4 mutants and observed a reduced level of signal from both proteins (Figure S6). In the case of BMF1, no signal was detectable in either somatic or meiotic cells.

      How do the authors explain the reduction in BMF1 signal at 26 and 30°C versus the extension of the duration of the SAC as measured by the persistence of a BMF3 signal (line 192: "...reduces the amount of CENH3 and the kinetochore protein BMF1 on meiotic centromeres, potentially affecting their functionality..." versus line 213: "...We observed that while the BMF3:GFP signal persisted, on average, for about 22.7 min at 21 and 26°C, its appearance was prolonged to 40.5 min at 30°C...")? Is the BMF3 signal also reduced at high temps (see question above)?

      This is a very interesting point. While we see reduced levels of both proteins under heat stress or in cenh3-4 plants, the effect on BMF1 is much more pronounced, and BMF1 becomes undetectable under these conditions. This contrasts with BMF3, which appears to be reduced but is still clearly visible. These data suggest that BMF1 is more sensitive to reduced levels of CENH3, and they further corroborate the findings from the Schnittger lab that BMF1 is not a core component of the SAC.

      Line 18-20: The observation that heat stress reduces fertility has been made by several research teams before this study. I propose to write "confirm"/"support" etc. instead of "reveal" to avoid a (presumably not intended) false priority claim in the abstract.

      We apologize; this was unintentional, and we cite the relevant literature in the article. We have rewritten the abstract to avoid this impression.

      Figure 2: The panel/legend appears to be a bit mixed up. Panel C is described in legend under A. In addition, I cannot find any blue arrows in panel A (which is described as panel B). Correspondingly, the references to the panels in this figure (lines 134/135 and following) need to be updated. I am also not sure how the meiocytes in this figure were stained. The dots look like centromeres but then their intensity rather increases with increasing temperature. If correct, how can this be reconciled with the authors' statement that centromeres decrease in size at higher temps?

      We apologize for the mix-up. An early version of the figure was accidentally submitted, and we have now corrected it. Panel B shows DAPI-stained meiocytes at the tetrad stage, and examples of micronuclei are indicated by arrowheads.

      Line 520: Should read "genotype" not "phenotype".

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      (1) It is intriguing that heat stress impairs only the centromeres and segregation of meiotic chromosomes but not the mitotic chromosomes. No analysis of mitotic divisions is provided in the manuscript. As they have generated marker lines, it is reasonable to examine the mitotic time course as well by live monitoring of root tissues exposed to similar temperature conditions as done for meiotic analysis. This will help to address the effect of heat stress on mitotic centromeres and its comparison with meiosis will provide a better picture. There are two likely outcomes during mitosis:

      (a) It is possible that heat stress also slows down mitotic progression, as shown for meiosis in this paper, and hence it is important to examine mitosis as well to compare and contrast CENH3/BMF1 dynamics in mitosis and meiosis.

      (b) The second scenario is that there is no effect of heat stress on the centromere integrity of mitotic chromosomes. In fact, the authors show indirect evidence in support of this, in which eYFP:CENH3 showed a strong signal in the tapetal cells (somatic origin) surrounding the male meiocytes (generative origin). It is interesting that somatic cells of the tapetum show a strong signal whereas the meiocytes lack it. The authors should elaborate on this contrasting result.

      The effect we observed seems to be specific to meiosis. We analyzed the progression of mitosis in root cells and we see a negligible effect of temperature on mitotic progression and no micronuclei formation. Interestingly, in terms of CENH3 loading, root cells show a slight decrease in CENH3 at 30°C, in contrast to the situation in tapetum cells. These and other data suggest a tissue/cell specific behavior of centromere maintenance and deserve further analysis. We plan to publish data on mitosis and tissue-specific aspects of CENH3 loading in a separate manuscript.

      (2) Spindle assembly checkpoint (SAC) comprises several core proteins that are recruited to the kinetochores to correct the errors during the defective cell cycle. Here the authors demonstrate the prolonged presence of BMF3 as the only proof to claim that heat stress prolongs the spindle assembly checkpoint during metaphase I. Have the authors observed the dynamics of any other SAC core components such as MAD1, MAD2, MPS1, BUB3, and the like during heat stress?

      No, we did not. We provide several independent lines of evidence that centromere structure and functionality are affected, and spindle checkpoint analysis is only one of them. At the time we designed these experiments, the only experimentally validated and well-characterized component of the SAC was BMF3, and we used only this protein as a SAC reporter because a general analysis of the SAC was not the primary goal of our study. While this paper was under review, a preprint from the Schnittger lab focusing on the plant SAC was published that comprehensively analyzed these SAC components in Arabidopsis and provides a solid foundation and resources for further research in this direction. This study also uses BMF3 as a reporter for the SAC in meiotic cells. It is noteworthy that despite using different microscopic methods and different plant reporter lines, our labs independently arrived at exactly the same duration of BMF3 association with the kinetochore (i.e., 22 min).

      (3) Is BMF1 a component of the SAC or the kinetochore? I understand that BMF1 is a part of the core SAC (Komaki and Schnittger, 2017), although it localizes to the kinetochore. There are well-characterized kinetochore proteins in Arabidopsis such as MIS12, NUF2, NNF1, and SPC24 (MUN1), which the authors could have used as kinetochore markers. Regardless, here the authors used it as a kinetochore marker. Being a part of the SAC, one would expect a prolonged presence of BMF1 similar to BMF3 at meiotic kinetochores, but the opposite is observed. How can these contrasting results be explained?

      As discussed in the public section of the review, BMF1 does not seem to be a core component of the SAC. Furthermore, this protein localizes to centromeres/kinetochores throughout the cell cycle and therefore cannot be used as a SAC reporter.

      (4) Micronuclei can form as a result of chromosome missegregation as shown for spo11-1 and also due to segregation error caused by DNA repair defects. Here it is not clear what is the origin of micronuclei. It is very hard to decipher from live cell imaging. A simple meiotic spread of anthers of different treatments would address the origin of micronuclei.

      Cytology cannot easily determine the origin of micronuclei in meiotic cells. Acentric fragments produced by aberrant DNA repair become cytologically detectable only after metaphase I, because until then they are tethered to the remaining chromatin via cohesion. Therefore, to discriminate the origin of micronuclei, we took advantage of spo11 mutants, which do not form any meiotic breaks and hence cannot generate acentric fragments by aberrant repair. We reason that all micronuclei produced in spo11 plants originate from chromosome mis-segregation, and their increase at elevated temperature supports the notion that heat stress further impairs chromosome segregation.

      (5) Fig.1 B The microspores are not clearly visible in the alexander-stained anthers. It is not clear which is fertile and which is sterile. A better quality picture would be ideal to appreciate the fact.

      Again, we apologize for the poor quality of the images caused by manuscript conversion.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figure 2, it should be pointed out where the micronuclei are. I see here and there a single bright spot. In Arabidopsis, we have noticed bright spots under stress conditions that are autofluorescent signals. It needs to be shown that these spots are not observed in non-GFP lines. Better image quality may help too.

      The micronuclei in Figure 2 are visualized by DAPI staining, not with GFP. The micronuclei are now indicated by arrowheads.

      (2) It was not possible to see the centromeres in Figure 3 hence I could not verify the fluorescence intensities of CENH3 and BMF1. There is also something wrong with the color codes blue and red in fig3B, C, and D.

      Again, we apologize for the poor quality of the images caused by manuscript conversion.

      (3) Also in the videos it would help to point out where the micronuclei are seen. At what stage were these nuclei quantified? Given that meiosis progression in the cenh3-4 mutant is slower, it may be necessary to wait long enough to see established micronuclei. This information is supposed to be presented in Figure 2C. However, the X-axis shows time, not number. So I presume Fig 2C shows the duration of meiosis stages in the mutant. In Fig 2B, it shows the number of micronuclei per lobe. However, to correlate the incidence of micronuclei formation and the frequency of polyad formation (inviable microspores), one needs the quantification of the numbers of meiocytes carrying micronuclei. Then one can correlate the number of pollen per anther (shown in Fig 1c) with the incidence of micronuclei formation. The question of whether the degree of fertility reduction is due to micronuclei formation is a major issue that should be clarified.

      The micronuclei were not quantified from the movies but from DAPI-stained whole anthers at the tetrad stage, as indicated in the main text. We also apologize for the confusion regarding Figure 2, as we mixed up the panels in the original submission. This has been corrected in the new submission.

    1. Reviewer #1 (Public Review):

      Summary:

      In this study, Nishi et al. claim that the ratio of long-term hematopoietic stem cells (LT-HSCs) to short-term HSCs (ST-HSCs) determines the lineage output of HSCs and that a reduced ratio of ST-HSCs in aged mice causes myeloid-biased hematopoiesis. The authors used Hoxb5 reporter mice to isolate LT-HSCs and ST-HSCs and performed molecular analyses and transplantation assays to support their arguments. How the hematopoietic system becomes myeloid-biased upon aging is an important question with many implications in the disease context as well. However, their study is descriptive and leaves some questions unanswered.

      Weaknesses:

      (1) The authors may need to reframe their main argument conceptually, because it is questionable whether the ST-HSCs used in this study are functionally short-term "HSCs". The data presented in this study and their immunophenotypic definition of ST-HSCs (Lineage negative/Sca-1+/c-Kit+/Flk2-/CD34-/CD150+/Hoxb5-) suggest that the authors may have found hematopoietic stem cell-like lymphoid progenitors, as previously shown for the megakaryocyte lineage (Haas et al., Cell Stem Cell, 2015), or, as the authors briefly mentioned in the discussion, that Hoxb5- HSCs could be lymphoid-biased HSCs. The authors disputed the idea that Hoxb5- HSCs are lymphoid-biased HSCs based on their previous 4-week post-transplantation data (Chen et al., 2016). However, they overlooked the possibility of myeloid reprogramming of a lymphoid-biased population under regenerative conditions (Pietras et al., Cell Stem Cell, 2015). In other words, early after transplantation, ST-HSCs (Hoxb5- HSCs) may appear to lack the lymphoid-biased phenotype. Thinking of their ST-HSCs as hematopoietic stem cell-like lymphoid progenitors or lymphoid-biased HSCs also makes more sense conceptually. ST-HSCs come from LT-HSCs and further differentiate into lineage-biased multipotent progenitor (MPP) populations, including myeloid-biased MPP2 and MPP3. Based on the authors' claim, LT-HSCs (Hoxb5+ HSCs) have no lineage bias even in aged mice. These LT-HSCs then make ST-HSCs, which produce mostly memory T cells, and these memory T cell-producing ST-HSCs then produce MPPs, including myeloid-biased MPP2 and MPP3. This differentiation trajectory is hard to accept. If we think of Hoxb5- HSCs (the authors' ST-HSCs) as a sub-population of immunophenotypic HSCs with lymphoid lineage bias, or as hematopoietic stem cell-like lymphoid progenitors, the differentiation trajectory has no flaw.

      (2) The authors' experimental designs have some caveats that weaken support for their claims. The authors claimed, based on transplantation assays, that aged LT-HSCs show no myeloid-biased clonal expansion. In these experiments, the authors transplanted 10 HSCs into young recipient mice. Given the large numerical expansion of HSCs in old mice and the known heterogeneity of immunophenotypically defined HSC populations, it is questionable whether 10 out of so many old HSCs can faithfully represent the old HSC population. The data from primary and secondary recipients of Hoxb5+ old HSCs (Figure 2C and D) support this concern. In addition, they only used young recipients. Considering the importance of the inflammatory aged niche in myeloid-biased lineage output, transplanting young vs old LT-HSCs into aged mice would complete the picture.

      (3) The authors' molecular data analyses need more rigor and unbiased approaches. They claimed that neither aged LT-HSCs nor aged ST-HSCs exhibited myeloid or lymphoid gene set enrichment, but that aged bulk HSCs, which are simply the sum of LT-HSCs and ST-HSCs in their gating scheme (Figure 4A), showed a "tendency" toward enrichment of myeloid-related genes based on the selected gene set (Figure 4D). Although the proportion of ST-HSCs within bulk HSCs is reduced upon aging, ST-HSCs do not exhibit lymphoid gene set enrichment in their data, so it is hard to understand how aged bulk HSCs could show more myeloid gene set enrichment than young bulk HSCs. These bulk HSC data rather suggest that there could be a trend toward a certain lineage bias (although not significant) in aged LT-HSCs or ST-HSCs. The authors need to verify the molecular lineage priming of LT-HSCs and ST-HSCs using another comprehensive dataset.

      (4) Some data are too weak to fully support their claims. The authors claimed that age-associated extramedullary changes are the main driver of myeloid-biased hematopoiesis, based on the absence of major differences in progenitor populations upon transplantation of 10 young HSCs into young or old recipient mice (Figure 7F) and on relatively low donor-derived cell levels in the thymus and spleen of aged recipient mice (Figure 7G-J). However, they used a selected subset of mice to calculate the progenitor populations in recipient mice (8 out of 17 young recipients and 8 out of 10 aged recipients, denoted by * in Figure 7C). In addition, they calculated the progenitor populations as a frequency within c-Kit-positive cells. Given that they transplanted 10 LT-HSCs into "sub-lethally" irradiated mice, and that 8.7 Gy irradiation can have different effects on bone marrow clearance in young vs old mice, it is not clear whether these data are reliable enough to support their claims. The same concern applies to the data in Figure 7G-J. The authors need to provide alternative data to support their claims.

    1. ment. Performance studies taught us that "acting" is not just something set apart from reality, but a model of and for the process through which real identities are constructed. What I suggest here is that we can begin to think of animation as more than an entertainment medium, as a possible mode of performative (real, social) world

      Just as performance affects how someone's identity is constructed, animation may also change the real world.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      Regarding significance, we would like to highlight that the main finding and breakthrough of our manuscript is the discovery that intronic polyadenylation (IPA) isoforms are a source of microproteins (indeed, IPA was not known to induce sORF-encoded microproteins). We provide proof of principle for this concept (called miP-5’UTR-IPA) and for its functional relevance for one gene (PRKAR1B).

      A second finding of this study is that IPA (including miP-5’UTR-IPA) isoforms are widely upregulated in cell response to cisplatin, and therefore we show the functional relevance of miP-5’UTR-IPA isoforms in this biological context.

      Regarding the generality of the miP-5’UTR-IPA concept, we provide evidence that many genes generate miP-5’UTR-IPA isoforms, by crossing our 3’-seq data with available Ribo-Seq and mass spectrometry datasets, which were generated without cisplatin treatment. Also, the miP-5’UTR-IPA isoforms of PHF20 and PRKAR1B are detected both in the presence and absence of cisplatin. Thus, the novel concept of microprotein-coding IPA isoforms opens wide perspectives, way beyond cisplatin response.

      2. Description of the planned revisions

      REVIEWER #1

      Evidence, reproducibility and clarity

      Microproteins originating from coding and non-coding transcripts are increasingly understood to control various cellular processes. In the present study, the authors investigated whether intronic polyadenylation (IPA) contributes to the formation of transcript isoforms encoding microproteins. Using genotoxic stress by cisplatin as a model in cell cultures, the authors detect abundant IPA. IPA in a subset of such transcripts leads to short 5'UTR transcript isoforms that are poorly associated with heavy polysomes and encode microproteins. For PRKAR1B, they demonstrate the expression of a corresponding microprotein and a function in modulating the cisplatin response. Based on depletion experiments of FANCD2 and SETX, the authors propose that impaired transcription processivity after cisplatin is one mechanism leading to IPA and microprotein production.

      While this is an interesting manuscript, I felt the support for the claimed generalization falls a bit short.

      Our response: The generality of the miP-5’UTR-IPA concept is supported by the large-scale analysis that we presented (Fig. 6): indeed, by crossing our 3’-seq data with Ribo-Seq and MS data (both of which originate from multiple cell types and tissues), we identified 156 genes with cisplatin-regulated miP-5’UTR-IPA isoforms. To strengthen this part and highlight the generality of the miP-5’UTR-IPA concept, we will provide the cell type/ tissue distribution of our set of 156 miP-5’UTR-IPA isoforms, by exploiting available 3’-seq datasets from various cells/tissues. (Please also see major point 1 below.)

      Major:

        • If I see it correctly, the authors mainly refer to existing Ribo-Seq data and evidence from mass spectrometry/proteomics to infer the generality of the mechanism (beyond PRKAR1B). It is important to back this up with further experiments and validate this for the set-up used in this manuscript. This concerns the existence of the microproteins but also the downstream functional impact.

      Our response: In our study, we make the proof of principle of miP-5’UTR-IPA (that is, a microprotein-encoding IPA isoform) for the PRKAR1B gene and its sORF#2 (microprotein detection by WB and IF, functional evidence by siRNA and CRISPR of the IPA site and sORF initiation codon). If we understand correctly (also based on minor point 3 below), this reviewer is requesting further evidence of microprotein existence (in addition to Ribo-Seq and mass spectrometry [MS] data) and function for a second IPA-derived sORF that we study in this manuscript (either PHF20 sORF or PRKAR1B sORF#1). To the best of our knowledge, a proof of principle for a new concept is usually established on a single gene. Nevertheless, for the miP-5’UTR-IPA isoform of PHF20, we already provided evidence for its function by using siRNAs (Fig. 3A-C) and for its translation by polysome profiling (Fig. 4C), in addition to Ribo-Seq and MS evidence (Fig. 4A). The fact that for PHF20 we did not detect the transfected Flag-tagged microprotein in HEK cells could be due to several reasons (as discussed on page 16); we will try this approach again with different biological conditions (cell lines, stress) or construct designs (as the sORF context may be important).
        • Also, I wonder whether this is limited to cisplatin-induced genotoxic stress and the specific cell line used, or whether it is a more global mechanism.

      Our response: We provided evidence of IPA isoform regulation by cisplatin in two lung cancer cell lines (A549 and H358; Fig. 1A-B) but we agree that our analyses of miP-5’UTR-IPA were mainly done in A549 cells. We will: (i) clarify that we detected the miP-5’UTR-IPA isoforms of PRKAR1B and PHF20 in A549 cells (total cytosol and light polysomes) both in the presence and absence of cisplatin (Fig. 2D, 3A and S3D); (ii) add RT-qPCR validation of their cisplatin regulation in H358 cells; (iii) try to detect the PRKAR1B-encoded microprotein in a second cell line (Fig. 4); (iv) test the impact of PRKAR1B and PHF20 miP-5’UTR-IPA isoforms on cell survival in a second cell line and with a second genotoxic agent; (v) clarify in Fig. 6 that miP-5’UTR-IPA isoforms are regulated by cisplatin in both A549 and H358 cells (our 3’-seq data) and that the Ribo-Seq and MS datasets supporting their translation originate from multiple cell types and tissues without cisplatin treatment; and (vi) provide the cell type/ tissue distribution of our set of 156 miP-5’UTR-IPA isoforms (by exploiting available 3’-seq datasets from various cells/tissues).

      Minor:

        • While the rest of the paper reads well, the abstract could be improved/simplified to increase accessibility.

      Our response: We will improve the abstract.

      Page 11: Pertaining to Figure 3 and the functional impact: The authors analyze the IPA effect by probing cell viability and cell survival. It would be important to define the effects in further detail, as the mere regulation of cell cycle and/or apoptosis could also result in such outcome (which is then not necessarily a direct cisplatin response). Does this also impact the response to other genotoxic stress (also pertains to the effects studied and shown in Fig. 5)?

      Our response: Because cisplatin effects on cell growth are usually mediated by effects on cell cycle and cell death, we will determine which aspect is impacted by PRKAR1B and PHF20 miP-5’UTR-IPA isoforms, by carrying out FACS analysis of PI/BrdU and Annexin V (both in the presence and absence of cisplatin). As mentioned in major point 2, we will also test the impact of these isoforms on cell survival to a second genotoxic agent.

      Page 12, concerning the microprotein expression: the authors refer to data from other resources to claim that the microproteins are expressed; however, they fail to demonstrate this for their setup (at least for two of the three they study here). I think this is a weak point, as it does not directly support the general claim.

      Our response: Please see major point 1 above.

      Also, I did not understand what the authors intended to demonstrate with the immunofluorescence (Fig. 4E). What should a defined nuclear expression imply versus the diffuse staining throughout after cisplatin? How does this relate to the functional effects?

      Our response: We included in Fig. 4E the observation that the subcellular localization of the PRKAR1B-encoded microprotein is altered in response to cisplatin, because this supports the notion that this microprotein plays a role in cell response to cisplatin. We can remove these data if requested.

      Page 13/Fig. 5E: the different clones of the mATG show very high variability. To my understanding it is difficult to draw a clear conclusion from this heterogeneity.

      Our response: The statistical analysis shows a significant difference between the mATG and Control groups (p

      Page 15 on the mechanism: SETX has been demonstrated to control poly(A) site choice (PMID: 21700224, 32976578). However, the quantitative role of SETX in poly(A) site choice regulation (compared to other regulators) seems to be rather marginal and not strictly unidirectional, i.e., after SETX depletion, longer transcript isoforms can also be detected (PMID: 32976578). How does this relate to the proposed mechanism of SETX-dependent processivity? Interestingly, PMID: 32976578 also suggests that PRKAR1A has a 5'UTR poly(A) site that is regulated in a SETX-dependent manner.

      Our response: We will add in the discussion statements that (i) the role of SETX in cisplatin regulation of IPA:LE isoform ratio and processivity might be different from its role in APA regulation in the absence of genotoxic treatment (citing PMID 32976578; keeping in mind that we did not compare them side by side on a genome-wide scale) and (ii) PRKAR1A seems to have a 5'UTR poly(A) site regulated by SETX in TREND-DB (PMID 32976578).

      • Page 16, discussion first paragraph. While refs 1-4 are nice reviews that could be cited here, a study that appeared later represents the most comprehensive analysis to date, covering the different facets from transcription to RNA processing and the resulting impact on poly(A) site choice (PMID: 30552333).

      Our response: We will cite PMID 30552333 and 32976578 as resources of APA regulation by various regulators of gene expression (keeping in mind, however, that for most factors these studies do not exclude indirect effects).

      Significance

      This could be a very significant report, provided the generality of the claims and the mechanistic insights are further strengthened.

      Overall, it targets a rather specialized readership. This could be improved by simplifying the abstract, providing additional experimental evidence for the generality of the proposed mechanism, and stringently rewording the main text to draw a clear line, omitting unnecessary details and focussing on the novel findings.

      Our response: Please see our responses above. In addition, we will reword the main text where necessary.

      REVIEWER #2

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Devaux et al. report that the anti-cancer drug cisplatin upregulates intronic polyadenylation (IPA) isoforms in non-small cell lung cancer cell lines. Their finding was based on 3' end sequencing and long-read sequencing. Through polysome profiling, they confirmed that many of the IPA isoforms are translated, albeit inefficiently in most cases.

      Our response: There is some misunderstanding here. We will clarify in the text that inefficient association with heavy polysomes is observed for a minority (not the majority) of IPA isoforms. For this, in Fig. 2B and S2A, we will add the information that for the majority of IPA sites, the IPA:LE ratio is not significantly different (neither up nor down) between total cytosol and heavy polysomes.

      They validated functions of IPA isoforms from two genes, PHF20 and PRKAR1B, in cell survival upon cisplatin treatment, based on an array of methods, including siRNA knockdown, CRISPR knockout of IPA polyA site, and CRISPR mutation of the start codon. They further found that FANCD2 and Senataxin can regulate cisplatin-mediated IPA activation. The authors advocate a new paradigm of expression of IPA-encoding microproteins in cisplatin-treated cells.

      Our response: We would like to point out that our data indicate that cisplatin upregulation of the IPA:LE isoform ratio is mediated at least in part by an inhibition of transcription processivity (explaining the decrease of LE isoforms), and that we do not claim an ‘IPA activation’ (that is, enhanced use of IPA sites) by cisplatin. This remark is also related to major point 1 below.

      Major comments:

        • While the phenomenon of IPA isoform upregulation by cisplatin is quite convincing, the underlying mechanism is largely elusive. The authors indicated processivity as a potential mechanism and the effects of FANCD2 and Senataxin appear in line with this hypothesis. However, they cannot rule out other possibilities based on the data presented in the manuscript. For example, it is not clear if the elongation rate of Pol II (distinct from its processivity) or nuclear RNA degradation is affected by cisplatin, which could also lead to increased expression of IPA isoforms. In addition, enhanced 3' end processing activity has been previously shown to activate IPA sites. Therefore, the underlying mechanism is mostly speculative.

      Our response: As explained on page 14, the reason why we focused on transcription processivity is that the cisplatin-induced upregulation of the IPA:LE isoform ratio was enriched in long genes and was accompanied by a decrease of LE isoform levels. Importantly, our data (e.g., for the PHF20 and PRKAR1B genes) indicate that the cisplatin-induced decrease of processivity explains –at least in part– the selective decrease of the LE but not IPA isoform levels and therefore the increase of the IPA:LE isoform ratio; we will clarify this in the manuscript (on pages 14 and 15). Our data also show that cisplatin effects on both processivity and IPA:LE isoform ratio are dependent on FANCD2 and SETX. We agree with the reviewer that we cannot exclude that IPA:LE isoform ratio upregulation by cisplatin might also be mediated in part by additional mechanisms (e.g., ‘factors involved in cleavage/ polyadenylation, splicing, transcription elongation and termination, and epigenetic marks’, as mentioned in the discussion on page 16) and we will add nuclear RNA degradation to the list of potential factors. However, we want to emphasize that the role of processivity is not speculative.

      The authors used the polysome:cytosolic ratio to indicate translational efficiency. However, because the CDS size affects the number of ribosomes per mRNA, the translational efficiency should be based on polysome:cytosolic ratio normalized to CDS size. Ideally, the authors should calculate number of ribosome per transcript based on monosome, light polysome and heavy polysome.

      Our response: We cannot normalize ribosome number by CDS size because (i) heavy polysomes are not a precise number of ribosomes and (ii) sORFs are not annotated as CDS.

      The functions of PHF20 and PRKAR1B IPA isoforms are based on knockdown or knockout mutations. Because of its gain-of-function property, overexpression of the isoforms in cisplatin-treated cells would be necessary to definitively confirm their functions.

      Our response: For PRKAR1B sORF#2, we will carry out overexpression of the sORF microprotein in A549 cells and CRISPR clones and analyze its effects on cell growth and cisplatin survival. We have appropriate constructs for this.

      Minor:

        • Fig. 1H, the numbers of IPA and LE transcripts should be provided. The statistical significance for the difference should also be included.

      Our response: The numbers of IPA and LE transcripts were provided in Fig. S1I and we will provide the statistical significance (which is good), as requested.

      Fig. 1I, the image should be accompanied with fold difference as indicated in the text. Some statistics for difference between vehicle only and CisPt only is necessary.

      Our response: We will indicate the fold differences and provide the statistical significance, as requested.

      • Fig. 6, the authors mined Ribo-Seq and mass-spec data and identified 156 genes whose IPA isoforms have potential for protein expression. The enriched GO terms for the 156 IPA genes are different from those of the overall IPA isoforms shown in Fig. 1C. Does this mean that some genes, like those in DNA damage stimulus, produce IPA isoforms with different consequences, such as inhibiting their expression?

      Our response: We think it is difficult to compare the enriched GO terms between overall IPA and miP-5’UTR-IPA. Indeed, differences could be due in part to trivial reasons (e.g., different number of genes in the lists). As suggested by this reviewer, it could be that for some gene sets enriched in particular functions, IPA may serve to downregulate the expression level of the full-length (canonical) mRNA. We discuss that this may be the case for the PRIM2 gene involved in DNA replication (page 17), but expanding on this would be speculative. Likewise, IPA isoforms encoding carboxy-terminal isoforms of canonical proteins, or IPA isoforms with a noncoding function (like ASCC3 or SPUD), might be enriched in particular gene functions, but again this idea is speculative and it goes beyond the scope of our manuscript.

      In addition, the authors need to use ribo-seq and mass spec data as a validation tool for their polysome profiling data to indicate the reliability of using polysome data to call protein expression.

      Our response: This comment seems to concern those IPA isoforms that are abundant in heavy polysomes. We do not wish to validate protein production from such isoforms, because they are not the focus of our study.

      Significance

      The significance of this work is its novelty in reporting IPA isoform activation by cisplatin. More importantly, some IPA isoform give rise to microproteins that have functional roles in cell survival upon cisplatin treatment.

      Our response: We would like to highlight that the main finding of our manuscript is the discovery that IPA isoforms are a source of microproteins. The cisplatin response is the biological context in which we carried out the study, and therefore the context of our functional and mechanistic analyses.

      REVIEWER #3

      Evidence, reproducibility and clarity

      Devaux et al. report how cisplatin treatment changes the abundance of mRNA isoforms, favoring the expression of short transcripts originating from intronic polyadenylation (IPA) events relative to the expression of the corresponding mRNA isoform that includes the last annotated exon (LE). To detect IPA events the authors performed 3' end sequencing of polyadenylated mRNAs, long-read sequencing and conventional total RNA sequencing experiments in control and cisplatin treated cells. Analysis of the 3' end sequencing data revealed numerous genes showing an increase in the IPA:LE ratio upon cisplatin treatment, whereas few events with a decreased IPA:LE ratio were detected. Many of the identified events could be corroborated by the long-read sequencing data, sequencing of total RNA, and an existing polyA database. Furthermore, the authors validate IPA:LE ratios for a few selected genes using quantitative PCR. Subsequently, the authors continue to analyze if IPA isoforms are translated with a specific focus on IPA isoforms that do not contain any parts of the LE isoform coding sequence but terminated transcription in what is annotated as 5' untranslated region (UTR). These experiments show that IPA isoforms (including 5' UTR-IPAs) are translated but frequently associated with fewer ribosomes than the corresponding LE isoform. For two selected 5' UTR-IPA isoforms the authors identified potential small open reading frames (sORFs) that could give rise to microproteins with a potential function during cisplatin treatment. siRNA experiments targeting either the 5' UTR-IPA or the LE mRNA isoform of selected genes identified a small but significant differential effect on cell viability upon cisplatin treatment. Similar results were obtained when the endogenous IPA locus was deleted or the start codon of the potential sORF was mutated. 
Finally, the authors shed some light onto the molecular mechanisms of how cisplatin affects the IPA:LE ratio by decreasing transcription processivity.

      This is an interesting manuscript suggesting a link between IPA, sORFs and cancer treatment. The manuscript offers valuable datasets as a resource for the research community. While the authors generally present a well-analyzed and validated dataset supporting their claims, some aspects require further evidence or clearer presentation for robustness and reader comprehension. In addition, the manuscript would benefit from improving data visualization and we have several suggestions (see below) on how to make the representation of the data in the figures more appealing to the reader. We encourage the authors to reconsider several of their bar plots and instead plot their data on a continuous axis, e.g. using a scatter plot (fold change versus FDR) instead of a bar chart that can only represent up/down total numbers.

      Our response: Please see our responses below.

      Main points:

        • We disagree with one of the data interpretations concerning the heavy polysome (HP) versus total cytosolic polysome (cytosol) localized IPA and LE mRNA isoforms in the paragraph "A subset of IPA isoforms are depleted in heavy polysomes and terminate in the annotated 5'UTR part of genes". Preferential localization of the IPA isoform to the cytosol versus HP, in comparison to the LE isoform, does not mean that the translation efficiency of the IPA isoform is lower than that of the LE isoform. It just reflects the fact that the IPA isoform coding sequence is considerably shorter than the coding sequence of the LE isoform (and thus can accommodate fewer ribosomes!). The authors mention that point later in the text, but it should already be made clear at this point in the manuscript. They should make sure not to confuse translation efficiency (ribosome density across an open reading frame) and open reading frame length.

      Our response: We will modify the text of this section (pages 10-11). We will state that ‘the HP:cytosol ratio is usually considered as a proxy for translation efficiency’ and we will only make conclusions in terms of ‘HP:cytosol ratio’ or ‘HP recruitment efficiency’, instead of ‘translation efficiency’ (we had used this term in a few sentences for the sake of simplicity). Please note that these changes will not alter the main conclusion of this part, because both the title (‘a subset of IPA isoforms are depleted in heavy polysomes and terminate in the annotated 5'UTR part of genes’) and the end of this section (page 11), as well as the legend of Fig. 2, were already written in such terms. Thus, in this section, we do not need to discuss ORF length (and we cannot, because sORFs are not annotated as CDS and we introduce sORFs only two sections later [Fig. 4]).
        • In Figure 5, the authors claim that the "cisplatin survival phenotype of the PRKAR1B 5'UTR-IPA isoform is attributable to its small ORF#2". This is an interesting phenotype but the authors only present a WST1 assay to support these claims. Given that it is an important Figure in their manuscript and links the observations made earlier to cisplatin-induced survival, it would be critical to bolster these claims with additional data, e.g. AnnexinV/PI staining and flow cytometry to distinguish changes in cisplatin-induced apoptosis from proliferation.

      Our response: We will make the requested experiments with FACS analysis of Annexin V and PI/BrdU to distinguish changes in cisplatin-induced apoptosis from proliferation (cell cycle).

      • Along the same line, it would be important to test the overexpression of the sORF microprotein upon cisplatin treatment. Changes in the mRNA sequence (such as the AUG mutation) could potentially also alter the mRNA structure. It would therefore be critical to show that the sORF microprotein is indeed responsible for the changes in cisplatin-induced viability (for instance by expression of a sORF::P2A::GFP construct).

      Our response: As requested, we will test whether overexpression of the sORF microprotein can rescue the cisplatin survival phenotype of our PRKAR1B IPA and ATG mutants. We have appropriate constructs for this.

      • Figure 5C: Please show the Western blot of PRKAR1B and GAPDH and not just the quantification. There is plenty of space in Figure 5.

      Our response: We will show the Western blots for PRKAR1B and GAPDH.

      • In the following, we list suggestions to improve different figures where the data could be more adequately presented:

      - Figure 1A and B: We suggest representing the data in a scatter plot with log fold change on the x-axis and FDR on the y-axis. The authors decided on an FDR cutoff of 10%. This is quite high. Why did the authors choose this cutoff? How many genes would be identified with a more stringent cutoff (1% for example)? Please list the corresponding FDR values in Table S4.

      Our response: We have never seen in the literature 3’-seq (or related) data of IPA:LE ratio regulation plotted as a scatter plot with log fold change on the x-axis and FDR on the y-axis. Instead, we propose to provide scatter plots with IPA fold change on the x-axis and LE fold change on the y-axis, as in many previous studies. We were not very stringent on the FDR or adjusted p values, in order to reduce the rate of false negatives, because we then cross our lists of regulated IPAs in different compartments (e.g., cytosol and heavy polysomes; Fig. 2C). We provided adjusted p values in Table S4; with an adjusted p value of 1%, we observe 1986 upregulated IPA sites and 33 downregulated ones.

      -Figure 1C: There are many ways to visualize fold change, p value and number of genes of a GO term analysis. The authors could choose one of the common ways to represent such data instead of just showing raw numbers in a table.

      Our response: We prefer to show GO terms as tables, but we can provide a figure if necessary.

      -Figure 1E-G: Add to the figure that PRIM2 was assayed. It is only written in the figure legend.

      Our response: We will write ‘PRIM2’ in the figure.

      -Figure 2A and B: Same suggestion as for Figure 1A and B; a scatter plot with log fold change on the x-axis and FDR on the y-axis would visualize the data much better.

      Our response: Same response as for Fig 1A-B above.

      -Figure S1B: Where does the number of 2118 cisplatin regulated genes come from? It was not described anywhere else. Should it not be 1987 regulated genes?

      Our response: We will clarify that 2118 is the union of genes with cisplatin upregulated IPA:LE ratio in H358 and/or A549 cells.

      -Figure S1H: Typo in the y-axis.

      Our response: This typo will be corrected, thanks.

      -Figure S2A: Same suggestion as for Figure 1A and B, a scatter plot with log fold change on the x-axis and FDR on the y-axis.

      Our response: Same response as for Fig 1A-B above.

      -Figure S3C: If possible, show the plotted digital data of the polysome curves.

      Our response: We do not have digital data for the polysome curves, just the printed graph shown at the bottom of the figure.

      • Data availability: The provided UCSC genome browser link unfortunately does not load the BAM data files. Please fix.

      Our response: We will fix this upon submission to the journal.

      Minor points:

      • Please check the text for typos, e.g. page 8: artefacts instead of artifacts.

      Our response: We will check for typos.

      Significance

      The manuscript describes an interesting link between intronic polyadenylation, sORFs and cancer treatment and will be of interest to the gene expression regulation and RNA communities. As a relatively unknown mechanism to induce sORF-encoded microproteins, the study could lead to follow-up studies tackling intronic polyadenylation and their role in sORF expression.

      Our response: We would like to highlight that IPA was not previously known to induce sORF-encoded microproteins.

      While the authors generally present a well-analyzed and validated dataset, the link between sORF function and cisplatin response will require additional experiments to strengthen the sORF's impact for cellular survival.

      Our response: Please see our responses to main points 2 and 3 above.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      None.

      4. Description of analyses that authors prefer not to carry out

      REVIEWER #2

      Major point #2: The authors used the polysome:cytosolic ratio to indicate translational efficiency. However, because the CDS size affects the number of ribosomes per mRNA, the translational efficiency should be based on the polysome:cytosolic ratio normalized to CDS size. Ideally, the authors should calculate the number of ribosomes per transcript based on the monosome, light polysome and heavy polysome fractions.

      Our response: We cannot normalize ribosome number by CDS size because (i) heavy polysomes are not a precise number of ribosomes and (ii) sORFs are not annotated as CDS.

      Minor point #3: Fig. 6, the authors mined ribo-seq and mass-spec data and identified 156 genes whose IPA isoforms have the potential for protein expression. The enriched GO terms for the 156 IPA genes differ from those of the overall IPA isoforms shown in Fig. 1C. Does this mean some genes, like those in DNA damage stimulus, produce IPA isoforms with different consequences, such as inhibiting their expression?

      Our response: We think it is difficult to compare the enriched GO terms between overall IPA and miP-5’UTR-IPA. Indeed, differences could be due in part to trivial reasons (e.g., different number of genes in the lists). As suggested by this reviewer, it could be that for some gene sets enriched in particular functions, IPA may serve to downregulate the expression level of the full-length (canonical) mRNA. We discuss that this may be the case for the PRIM2 gene involved in DNA replication (page 17), but expanding on this would be speculative. Likewise, IPA isoforms encoding carboxy-terminal isoforms of canonical proteins, or IPA isoforms with a noncoding function (like ASCC3 or SPUD), might be enriched in particular gene functions, but again this idea is speculative and it goes beyond the scope of our manuscript.

      In addition, the authors need to use ribo-seq and mass spec data as a validation tool for their polysome profiling data to indicate the reliability of using polysome data to call protein expression.

      Our response: This comment seems to concern those IPA isoforms that are abundant in heavy polysomes. We do not wish to validate protein production from such isoforms, because they are not the focus of our study.

      REVIEWER #3

      -Figure S3C: If possible, show the plotted digital data of the polysome curves.

      Our response: We do not have digital data for the polysome curves, just the printed graph shown at the bottom of the figure.

    1. Author Response:

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving, complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. We hope to encourage more consensus in the field and to help explain some of the inconsistencies in the current literature that are hampering progress.

      The editors stress that the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude that the membrane dye ThT is faulty, when in fact the problem lies with their simple battery model.

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. The majority of gene databases for E. coli have a number of ion channels annotated as voltage sensitive due to comparative genetics studies, e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes; similarly you could check www.uniprot.org or www.biocyc.org) and M.M.Kuo, Y.Saimi, C.Kung, ‘Gain-of-function mutations indicate that Escherichia coli Kch forms a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels (S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in Microbiology, 2021, 29, 10, 942-950). More emphatic experimental data come from spiking potentials that have been observed by many groups for E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores: ‘Electrical spiking in bacterial biofilms’, E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102; ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj et al, Science, 2011, 333, 6040, 345; and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120. The only mechanism currently known to cause spiking potentials in cells is positive feedback from voltage-gated ion channels (a mechanism is needed to induce the oscillations). Indeed, researchers are starting to investigate the specific voltage-gated ion channels in E. coli, and a role is emerging for calcium in addition to potassium, e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder et al, J.Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments on E. coli indicate there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nano Letters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores, i.e. they cannot be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure based on comparative control experiments. A wide range of other cationic fluorophores show similar behaviour to ThT, e.g. the potassium-sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes, and it implies a simple physical mechanism, i.e. the positively charged dyes enter cells at low potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well-defined, and it is therefore used almost universally in fluorescence assays for amyloids.

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur in E. coli, e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al, which is represented unpersuasively in multiple publications by this one group (see below for critical synopses), and demonstrates that the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria, and the simple battery models presented by Pilizota et al do not have the required non-linearities to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology, e.g. Hodgkin and Huxley’s Nobel Prize-winning research (1963), but also the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009; ‘Cellular biophysics and modelling: a primer on the computational biology of excitable cells’, G.C.Smith, CUP, 2019; ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, MIT, 2006; and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, CUP, 2014). The Pilizota group is using modelling tools from the 1930s that were quickly shown to be inadequate to describe eukaryotic cellular electrophysiology, and the same is true for bacterial electrophysiology (see the groundbreaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we give a critical synopsis of the articles cited by referee 2, and we then directly answer the specific points raised by all the referees.

      Critical synopsis of the articles cited by referee 2:

      1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘workflow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis, which uses Na+ instead of H+ for the motility of its flagellar motor. Its relevance to wild-type E. coli is not clear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell, e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+, and thus for the electrophysiology! Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around! In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched, i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees:

      Reviewer #1:

      Summary:
      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:
      - The authors report original data.
      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.
      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.
      - The authors developed two rigorous models inspired by 1) the Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) the Fire-Diffuse-Fire model for the propagation of the electric signal.
      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel: enhancing survival under phototoxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:
      - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins, i.e. they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate/receive electrons. The electrical response to stress appears to be caused by ROS, since when ROS scavengers are added the electrical response is removed, i.e. pH plays a very small minority role, if any.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising: ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work, and it is currently under review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli et al, 2024, https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nano Letters, 2024, in print).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2:

      Summary of what the authors were trying to achieve:
      The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E. coli has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:
      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The work is unaware of a publication from 2020 (https://www.sciencedirect.com/science/article/pii/S0006349519308793) that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct, for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work, https://www.sciencedirect.com/science/article/pii/S0006349519303923, on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by means of monitoring the speed of the bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10⁻⁷ M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that, and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in potassium phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done by light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, depending on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons but the membrane potential is equal and opposite in sign to ΔpH. So using CCCP does not automatically mean the membrane potential will collapse (e.g. in some mammalian cells this need not be the case, but in E. coli it is: https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also recently been found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentration the authors are using (100 µM) that should not affect the results. However, the authors then state that because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it is visible that after 50 min treatment with 100 µM CCCP, the ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; this needs to be in a very specific way for it to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP-driven K+ channels will stop functioning, and this is the dominant contributor to membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower, because intuitively cells in these clusters could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which the authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what the authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seem tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as ion-channel-mediated wavefronts moving from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and, probably even more importantly, there is no light profile shown anywhere in the SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope, but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned (https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2), the collapse of fluorescence was not understood (other than that it is not membrane potential related). It was observed in Figure 5A, C, and F that at the point of the peak, the electrochemical gradient of protons has already collapsed, and that at the point of the peak the cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapses there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could simply be staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2 h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells the formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see if the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work for reasons highlighted before. Differential membrane permittivity is a speculative phenomenon not well supported by any evidence and with no clear molecular mechanism.
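
      A minimal sketch of the background-pixel comparison described above, assuming a single frame is available as a 2D array of intensities and that background pixels can be separated from cells by a simple intensity cut-off. The `background_max` threshold, the 4 x 4 tiling and the function name are illustrative assumptions, not the authors' procedure. The frame is split into tiles, the mean background intensity of each tile is computed, and the coefficient of variation across tiles is reported: a value near zero indicates flat illumination, while a centre-bright (e.g. Gaussian) beam profile gives a clearly non-zero value.

```python
from statistics import mean, pstdev


def background_flatness(image, background_max, tiles=4):
    """Coefficient of variation of the mean background intensity across tiles.

    image: 2D list of pixel intensities (rows of equal length).
    background_max: hypothetical cut-off; pixels at or below it count as background.
    tiles: the frame is split into tiles x tiles blocks.
    """
    rows, cols = len(image), len(image[0])
    tile_means = []
    for ti in range(tiles):
        for tj in range(tiles):
            # Collect only background pixels inside this tile (cells are excluded).
            block = [
                image[r][c]
                for r in range(ti * rows // tiles, (ti + 1) * rows // tiles)
                for c in range(tj * cols // tiles, (tj + 1) * cols // tiles)
                if image[r][c] <= background_max
            ]
            if block:
                tile_means.append(mean(block))
    return pstdev(tile_means) / mean(tile_means)
```

      For example, a synthetic 16 x 16 frame with a uniform background (plus a few bright "cells" above the cut-off) returns 0.0, whereas the same background with a centred Gaussian bump added returns a clearly positive value. Larger tiles average out pixel noise but resolve the illumination profile more coarsely.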

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First, I note that, because I disagree that the data presented thus far 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it becomes harder to comment on the rest of the results.

      In this result section, the authors look at the effect of Kch, a putative voltage-gated potassium channel, on the ThT profile in E. coli cells, and they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where the TolC mutant has a different membrane permeability (and there is a publication suggesting that the latter is happening: https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects membrane permeability. I do note that in Video S4 I seem to see more of what appear to be plasmolysed cells. With this mutant, the authors do not see the ThT intensity that, in WT, appears long after the initial peak has disappeared. It is not clear how long they waited for this, as from Figure S3C it could simply be that these dynamics are a lot slower, e.g. because Kch deletion changes membrane permeability.

      The finding that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.
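
      To make the Nernstian argument concrete: if ThT behaves as a freely permeant monovalent cation that equilibrates across the membrane, its intracellular-to-extracellular concentration ratio follows from the Nernst equation, V = (RT/zF) ln(c_out/c_in), so c_in/c_out = exp(-zFV/RT). The sketch below evaluates this ratio. It ignores dye binding, active efflux (e.g. via TolC) and any change in membrane permeability, and the -140 mV example value is a hypothetical round number rather than a measurement from either study.

```python
import math

FARADAY = 96485.332       # Faraday constant, C/mol
GAS_CONSTANT = 8.314462   # molar gas constant, J/(mol K)


def nernst_accumulation(membrane_potential_volts, temperature_k=298.15, charge=1):
    """Equilibrium ratio c_in / c_out for a freely permeant ion of the given charge.

    membrane_potential_volts: inside minus outside potential, in volts.
    """
    return math.exp(-charge * FARADAY * membrane_potential_volts
                    / (GAS_CONSTANT * temperature_k))
```

      Under these assumptions, a potential of -140 mV at 25 °C corresponds to roughly a 230-fold accumulation of a monovalent cation, and the enrichment falls to 1 as the potential collapses to 0 mV, which is why dye loading and depolarization are coupled in this picture.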

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter, the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable) and the extracellular environment with a K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf, especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel-mediated membrane potential dynamics are a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin-Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as the authors propose. But also, the HH model was developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and the Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h for sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44.)

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.
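
      For reference, the textbook model discussed in this exchange can be written as C dV/dt = I_ext - gNa m^3 h (V - ENa) - gK n^4 (V - EK) - gL (V - EL), with each gating variable x in {n, m, h} obeying dx/dt = alpha_x(V)(1 - x) - beta_x(V) x. The sketch below integrates it with a forward-Euler step using the classic squid-axon rate functions and parameters (modern convention, resting potential near -65 mV). These are neuronal values chosen only to illustrate the structure of the equations; they are not the authors' E. coli adaptation and not bacterial measurements.

```python
import math

# Classic squid-axon parameters (modern convention, rest near -65 mV); neuronal, not bacterial.
C_M = 1.0                              # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3      # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387  # reversal potentials, mV


def _vtrap(x, y):
    """x / (exp(x / y) - 1), replaced by its series limit near x = 0 to avoid 0/0."""
    if abs(x / y) < 1e-6:
        return y * (1.0 - x / (2.0 * y))
    return x / (math.exp(x / y) - 1.0)


def rates(v):
    """Opening (alpha) and closing (beta) rates in 1/ms for the n, m and h gates."""
    a_n = 0.01 * _vtrap(-(v + 55.0), 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    a_m = 0.1 * _vtrap(-(v + 40.0), 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return a_n, b_n, a_m, b_m, a_h, b_h


def simulate(i_ext, t_max=50.0, dt=0.01, v0=-65.0):
    """Forward-Euler integration of the HH equations; returns V (mV) at each step."""
    a_n, b_n, a_m, b_m, a_h, b_h = rates(v0)
    # Start every gate at its steady-state value for the initial voltage.
    n, m, h = a_n / (a_n + b_n), a_m / (a_m + b_m), a_h / (a_h + b_h)
    v, trace = v0, []
    for _ in range(int(t_max / dt)):
        a_n, b_n, a_m, b_m, a_h, b_h = rates(v)
        i_na = G_NA * m**3 * h * (v - E_NA)  # gNa * m^3 * h * (V - ENa)
        i_k = G_K * n**4 * (v - E_K)         # gK * n^4 * (V - EK)
        i_l = G_L * (v - E_L)                # gL * (V - EL)
        v += dt * (i_ext - i_na - i_k - i_l) / C_M
        n += dt * (a_n * (1.0 - n) - b_n * n)
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        trace.append(v)
    return trace


def count_spikes(trace, threshold=0.0):
    """Number of upward crossings of the threshold voltage (mV)."""
    return sum(1 for a, b in zip(trace, trace[1:]) if a < threshold <= b)
```

      Running `simulate(10.0)` (a constant 10 µA/cm² stimulus) produces repetitive spikes, whereas `simulate(0.0)` stays at rest. The small step `dt = 0.01` ms keeps the explicit Euler scheme stable for the fast sodium-gate kinetics.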

      Thus, in applying the model to describe bacterial electrophysiology, one should ensure that the near-equilibrium requirement holds (so that the (V-VQ) etc. terms in the authors' equation in Figure 5B hold), and that potassium and other channels in a given bacterium have gating properties similar to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump-leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer, or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results showing that Msc channels affect the profile of the ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood, a model is needed that also takes into account the link between the maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Mscs respond to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication: https://www.pnas.org/doi/full/10.1073/pnas.1522185113). The authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating, and that the channels sometimes gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT was membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that, while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 the authors state that bacteria (Bacillus subtilis in this case) in biofilms have recently been found to modulate membrane potential, citing the relevant literature from 2015. It is worth noting that the most recent paper, https://journals.asm.org/doi/10.1128/mbio.02220-23, found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and that the recent results are strictly spore-specific, but these results should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and A.Prindle, ‘Spatial propagation of electrical signal in circular biofilms’, J.A.Blee et al, Physical Review E, 2019, 100, 052401, J.A.Blee et al, ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001, A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to research on low-metabolism spores seems speculative.

      Reviewer #3:

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.
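
      For readers unfamiliar with the approach, a minimal one-dimensional fire-diffuse-fire sketch follows. It is the generic textbook mechanism, not the three-dimensional agent-based model used by the authors, and all parameter values are hypothetical and dimensionless. Release sites sit on a line; when the local concentration of a diffusing signal reaches a threshold, a site fires once and releases an instantaneous pulse that spreads with the one-dimensional diffusion kernel σ/√(4πDτ) · exp(-x²/(4Dτ)). Whether a wave propagates depends on the threshold relative to the released amount and the site spacing.

```python
import math


def fire_diffuse_fire(n_sites=20, spacing=1.0, diffusivity=1.0, release=1.0,
                      threshold=0.1, t_max=30.0, dt=0.01):
    """Firing time of each site on a 1D chain (None if it never fires); site 0 fires at t = 0."""
    fire_times = [None] * n_sites
    fire_times[0] = 0.0
    for step in range(1, int(t_max / dt) + 1):
        t = step * dt
        newly_fired = []
        for i in range(n_sites):
            if fire_times[i] is not None:
                continue  # each site fires at most once
            conc = 0.0
            for j, t_j in enumerate(fire_times):
                if t_j is None:
                    continue
                tau = t - t_j  # time since site j released its pulse (always > 0)
                dx = (i - j) * spacing
                # 1D diffusion kernel for an instantaneous point release
                conc += (release / math.sqrt(4.0 * math.pi * diffusivity * tau)
                         * math.exp(-dx * dx / (4.0 * diffusivity * tau)))
            if conc >= threshold:
                newly_fired.append(i)
        # Update after the sweep so every site is tested against the same time point.
        for i in newly_fired:
            fire_times[i] = t
    return fire_times
```

      With the default threshold of 0.1 every site fires in sequence. Raising the threshold to 0.5, above the peak concentration of about 0.24 that a single unit pulse produces one spacing away, stops the wave at the first site; this all-or-none threshold behaviour is what such models use to predict whether a signal can spread.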

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      1. An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      2. The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.
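
      One way to separate the two contributions discussed in this point, sketched here under the assumption that each exposure bleaches a fixed fraction of the dye: fit a single exponential to an intensity trace to obtain the decay per frame, then convert it to a decay per minute for two acquisitions run at different frame intervals. If photobleaching dominates, the decay per frame stays constant and the decay per minute scales with the frame rate; if the decay is physiological, the decay per minute should not depend on how often the sample is imaged. The function names and synthetic traces are illustrative only.

```python
import math


def decay_rate_per_frame(intensities):
    """Least-squares slope of -ln(I) against frame index (single-exponential model)."""
    log_i = [math.log(value) for value in intensities]
    n = len(log_i)
    x_mean = (n - 1) / 2
    y_mean = sum(log_i) / n
    sxy = sum((k - x_mean) * (y - y_mean) for k, y in enumerate(log_i))
    sxx = sum((k - x_mean) ** 2 for k in range(n))
    return -sxy / sxx


def decay_rate_per_minute(intensities, minutes_between_frames):
    """Convert the per-frame decay rate into a rate per minute of elapsed time."""
    return decay_rate_per_frame(intensities) / minutes_between_frames
```

      For instance, two synthetic traces that both decay by a factor exp(-0.02) per exposure, imaged every 1 min and every 2 min, give per-minute rates of 0.02 and 0.01: the signature of bleaching rather than of a change in cell physiology.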

      3. It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      4. The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      5. Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves, and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that showed membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were counted only at the end of the experiment. We believe that the wild type handles the light stress better than the ∆kch mutant, as measured with PI.

      6. Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae that are starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential dynamics, but reflects PI showing damaged/dead cells without bias. We think that propidium iodide is suitable for this experiment. Propidium iodide is a dye that is extensively used in the life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) and no membrane potential related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in the absolute intensities is problematic in such fluorescence experiments. We mean the shape of the traces is similar and they can be modelled using a HH model with similar parameters.
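
      One possible way to make "similar shape" quantitative, sketched here as an assumption rather than as the authors' method: rescale each trace to the range [0, 1], which removes differences in absolute intensity, and then report the Pearson correlation and root-mean-square difference between the rescaled traces sampled at the same time points. The function names are hypothetical.

```python
import math


def min_max_normalize(trace):
    """Rescale a trace linearly so that its minimum is 0 and its maximum is 1."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        raise ValueError("a constant trace has no shape to compare")
    return [(x - lo) / (hi - lo) for x in trace]


def shape_similarity(trace_a, trace_b):
    """Return (Pearson r, RMS difference) between two equally sampled, rescaled traces."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must be sampled at the same time points")
    a, b = min_max_normalize(trace_a), min_max_normalize(trace_b)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    rms = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n)
    return r, rms
```

      A trace and a copy of it with a different brightness and offset give r = 1 and zero RMS difference, so only differences in shape (timing of the peaks, depth of the quiescent period) lower the score.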

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the second peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.

      The key point is the comparison of standard errors on the standard deviation.
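
      The outlier-robust summary suggested in this point can be sketched as follows, using Tukey's 1.5 x IQR rule to flag outliers and reporting the median alongside the mean with and without the flagged values. The rule, the function name and the example times are illustrative assumptions, not data from the manuscript; a rank-based test such as the Mann-Whitney U test would then be a natural choice for comparing the sparse and clustered conditions.

```python
from statistics import mean, median, quantiles


def summarize_time_to_first_peak(times_min):
    """Median, raw mean, outlier-trimmed mean and flagged outliers (Tukey 1.5 x IQR rule)."""
    q1, _, q3 = quantiles(times_min, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [t for t in times_min if low <= t <= high]
    outliers = sorted(t for t in times_min if t < low or t > high)
    return {
        "median": median(times_min),
        "mean": mean(times_min),
        "trimmed_mean": mean(kept),
        "outliers": outliers,
    }
```

      With the illustrative times [5.1, 5.3, 5.4, 5.6, 5.8, 6.0, 6.1, 6.2, 6.4, 6.6, 19.5, 22.0] min, the two late values are flagged and the mean drops from about 8.3 min to 5.85 min once they are excluded, while the median (6.05 min) barely depends on them.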

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We appreciate the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time. Therefore all parts of the biofilm received equal amounts of the blue light intensity. The membrane potential dynamics was not influenced by cell density (see Fig 2C). 

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (Supplementary Figure 4B). The fluorescence data were carefully obtained using the Imaris software. We created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis in Imaris.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
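
      A minimal sketch of the position-resolved analysis requested in this point, assuming each cell has already been segmented and assigned a distance from the biofilm centre (in µm) and a time of its first ThT peak (in min); the input format, bin width and function names are hypothetical. Cells are grouped into radial annuli to give a median peak time per annulus, and a least-squares fit of peak time against radius gives an apparent front velocity: positive for a front moving outward, negative for one moving toward the centre, and very large in magnitude when all cells peak together.

```python
import math
from statistics import median


def radial_peak_profile(cells, bin_width):
    """cells: list of (radius_um, peak_time_min). Returns [(annulus_centre_um, median_peak_time)]."""
    bins = {}
    for r, t in cells:
        bins.setdefault(int(r // bin_width), []).append(t)
    return [((k + 0.5) * bin_width, median(ts)) for k, ts in sorted(bins.items())]


def front_velocity(cells):
    """Least-squares fit of t = t0 + r / v; returns v in um/min (positive = outward)."""
    n = len(cells)
    r_mean = sum(r for r, _ in cells) / n
    t_mean = sum(t for _, t in cells) / n
    srt = sum((r - r_mean) * (t - t_mean) for r, t in cells)
    srr = sum((r - r_mean) ** 2 for r, _ in cells)
    slope = srt / srr  # minutes per um
    if slope == 0:
        return math.inf  # all radii peak simultaneously: no detectable front
    return 1.0 / slope
```

      Plotting the radial profile alongside the fitted velocity would show directly whether the first peak moves across the biofilm or whether all annuli peak together.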

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      1. In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      2. In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      3. The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that, while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the second peak in Fig S4B. The first peak, the quiescent period and the slow rise to the second peak are evident.

    2. Author Response

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. Hopefully we will encourage some more consensus in the field and help understand some of the inconsistencies in the current literature that are hampering progress.

      The editors stress the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude the membrane dye ThT is faulty, when in fact it is a problem with their simple battery model.

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. The majority of gene databases for E. coli have a number of ion-channels annotated as voltage sensitive due to comparative genetics studies e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes, similarly you could check www.uniprot.org or www.biocyc.org) and M.M.Kuo, Y.Saimi, C.Kung, ‘Gain of function mutation indicate that E. coli Kch form a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in microbiology, 2021, 29, 10, 942-950. More emphatic experimental data is seen in spiking potentials that have been observed by many groups for E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores, ‘Electrical spiking in bacterial biofilms’ E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj, et al, Science, 2011, 333, 6040, 345 and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120. The only mechanism currently known to cause spiking potentials in cells is due to positive feedback from voltage-gated ion channels (you need a mechanism to induce the oscillations). Indeed, people are starting to investigate the specific voltage-gated ion channels in E. coli and a role is emerging for calcium in addition to potassium e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder, et al, J.Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments on E. coli biofilms indicate that there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores, i.e. they cannot be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure in light of comparative control experiments. A wide range of other cationic fluorophores show similar behaviour to ThT, e.g. the potassium-sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes, which implies a simple physical mechanism, i.e. the positively charged dyes enter cells at low (more negative) membrane potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well defined, and it is therefore used almost universally in fluorescence assays for amyloids.
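      The simple physical mechanism referred to here is Nernstian accumulation. At equilibrium, a permeant cation of charge z distributes as c_in/c_out = exp(-zFV/RT), so more negative membrane potentials pull more dye into the cell. The Python sketch below assumes ideal Nernstian partitioning of a monovalent cation at 25 °C. The potentials are purely illustrative and are not values measured in our experiments.

```python
import math

# Physical constants (SI units)
R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_accumulation_ratio(v_mv: float, z: int = 1, temp_k: float = 298.15) -> float:
    """Return c_in / c_out for an ion of charge z at Nernstian equilibrium.

    v_mv is the membrane potential (inside relative to outside) in millivolts.
    Assumes ideal, passive partitioning with no other transport processes.
    """
    v = v_mv / 1000.0  # convert mV to V
    return math.exp(-z * F * v / (R * temp_k))


# Illustrative potentials only: more negative inside -> stronger cation uptake
for v_mv in (0.0, -60.0, -120.0, -180.0):
    print(f"{v_mv:7.1f} mV -> c_in/c_out = {nernst_accumulation_ratio(v_mv):.1f}")
```

      Under these assumptions, every additional -60 mV multiplies the accumulation of a monovalent cation by roughly an order of magnitude.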

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur with E. coli e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al, which has been presented unpersuasively in multiple publications by this one group (see below for critical synopses). It also demonstrates that the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria, and the simple battery models presented by Pilizota et al do not have the non-linearities required to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology. Examples include Hodgkin and Huxley’s Nobel Prize-winning research (awarded 1963) and the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009; ‘Cellular biophysics and modelling: a primer on the computational biology of excitable cells’, G.C.Smith, 2019, CUP; ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, 2006, MIT; and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, 2014, CUP). The Pilizota group is using modelling tools from the 1930s that were quickly shown to be inadequate to describe eukaryotic cellular electrophysiology, and the same is true for bacterial electrophysiology (see the groundbreaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we give a critical synopsis of the articles cited by referee 2, and we then directly answer the specific points raised by all the referees.
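      The sketch below illustrates the dynamical-systems point made above, following Strogatz. It contrasts two models:

      - A one-variable linear relaxation. This is our stand-in for a battery/RC-type description and does not reproduce any published SNB model.
      - The FitzHugh-Nagumo equations, a textbook two-variable reduction of the Hodgkin-Huxley model.

      This is a swapped-in illustration, not the model used in our manuscript. The parameters (a = 0.7, b = 0.8, eps = 0.08, I = 0.5) are the classic dimensionless FitzHugh values and are not fitted to E. coli data.

```python
def simulate_linear_battery(v0=0.0, v_eq=-1.0, tau=5.0, dt=0.01, t_end=200.0):
    """Forward-Euler integration of dV/dt = -(V - v_eq)/tau.

    A one-variable linear system: it can only relax monotonically to v_eq.
    """
    v = v0
    trace = []
    for _ in range(int(t_end / dt)):
        v += dt * (-(v - v_eq) / tau)
        trace.append(v)
    return trace


def simulate_fitzhugh_nagumo(i_ext=0.5, a=0.7, b=0.8, eps=0.08, dt=0.01, t_end=200.0):
    """Forward-Euler integration of the FitzHugh-Nagumo model.

    dv/dt = v - v^3/3 - w + I      (fast, excitable variable)
    dw/dt = eps * (v + a - b*w)    (slow recovery variable)
    Returns the trace of v.
    """
    v, w = -1.0, 1.0
    trace = []
    for _ in range(int(t_end / dt)):
        dv = v - v ** 3 / 3.0 - w + i_ext
        dw = eps * (v + a - b * w)
        v += dt * dv
        w += dt * dw
        trace.append(v)
    return trace


def count_upward_crossings(trace, threshold=1.0):
    """Count 'spikes' as upward crossings of a fixed threshold."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev < threshold <= cur)


print("linear relaxation spikes:", count_upward_crossings(simulate_linear_battery()))
print("FitzHugh-Nagumo spikes:  ", count_upward_crossings(simulate_fitzhugh_nagumo()))
```

      The linear relaxation never produces a spike. With these parameters, the cubic non-linearity in the FitzHugh-Nagumo model destabilises the resting state and the system settles onto a limit cycle, i.e. sustained spiking.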

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model, which unfortunately is wrong when voltage-gated ion channels are present. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead, a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed for pumping in the SNB model without real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘workflow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis that uses Na+ instead of H+ to drive its flagellar motor. Its relevance to wild-type E. coli is unclear, given the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+ and thus the electrophysiology!
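      Here is the ratio behind the "million times" estimate, written out with the concentrations quoted above. A free H+ concentration of 10^-7 M corresponds to pH 7.

```latex
\frac{[\mathrm{K^{+}}]}{[\mathrm{H^{+}}]} \approx \frac{0.2\ \mathrm{M}}{10^{-7}\ \mathrm{M}} = 2 \times 10^{6}
```

      The ratio is therefore of order 10^6.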

      Equation (1) in the manuscript assumes that no ions other than H+ are exchanged during the experiments. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary: Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      • The authors report original data.

      • For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      • The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      • The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      • Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel: enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.
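      The fire-diffuse-fire mechanism mentioned in the summary can be sketched in a few lines. This is a deliberately minimal one-dimensional toy. It is not the three-dimensional agent-based model of our manuscript or of Martorelli et al. The rules are as follows:

      - Each cell fires once and instantaneously releases a fixed amount of a diffusing species (K+ in our interpretation).
      - A cell fires when the summed free-space diffusion kernel at its position exceeds a threshold.

      All parameter names and values are hypothetical and dimensionless.

```python
import math


def fire_diffuse_fire_1d(n_cells=10, spacing=1.0, diffusivity=1.0, release=1.0,
                         threshold=0.1, dt=0.01, t_end=10.0):
    """Return firing times (None = never fired) for a 1D chain of cells.

    Cell 0 is fired at t = 0. Each firing instantaneously releases `release`
    units of a diffusing species; a cell fires (once) when the summed
    concentration at its position first exceeds `threshold`.
    """
    positions = [i * spacing for i in range(n_cells)]
    fire_times = [None] * n_cells
    fire_times[0] = 0.0
    for step in range(1, int(round(t_end / dt)) + 1):
        t = step * dt
        newly_fired = []
        for j in range(n_cells):
            if fire_times[j] is not None:
                continue
            conc = 0.0
            for i, t_i in enumerate(fire_times):
                if t_i is None or t <= t_i:
                    continue
                elapsed = t - t_i
                # 1D diffusion Green's function for an instantaneous point release
                conc += (release / math.sqrt(4.0 * math.pi * diffusivity * elapsed)
                         * math.exp(-(positions[j] - positions[i]) ** 2
                                    / (4.0 * diffusivity * elapsed)))
            if conc > threshold:
                newly_fired.append(j)
        # Update after scanning so that all cells see the same state at time t
        for j in newly_fired:
            fire_times[j] = t
    return fire_times


print(fire_diffuse_fire_1d(threshold=0.1))  # front propagates along the chain
print(fire_diffuse_fire_1d(threshold=0.3))  # propagation fails after cell 0
```

      With these defaults, a threshold of 0.1 gives a front that propagates cell by cell. At 0.3, propagation fails, because a single unit release can raise the concentration one spacing away to at most 1/(d·sqrt(2πe)), about 0.24 for unit spacing d. This all-or-none dependence on threshold and spacing is the essence of fire-diffuse-fire behaviour.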

      Weaknesses:

      • Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when stimulation with blue light begins, i.e. they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by RedOx potential, since it is an attribute of all chemicals that are able to donate or receive electrons. The electrical response to stress appears to be caused by ROS, since the electrical response is removed when ROS scavengers are added, i.e. pH plays a very minor role, if any.

      • Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project about using the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising: ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120.

      • Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work, which is currently under review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      • Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      • The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      • The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      • Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E.coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E.coli, has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E.coli. The work is unaware of a publication from 2020 https://www.sciencedirect.com/science/article/pii/S0006349519308793 that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work https://www.sciencedirect.com/science/article/pii/S0006349519303923 on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect, not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B, as well as in video S1, is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793) and the corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating; see Figure S12 of that previous work. There, it is also demonstrated, by monitoring the speed of the bacterial flagellar motor, that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But in Figure S1 it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There, the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware, this is the first reference for that and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in potassium phosphate buffer with NaCl (and we often used EDTA to permeabilise the membrane). It is not fully clear (to me) whether TMRM was prepared in rich media here (the Methods say so explicitly for ThT but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage done to the membrane by light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction: the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that CCCP may collapse the electrochemical gradient of protons while the membrane potential is equal and opposite in sign to the DeltapH; whether this happens depends on the type of pumps present in the cell. So using CCCP does not automatically mean the membrane potential will collapse (e.g. in some mammalian cells this need not be the case, but in E. coli it is: https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also recently been found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentration the authors are using (100 uM) that should not affect the results. However, the authors then state that, because they observed in Figure S1E a fast efflux of ions in all cells and no spiking dynamics, the observed dynamics must be membrane potential related. I do not agree. First, Figure S1E does not appear to show transients; instead, it is visible that after 50 min treatment with 100 uM CCCP, the ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; it must respond in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.
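      The point about CCCP can be written with the standard chemiosmotic relation for the proton motive force. We use the sign convention Δψ = ψ_in - ψ_out and ΔpH = pH_in - pH_out; this notation is ours, not the reviewer's.

```latex
\Delta p = \Delta\psi - \frac{2.303\,RT}{F}\,\Delta\mathrm{pH}
\qquad\Longrightarrow\qquad
\Delta p = 0 \;\;\text{implies}\;\; \Delta\psi = \frac{2.303\,RT}{F}\,\Delta\mathrm{pH}
```

      At 298 K the prefactor 2.303RT/F is about 59 mV. Collapsing Δp therefore leaves a nonzero Δψ whenever a pH gradient persists.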

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP-driven K+ channels will stop functioning, and they are the dominant contributor to the membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower, since intuitively cells in these clusters could be shielded from light). However, shielding is only one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate is lower (the cells have mechanisms to handle light-induced damage, some of which the authors mention later in the paper). According to the Methods, these cells were left in a microfluidic chamber for 2 hours to attach in growth media, so there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf), one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what the authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seem tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as ion-channel-mediated wavefronts moving from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery. Probably even more importantly, there is no light profile shown anywhere in the SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In the Methods, I find the light source for the blue light and the type of microscope, but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section.

      I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned, https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2, the collapse of fluorescence was not understood (other than that it is not membrane potential related). It was observed in Figure 5A, C, and F that at the point of peak the electrochemical gradient of protons has already collapsed, and that at the point of peak the cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapses there is a period of time where ThT does not stain the cells, and then it starts again. If the cellular content leaks after the first peak, as we have observed, then staining that occurs much later could simply be staining of positively charged cytoplasmic content. Its timing would depend on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2 h in individual cells). ThT is also a non-specific amyloid dye, and the formation of protein clusters has been observed in starving E. coli cells (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to check whether the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work, for the reasons highlighted before. Differential membrane permeability is a speculative phenomenon that is not well supported by any evidence and has no clear molecular mechanism.
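      One way to implement the background-pixel check described above is to compare the mean background intensity across tiles of the field of view. The sketch rests on three assumptions:

      - The image is available as a NumPy array.
      - A boolean mask of cell-free background pixels has already been obtained (e.g. by thresholding).
      - Background fluorescence scales with the local excitation intensity.

      The function name and the tile grid are hypothetical, and no particular acceptance threshold is implied.

```python
import numpy as np


def background_flatness(image: np.ndarray, background_mask: np.ndarray, tiles: int = 4) -> float:
    """Return max/min of the mean background intensity over a tiles x tiles grid.

    A value close to 1 suggests a flat illumination profile across the field
    of view; tiles that contain no background pixels are skipped.
    """
    if image.shape != background_mask.shape:
        raise ValueError("image and mask must have the same shape")
    row_groups = np.array_split(np.arange(image.shape[0]), tiles)
    col_groups = np.array_split(np.arange(image.shape[1]), tiles)
    tile_means = []
    for rows in row_groups:
        for cols in col_groups:
            block = image[np.ix_(rows, cols)]
            mask = background_mask[np.ix_(rows, cols)]
            if mask.any():
                tile_means.append(block[mask].mean())
    if not tile_means:
        raise ValueError("mask contains no background pixels")
    tile_means = np.asarray(tile_means)
    return float(tile_means.max() / tile_means.min())


# Synthetic example: uniform field versus a Gaussian beam profile
yy, xx = np.mgrid[0:256, 0:256]
mask = np.ones((256, 256), dtype=bool)
flat = np.full((256, 256), 100.0)
gauss = 10.0 + 100.0 * np.exp(-((xx - 128) ** 2 + (yy - 128) ** 2) / (2 * 60.0 ** 2))
print(background_flatness(flat, mask), background_flatness(gauss, mask))
```

      A uniform field gives a ratio of 1. A centred Gaussian profile gives a ratio well above 1, because the corner tiles are much dimmer than the central ones.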

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section, the authors look at the effect of Kch, a putative voltage-gated potassium channel, on the ThT profile in E. coli cells, and they see a difference. It is worth noting that the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 found that ThT is also likely a substrate for TolC (Figure 4). However, that scenario could not be distinguished from one in which the TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening: https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of what appear to be plasmolysed cells. With this mutant, the authors do not see the ThT intensity that appears long after the initial peak has disappeared in WT. It is not clear how long they waited for this; from Figure S3C it could simply be that these dynamics are a lot slower, e.g. because Kch deletion changes membrane permeability.

      The work showing that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It simply demonstrates another mechanism by which cells equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf, especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin-Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as the authors propose. But also, the HH model was developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and the Nernst potential for the given ion. The conductivity in the model is given as g_K n^4 for potassium, g_Na m^3 h for sodium, and g_L for leakage, where g_K, g_Na and g_L were measured experimentally for neurons. And n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).
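      For reference, here is the neuronal current balance summarised above, in the modern sign convention. The symbols are:

      - V: membrane potential (inside relative to outside).
      - C_m: membrane capacitance.
      - E_K, E_Na, E_L: reversal potentials.
      - α_x, β_x: the empirically fitted voltage-dependent rate functions of the squid giant axon (not reproduced here).

      This is the standard textbook form, not the bacterial model of Figure 5B.

```latex
\begin{aligned}
C_m \frac{dV}{dt} &= -\bar{g}_K\, n^4 (V - E_K) - \bar{g}_{Na}\, m^3 h\, (V - E_{Na}) - \bar{g}_L (V - E_L) + I_{\mathrm{ext}} \\
\frac{dx}{dt} &= \alpha_x(V)\,(1 - x) - \beta_x(V)\,x, \qquad x \in \{n, m, h\}
\end{aligned}
```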

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.

      Thus, in applying the model to describe bacterial electrophysiology, one should ensure that the near-equilibrium requirement holds (so that the (V-VQ) and similar terms in the authors' equation in Figure 5B hold). One should also ensure that potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump-leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144)

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results showing that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed, and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. To understand this, a model is needed that also takes into account the link between the maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note: the authors state that Mscs respond to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication: https://www.pnas.org/doi/full/10.1073/pnas.1522185113). The authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly gated by membrane tension. Some of the references state that the presence of external ions is important for tension-related gating, and that the channels sometimes gate spontaneously in the presence of certain ions. Other cited publications do not really examine gating with respect to ions (reference 39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this results section, as it would only be applicable if ThT were a membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this work should be published in its current form. It should be revised in light of the previous literature, as discussed in detail above. I believe that presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that, while this work studies E. coli, it references papers on other bacteria that use ThT. For example, in lines 35-36 the authors state that bacteria (Bacillus subtilis in this case) in biofilms have recently been found to modulate membrane potential, citing the relevant literature from 2015. It is worth noting that the most recent paper (https://journals.asm.org/doi/10.1128/mbio.02220-23) found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and that the recent results are strictly spore-specific, but this should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and by A. Prindle: J.A. Blee et al., ‘Spatial propagation of electrical signal in circular biofilms’, Physical Review E, 2019, 100, 052401; J.A. Blee et al., ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001; A. Prindle et al., ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to research on low-metabolism spores seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. 
      Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is excited by blue light, so if the excitation light for ThT is problematic for this experiment, the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (see the measurements in the eLife article and the new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minor role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves, and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that exhibited membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were counted only at the end of the experiment. We believe that the wild type handles the light stress better than the ∆kch mutant, as measured with PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae that had been starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential dynamics, but reflects PI staining damaged or dead cells without bias. We think that propidium iodide is well suited to this experiment. Propidium iodide is a dye that is extensively used in the life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/), and no membrane potential-related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in absolute intensities is problematic in such fluorescence experiments. By "similar" we mean that the shapes of the traces are similar and that they can be modelled using an HH model with similar parameters.

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the second peak). The most interesting observation is that, with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The durations of these three phases were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.
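      The reviewer's suggestion (report an outlier-robust summary such as the median, then test the difference statistically) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical, the numbers are made-up values chosen to include two late outliers (they are not the data in Figure 1E), and a permutation test on medians is used here as one possible choice of test, not the analysis used in the paper.

```python
import random
import statistics


def summarize(times):
    """Return (mean, median) of time-to-first-peak values in minutes."""
    return statistics.mean(times), statistics.median(times)


def permutation_test_median(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of group medians.

    Group labels are shuffled n_perm times; the p-value is the fraction of
    shuffles whose |median difference| is at least the observed one, with a
    +1 correction so that p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical time-to-first-peak values (minutes); "sparse" has two late outliers.
sparse = [3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 25.0, 28.0]
dense = [6.8, 7.1, 6.9, 7.3, 7.0, 6.7, 7.2, 7.4, 6.6]

mean_s, median_s = summarize(sparse)
print(f"sparse: mean={mean_s:.2f} min, median={median_s:.2f} min")
print(f"p (difference of medians) = {permutation_test_median(sparse, dense):.4f}")
```

      With the two outliers, the mean of the hypothetical "sparse" group is pulled far above its typical value while the median is not, which is the discrepancy the reviewer describes between the reported average and the peak of the averaged curve.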

      The key point is the comparison of standard errors on the standard deviation.

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time point, so all parts of the biofilm received the same blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig. 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (Fig. S4B). The fluorescence data were carefully obtained using the Imaris software: we created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.
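      One way to carry out the analysis the reviewer asks for (peak times of subpopulations at different distances from the biofilm centre, and a velocity derived from them) is sketched below. It is a minimal illustration, not the authors' Imaris pipeline or their fire-diffuse-fire analysis. It assumes that each cell's centroid (x, y, in micrometres) and the time of its first ThT peak have already been extracted, that the front is radially symmetric about a chosen centre, and that it moves at roughly constant speed; the function names are hypothetical, and the positions are planar (a 3D version would add z to the distance).

```python
import math
import statistics


def radial_arrival_profile(cells, center, bin_width):
    """Bin cells by distance from the biofilm centre.

    cells: iterable of (x, y, t_peak) tuples, x and y in micrometres and
    t_peak the time (min) of that cell's first ThT peak.
    Returns a list of (bin_mid_radius, median_t_peak), sorted by radius.
    """
    cx, cy = center
    bins = {}
    for x, y, t_peak in cells:
        r = math.hypot(x - cx, y - cy)
        bins.setdefault(int(r // bin_width), []).append(t_peak)
    return [((k + 0.5) * bin_width, statistics.median(ts))
            for k, ts in sorted(bins.items())]


def front_speed(profile):
    """Least-squares slope of radius against arrival time (um/min).

    A positive value means the front moves outward, a negative value
    means it moves toward the centre.
    """
    ts = [t for _, t in profile]
    rs = [r for r, _ in profile]
    t_bar = statistics.mean(ts)
    r_bar = statistics.mean(rs)
    num = sum((t - t_bar) * (r - r_bar) for t, r in zip(ts, rs))
    den = sum((t - t_bar) ** 2 for t in ts)
    if den == 0:
        raise ValueError("all bins peak simultaneously: no propagating front")
    return num / den


# Synthetic check: a front moving outward at 2 um/min from (0, 0).
cells = [(r + 0.5, 0.0, (r + 0.5) / 2.0) for r in range(100)]
profile = radial_arrival_profile(cells, center=(0.0, 0.0), bin_width=10.0)
print(f"{len(profile)} bins, speed = {front_speed(profile):.2f} um/min")
```

      Using the median peak time per radial bin keeps a few outlying cells from shifting a bin's arrival time. If every bin peaks at the same time, as in the reviewer's alternative explanation of a spatially uniform response, the regression has no slope to fit and the sketch reports that no front is present.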

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have reworded this sentence to convey our results accurately.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that, while Fig 2.1 was obtained with an LED light source, Fig. S4A was obtained using a laser light source. While acquiring the confocal images for Fig. S4A, the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the second peak in Fig. S4B: the first peak, the quiescent period and the slow rise to the second peak are evident.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Fig. 3C needs the "still" for the movie of control C. owczarzaki (in Movie S1).

      We have now added a WT control in this figure panel.

      (2) The elongated cell shape is seen infrequently in control cells, and I wonder whether these events are transient inactivation of coHpo or coWts in these cells. Perhaps the authors could comment on this in the discussion.

      This is an interesting possibility and we have now included it in our discussion (Lines 401-403).

      (3) Does C. owczarzaki normally aggregate or this is a lab-specific phenotype? For example, the slime mold Dictyostelium discoideum forms aggregates during its life cycle. Could some additional information about C. owczarzaki be added to the introduction?

      Unfortunately, little is known about Capsaspora “in the wild”, as it was isolated as an endosymbiont from a laboratory strain of snails. However, some related filasterians isolated from natural environments also show aggregative ability, indicating that aggregation is in fact a physiological process in this group of organisms. We have updated our introduction to include this fact (Lines 78-80).

      Reviewer #2 (Recommendations For The Authors):

      The studies on Hippo signalling in Capsaspora are currently limited to genetic experiments and analysis of Yki/YAP localisation. Biochemical evidence that Co Wts phosphorylates Co Yki/YAP on a conserved serine residue(s) would give important further evidence that this essential signalling step in the animal Hippo pathway is conserved in Capsaspora. However, such experiments require antibodies that detect specific phosphorylation events, which might not be available at present. Is mass spectrometry of the phospho-proteome a potential approach that could be employed to investigate this? The benefit of this approach is it would give information on other Hippo pathway proteins and could be used to probe signalling events under different culture conditions (e.g., aggregate, non-aggregate).

      In response to this recommendation, we attempted to detect phospho-coWts and phospho-coHpo using commercial antibodies against their mammalian homologs, in the hope of cross-species reactivity. However, we could not detect a signal by Western blot. Thus, better reagents or refinement of techniques beyond the scope of this article may be required to examine the phosphorylation of these Capsaspora proteins. There is a published report of Capsaspora phosphoproteome analysis (Sebé-Pedrós et al., 2016 Dev Cell), although phosphorylation of the conserved sites on coYki, coWts, and coHpo was not reported in that analysis, suggesting that more targeted approaches may be needed to examine phosphorylation of these core Hippo pathway components.

      The following statement, that Wts LOF is stronger than Hpo LOF in Capsaspora, is consistent with overgrowth phenotypes in flies and mammals:

      "Interestingly, we found that coWts-/- cells were significantly more likely to show nuclear mScarlet-coYki localization than coHpo-/- cells (Figure 1D), which is consistent with Hpo/MST independent activity of Wts/LATS previously reported in Drosophila and mammals (Zheng et al., 2015)."

      However, the following statement describes a stronger phenotype in Hpo LOF Capsaspora than Wts LOF:

      "As contractile cells in the coHpo mutant background tended to show a more extreme elongated morphology than the coWts mutant, we focused on the coHpo mutant for further analysis."

      Does this mean that Hpo can regulate actomyosin contractility in both Wts/Yki-dependent and -independent manners? A genetic experiment, similar to those that have been performed in Drosophila and mammals, could help to address this: e.g., what is the phenotype of Hpo Yki and Wts Yki double-mutant Capsaspora? Do they phenocopy Yki LOF Capsaspora, and are the actomyosin phenotypes associated with Hpo and Wts mutant Capsaspora completely or partially suppressed? The authors indicate, however, that generation of double-mutant Capsaspora is not technically possible at present.

      Indeed given available techniques the generation of such double mutants is not currently possible. With this phenotype (aberrant cytoskeletal dynamics), it is hard to say what a “stronger” phenotype is, and which mutant has the “stronger” phenotype. We have edited this statement to try and reflect this point (Line 208-209).

      Another outstanding question is whether the Hpo/Wts/Yki-related actomyosin phenotypes are linked to regulation of transcription by Yki, or are regulated non-transcriptionally. Indeed, a non-transcriptional role for Drosophila Yki in promoting actomyosin contractility has been reported (Fehon lab, Dev Cell, 2018). Generation of Scalloped/TEAD mutant Capsaspora would allow this question to be investigated. Alternatively, this could be explored using variant Co Yki transgenes, e.g., a Co Yki transgene that does not form a physical complex with Co Sd/TEAD and a Co Yki transgene that is targeted to the cell cortex.

      To address this point, we tested whether a conserved amino acid residue in coYki (F123) that is required for transcriptional activity of human YAP (in this case, F95) is required for the phenotypic effects of the coYki 4SA mutant. We found that, in contrast to expression of coYki 4SA, expression of a coYki 4SA F123A mutant showed no effect on cell or aggregate morphology. These new results, which support a requirement for transcriptional activity for coYki function, have now been added to Figure 7.

      Reviewer #3 (Recommendations For The Authors):

      Repetition from previous publication:

      (1) For example, compare the last sentences of the abstracts of both works. From Phillips et al. eLife 2022;0:e77598: "Taken together, these findings implicate an ancestral role for the Hippo pathway in cytoskeletal dynamics and multicellular morphogenesis predating the origin of animal multicellularity, which was co-opted during evolution to regulate cell proliferation".

      From this manuscript: "Together, these results implicate cytoskeletal regulation but not proliferation as an ancestral function of the Hippo pathway and uncover a novel role for Hippo signaling in regulating cell density in a proliferation-independent manner "

      Our two papers deal with different components of the Hippo pathway: Yorkie/YAP/coYki in Phillips et al. eLife 2022;0:e77598 and upstream kinases in the current paper. The fact that perturbing different components of the pathway leads to similar conclusions actually strengthens the overall conclusion. Nevertheless, to be more clear about the novelty of the current manuscript, we have now changed the current text from “Hippo pathway” to “Hippo kinase cascade”, to emphasize that the current analysis deals with kinases upstream of Yorkie/YAP/coYki (Lines 35, 368-371).

      (2) The authors claim that the change in localization of coYki in Hpo -/- and Wts -/-, which is now able to enter the nucleus, demonstrates that the nuclear regulation of Yki by the Hippo pathway is ancestral to animals. Nevertheless, the authors had already made this claim in their 2022 eLife publication, when they made a mutant version of Yki in which the four conserved phosphorylation sites (Sebé-Pedrós 2012) were mutated (Figure 5A-F in Phillips et al. eLife 2022;0:e77598). In their words: "This regulation of coYki nuclear localization, along with the previous finding that coYki can induce the expression of Hippo pathway genes when expressed in Drosophila (Sebé-Pedrós et al., 2012), suggests that the function of coYki has a transcriptional regulator and Hippo pathway effector is conserved between Capsaspora and animals. ".

      I understand that the localization of Yki in coHpo-/- and coWts-/- is needed as part of the final proof that Hpo and Wts are the kinases that control Yki phosphorylation in C. owczarzaki, but it does not constitute a completely new message and should be presented as such. Figure 1C of the current manuscript leads to the same conclusion as Figure 5A-F in Phillips et al. eLife 2022;0:e77598.

      We think that demonstrating that Hippo and Warts orthologs specifically are responsible for regulation of coYki localization is a very important finding: many unicellular organisms encode Hippo, Warts, and/or Yorkie’s transcription factor partner Sd, but not Yorkie. Our understanding is that in these earlier-branching unicellular organisms, the Hippo/Warts kinase module and Sd-like proteins functioned in distinct signaling modules. Thus, when Yorkie emerged, it had the interesting property of “fusing” these two distinct signaling modules. In this framework, it is interesting to show that this “fusion” occurred in Capsaspora, the most distant known relative of animals with a Yorkie ortholog, indicating that this “fusion” event is very ancient. Although fleshing out this idea is beyond the scope of this manuscript and we plan to write about it elsewhere, we have modified our discussion to point out the importance of Hippo and Warts specifically being upstream regulators of coYki.

      In Drosophila, the genes transcriptionally regulated by Yki include positive regulators of the Hippo pathway, forming a feedback loop that downregulates Yki.

      (1) The authors do not explain whether these upstream regulators of the Hippo pathway are conserved in C. owczarzaki.

      We have now indicated the conservation of some upstream Hippo pathway components (Line 69-71).

      (2) It would also be important to know how active coYki is in the C. owczarzaki coHpo-/- and coWts-/- mutant lines relative to WT and to coYki 4SA, and how this affects the transcription and protein production of coYki downstream genes. I think some transcriptional and proteomic data would be informative, at least for genes related to the cytoskeleton.

      We have now performed RNA-seq on the coHpo and coWts mutants to address the concerns above (See Figure 8 and the final section of Results).

      Related to the above: among the downstream targets of coYki, the authors mentioned in their previous work (Phillips et al. eLife 2022;0:e77598) that β-integrins were upregulated in coYki-/-, suggesting that β-integrins could be behind the stronger cell-substrate attachment observed in the coYki-/- mutant. It would be important to investigate whether the integrin adhesome is now downregulated and how previous and new results relate to the stronger cell-substrate attachment in the coHpo-/- and coWts-/- lines. Previous results on coYki-/-, a mutant line of the same pathway, should be discussed in the context of these two new mutants.

      Two Capsaspora integrin beta genes were previously found to be upregulated in the coYki mutant (CAOG_05058 and CAOG_01283, from Phillips et al., 2022 eLife). In our coWts and coHpo mutant RNA-seq data, we see that CAOG_05058 is upregulated in both coHpo and coWts mutants, whereas CAOG_01283 does not show significantly different expression in either mutant. Because the CAOG_05058 expression data go in the “opposite” direction from what might be expected (i.e., not “down regulated” as the reviewer predicts), and because we see no change in CAOG_01283 expression, these results are difficult to interpret. The role of integrins in Capsaspora Hippo pathway mutant phenotypes therefore remains an open question.

      Some cells from the coHpo-/- and coWts-/- mutant lines show stronger attachment to the substrate, which results in an elongated shape as the cell detaches from the substrate. The authors interpret this phenotype as a contractile behavior of these cells. This behavior could be caused by changes in cytoskeletal regulation, an increased number of microvilli, or a change in the distribution of microvilli.

      (1) In my opinion, this phenotype cannot be considered a behavior per se (the cells become round once they are free from the substrate, so the elongation is temporal and the contractile behavior is a consequence of this attachment to the substrate), so I would not say that the Hippo pathway controls a contractile behavior, as the authors state in one of the main conclusions of the manuscript.

      Many cell behaviors are known to depend on external conditions, such as substrates, growth factors, nutrients, chemokines, etc., and are therefore “temporal” by the reviewer’s criteria. We therefore feel that the phenotype we describe here can be considered a cell behavior.

      (2) On the other hand, I think that further microscopy or immunocytochemistry could be performed to discern among the different causes: more microvilli? A change in microvilli distribution? A change in the actomyosin cytoskeleton? Moreover, these options are not mutually exclusive, and the explanation is very likely multifactorial.

      (3) coWts-/- has a different phenotype at the periphery of the aggregates than coHpo-/-. The authors use stably transfected lines expressing NMM-Venus to visualize microvilli. It would be interesting to perform further experiments with this tool to visualize putative differences in the cell membrane at the periphery in the two mutant genotypes.

      We have now performed experiments examining filopodia in round vs elongated cells using the NMM-venus marker, as well as differences in filopodial morphology within aggregates in the different genotypes. Our data and conclusions are included in our updated manuscript (Figure 3- figure supplement 1).

      The authors nicely inspect the consequences of the mutant lines coHpo-/- and coWts-/- in the formation of the aggregates. They find that the aggregates in these cases are more densely packed likely due to the higher attachment from microvilli, which they are able to revert by using myosin inhibitors.

      (1) As mentioned above, it would be interesting to perform further experiments using NMM-Venus transfection into the coHpo-/- and coWts-/- genotypes to visualize putative differences in the strength and distribution of the microvilli in the aggregates of these two mutant genotypes. These experiments would show whether more or fewer microvilli contacts are formed in these lines and could support a mechanical explanation of the denser aggregates in the mutant lines, as the authors now suggest in the discussion.

      We have now performed these experiments, and our data and conclusions are described in the updated manuscript (Figure 5- figure supplement 1).

      (2) On the other hand, myosin inhibition through blebbistatin increases the number of elongated cells in the mutant lines, demonstrating that myosin is necessary for the cells to resolve their substrate attachment and become round. In my view, it is confusing that myosin is needed for cells to become round again (wt phenotype) while, at the same time, myosin inhibition is needed for aggregates to become less dense (wt phenotype). Do they lose density because more elongated cells are now in the aggregate? These results look confusing to me, and I think they should be better discussed. Again, the above transfections of NMM-Venus into the coHpo-/- and coWts-/- genotypes could be informative.

      We have attempted to detect cells with an “elongated” morphology within WT and mutant aggregates but so far have been unable to visualize such cells. More advanced microscopy techniques at extended time scales may allow us to observe such things, but we believe such studies are beyond the scope of this manuscript.

      The authors do not connect and discuss their results with a very relevant study done in Drosophila, Xu J et al. Dev Cell. 2018; 46(3): 271-284.e5, where a transcriptionally independent role of Yki is characterized. In Drosophila, Yki has an important role in a positive feedback loop with myosin at the cortical part of the cell, which is especially relevant for cytoskeleton regulation.

      The results obtained by the authors in their previous study with coYki-/- indicated that coYki was important for proper actin dynamics and cell shape in C. owczarzaki. At that time, they did not examine whether this phenotype could be due to the loss of a possible cortical role of coYki, and they argued that the phenotype was caused by the loss of transcriptional regulation of coYki downstream genes, many of which were cytoskeleton-related.

      Because the cortical function of Yki is independent of Hpo and Wts regulation, the authors could use these genotypes, comparing them with WT (where the cortical role of Yki should be the same) and coYki-/-, to investigate whether the cortical role of Yki is conserved in C. owczarzaki. In Drosophila, the cortical role of Yki has been suggested to control tension at the cell surface. Drosophila Yki at the cortex activates myosin II through the N-terminal part of the protein and establishes a positive feedback loop by downregulating the Hippo pathway, thereby increasing the amount of active DmYki in the nucleus. Xu et al. proposed this mechanism as the link between sensing cell tension at the surface and controlling tissue proliferation.

      In my opinion these are relevant results in the field that should be addressed in this study or at least well discussed. Actually, I think they could be a great opportunity for investigating if a putative cortex role of Yki is ancestral to its role linked to the Hippo pathway.

      We have now addressed this study in our manuscript; please see our response to reviewer #2’s last comment above.

      It would be informative to understand how stable expression is achieved through hygromycin selection in the transfection experiments. Is the recombinant plasmid integrated into the genome, or is it maintained stably as an episome?

      We believe that the plasmids stably integrate, as we never lose the fluorescent signal once it is established in a clonal line, even after extended culturing (>6 months). It may be worthwhile to definitively distinguish integration from episomal maintenance in future studies.

      The authors do not speculate on or discuss how cell tension and cell proliferation differ between a unicellular organism and a (multicellular) tissue, and I think this should be addressed since the contexts are different.

      This is an interesting and important point. We think a proper discussion of this idea is beyond the scope of this manuscript, and we plan to discuss it in detail in an upcoming review article.

      Minor point. The study should cite other unicellular holozoans that have also been developed into tractable organisms, such as Monosiga brevicollis (Woznica A, Kumar A, et al. 2021 eLife 10:e70436) and Abeoforma whisleri (Faktorová, D., Nisbet, R.E.R., Fernández Robledo, J.A. et al. Nat Methods 17, 481-494 (2020)), in line 83 of the manuscript. I am sure the authors appreciate how much effort lies behind every non-model organism put forward as experimentally tractable, and this effort should be properly acknowledged.

      We agree, and we have now included these examples of non-model organism development in our manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank all the reviewers for taking the time to assess the manuscript and provide valuable feedback. We believe these comments helped clarify the manuscript’s prose, and the suggestions on the functionality and aim of the toolbox were incorporated throughout subsequent updates of the toolbox. In particular, we would like to point out some changes that will help all reviewers, independently of their individual comments, understand the current state of the toolbox, as well as some systematic changes that were made to the manuscript.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, most notably in Figure 3 and Table 1. A beneficial side effect is a much simpler structure for MotorNet, which should improve its usability for researchers in the neuroscience community.

      We also refactored some terminology to be more in line with current computational neuroscience vocabulary:

      • The term “plant”, which comes from industrial engineering and is more niche in neuroscience, has been replaced by “effector”.

      • The term “task” has been replaced by “environment” to match the terminology of the Gymnasium toolbox, with which MotorNet is now compatible. Task objects essentially performed the same function as environment objects from Gymnasium.

      • The term “controller” has been replaced by “policy” throughout, as this term is more general.

      • The term “motor command” is very specific to the motor control subfield of neuroscience and has therefore been replaced by “action”, which is more common for this modelling component in computational neuroscience and machine learning.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate the use of the toolbox. For this reason, I missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper answers the first question and gives assurance that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We decided to focus the paper on the first question, because the practical use of the toolbox is easier to illustrate with code or interactive (Jupyter) notebooks than in a publication format. We find that the “how to proceed” aspect of the toolbox can be covered more easily and comprehensively using online, interactive tutorials. Additionally, this allows us to update these tutorials as the toolbox evolves over different versions, whereas a scientific article is more difficult to update. Consequently, we deliberately avoided code snippets in the article itself. However, we appreciate that the paper would gain in clarity if this were stated explicitly early on. We have modified the paper to include a pointer to where to find tutorials online, in the last paragraph of the introduction:

      “The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.”

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improve them:

      (a) The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors from figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 present some important differences. The training procedure in figure 1 never includes any perturbations, while the one in figure 5 includes a wide range of perturbations of different magnitudes, timings and directions. The evaluation procedure of figure 1 includes center-out reaches with permanent viscous (velocity-dependent) external dynamics, while that of figure 5 consists of fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation, while the network in figure 5 does not. While we agree that some variation of effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader by adding the following:

      End of 1st paragraph, section 2.4.

      “Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.”

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.
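      To make the two evaluation conditions described above more concrete, the sketch below computes a velocity-dependent (viscous) load and a transient, square-shaped pulse orthogonal to the reach direction. It is plain Python rather than MotorNet's API. The curl-field matrix and the pulse onset, duration and magnitude are hypothetical values chosen for illustration, not the parameters used in the manuscript, which only states that the first load is viscous.

```python
import math

# Hypothetical curl-field matrix B (N s/m); a curl field is one common viscous load.
CURL_FIELD = ((0.0, 10.0), (-10.0, 0.0))


def viscous_force(velocity, b=CURL_FIELD):
    """Velocity-dependent load F = B @ v for a 2D velocity (vx, vy)."""
    vx, vy = velocity
    return (b[0][0] * vx + b[0][1] * vy,
            b[1][0] * vx + b[1][1] * vy)


def orthogonal_pulse(t, reach_angle, onset=0.1, duration=0.05, magnitude=2.0):
    """Square-shaped force pulse orthogonal to the reach direction (angle in radians).

    Returns zero outside [onset, onset + duration) and a constant-magnitude
    force inside it. Onset, duration and magnitude are illustrative defaults.
    """
    if not (onset <= t < onset + duration):
        return (0.0, 0.0)
    # Rotate the unit reach direction (cos a, sin a) by +90 degrees.
    ox, oy = -math.sin(reach_angle), math.cos(reach_angle)
    return (magnitude * ox, magnitude * oy)
```

      The viscous load is present at every timestep and scales with movement speed, whereas the pulse is independent of the movement and switches on and off abruptly.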

      (b) I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. This relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp as one reads. Section 2.3 explains the biomechanical properties that the toolbox implements to improve realism of the effector. They are not new results in the sense that other toolboxes implement these features (though not in differentiable formats) and these properties of biological muscles are empirically well-established. However, they are important to understand what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. An example of this is the results in figure 6, where a simple effector versus a more biomechanically complex effector will yield different neural representations.

      Regarding the manuscript itself, we agree that more clarity on the goal of every paragraph may improve the reader’s experience. Consequently, we now specify such goals at the start of each section. In particular, we clarify the purpose of section 2.3 by adding several sentences at the end of the first paragraph of that section. We also now explicitly relate the purpose of section 2.3 to the results of figure 6, and reference figure 4 in that section.

      (c) The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. In particular, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham & Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are displayed in figure 2 of Lillicrap & Scott (2013, Neuron). Electrophysiological recordings of M1 neurons during a similar task in non-human primates can also be seen in figure 2 B of “Preserved neural population dynamics across animals performing similar behaviour” (https://doi.org/10.1101/2022.09.26.509498) and figure 2 B of “Nonlinear manifolds underlie neural population activity during behaviour” (https://doi.org/10.1101/2023.07.18.549575). Note that these two preprints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      “This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicates similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).”

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a general Python mechanism rather than a MotorNet-specific one, as Python is an object-oriented language. The many general-purpose Python tutorials on subclassing are therefore useful for this purpose. We have amended the main text to make this clearer to the reader.

      Subclassing a MotorNet object, in a more specific sense, requires overwriting some methods from the base MotorNet classes (e.g., Effector or Environment classes, which correspond to the original Plant and Task object, respectively). Since we made the decision (mentioned above) to not include code in the main text, we added tutorials to the online documentation, which include dedicated tutorials for MotorNet class subclassing. For instance, this tutorial showcases how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
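      As a generic illustration of the subclassing pattern described above, the sketch below defines a hypothetical base class and a point-mass subclass that overrides its reset() and step() methods. The names BaseEffector and PointMass, their methods, and the explicit Euler integration are assumptions for illustration only and are not MotorNet's actual classes; the methods that real MotorNet subclasses must override are documented in the online tutorials.

```python
class BaseEffector:
    """Hypothetical stand-in for a toolbox base class (not MotorNet's API)."""

    def __init__(self, dt=0.01):
        self.dt = dt        # integration timestep (s)
        self.state = None

    def reset(self):
        raise NotImplementedError("subclasses must implement reset()")

    def step(self, action):
        raise NotImplementedError("subclasses must implement step()")


class PointMass(BaseEffector):
    """2D point mass driven by a force action, integrated with explicit Euler."""

    def __init__(self, mass=1.0, dt=0.01):
        super().__init__(dt)
        self.mass = mass

    def reset(self, position=(0.0, 0.0)):
        # State layout: [x, y, vx, vy], starting at rest.
        self.state = [position[0], position[1], 0.0, 0.0]
        return list(self.state)

    def step(self, action):
        x, y, vx, vy = self.state
        ax, ay = action[0] / self.mass, action[1] / self.mass  # F = m a
        # Explicit Euler: advance position with the current velocity,
        # then velocity with the current acceleration.
        x, y = x + vx * self.dt, y + vy * self.dt
        vx, vy = vx + ax * self.dt, vy + ay * self.dt
        self.state = [x, y, vx, vy]
        return list(self.state)
```

      Only the behaviour that differs from the base class is written in the subclass; everything else (here, the timestep handling) is inherited.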

      (5) One potential limitation of the toolbox is that it is based on TensorFlow, while the field of computational neuroscience seems to be, or at least that's my impression, transitioning to PyTorch. How easy would it be to translate MotorNet to PyTorch? Maybe the authors could comment on this in the discussion.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in systems neuroscience, especially because it is faster than reinforcement learning (RL). Thus, the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which one the subject prefers. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target (e.g. Kaufman et al. 2015 eLife)? In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so that it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
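      For readers unfamiliar with Gymnasium, the sketch below shows the reset()/step() contract it standardizes: reset() returns an (observation, info) pair, and step() returns (observation, reward, terminated, truncated, info). The toy one-dimensional reaching task, its reward and the proportional policy are hypothetical and dependency-free; a real environment would subclass gymnasium.Env and declare observation_space and action_space, and this is not MotorNet's environment implementation.

```python
class ToyReachEnv:
    """Toy 1D reaching task with Gymnasium-style reset()/step() return values.

    Hypothetical and dependency-free: a real environment would subclass
    gymnasium.Env and declare observation_space and action_space.
    """

    def __init__(self, target=1.0, max_step=0.1, tol=0.05, max_steps=50):
        self.target = target        # target position
        self.max_step = max_step    # largest displacement allowed per step
        self.tol = tol              # distance at which the reach counts as done
        self.max_steps = max_steps  # episode time limit
        self.pos, self.t = 0.0, 0

    def reset(self, seed=None, options=None):
        # seed/options are accepted for interface compatibility; this toy task is deterministic.
        self.pos, self.t = 0.0, 0
        return self.pos, {}

    def step(self, action):
        # Clip the action so the effector moves at most max_step per step.
        self.pos += max(-self.max_step, min(self.max_step, action))
        self.t += 1
        error = abs(self.target - self.pos)
        reward = -error                       # dense reward: negative distance to target
        terminated = error < self.tol         # task-defined end of episode
        truncated = self.t >= self.max_steps  # time-limit end of episode
        return self.pos, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Roll out one episode; return final observation, summed reward and step count."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            return obs, total, env.t
```

      Because any agent that speaks this interface can drive the loop, an RL library can replace the hand-written policy without changes to the environment.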

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      As the main result is a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a two-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design, and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions to RL or other non gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond improving training time / computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that more clarity on the section goals may improve the reader’s experience and have ensured this is the case throughout the manuscript. In particular, we added the following to the first paragraph of section 2.3, for which an explicit goal was most missing:

      “In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well-characterised in the biology and are often implemented in realistic biomechanical simulation software.”

      Regarding the potential difference in solutions obtained from reinforcement or supervised learning, addressing this conclusively would require a non-trivial amount of work and may therefore be beyond the scope of the current article. We do appreciate, however, that in some situations RL may be a more fitting approach for a given task design. In relation to this point, we now specify in the discussion that the new API can interface with reinforcement learning toolboxes for those who may want to pursue this type of policy training approach when appropriate (new section 3.2.4.).

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      The possibility of adding code snippets to the article is something we originally considered, and which aligns with comment two from reviewer one (see above). Hopefully this provides a good overview of the motivation behind our choice not to add code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Having such a low effort to run computational experiments will be definitely beneficial for the field of neural control of movement.

      We thank the reviewer for these comments.

    1. Author Response

      This important work presents a new methodology for the statistical analysis of fiber photometry data, improving statistical power while avoiding the bias inherent in the choices that are necessarily made when summarizing photometry data. The reanalysis of two recent photometry data sets, the simulations, and the mathematical detail provide convincing evidence for the utility of the method and the main conclusions; however, the discussion of the re-analyzed data is incomplete and would be improved by a deeper consideration of the limitations of the original data. In addition, consideration of other data sets and photometry methodologies, including non-linear analysis tools, as well as a discussion of the importance of data normalization, is needed.

      Thank you for the thorough and positive review of our work! We will incorporate this feedback to strengthen the manuscript. Specifically, we plan to revise the Discussion section to include a deeper consideration of the limitations of the original data, a description of the capacities of our method for conducting non-linear analyses, and the role data normalization plays in applicability of our tool.

      Reviewer 1:

      Strengths:

      The framework the authors present is solid and well-explained. By reanalyzing formerly published data, the authors also further increase the significance of the proposed tool opening new avenues for reinterpreting already collected data.

      Weaknesses:

      However, this also raises several questions. The normalization method employed for raw fiber photometry data differs from lab to lab. This poses a significant challenge for applying a single analysis tool.

      Thank you for the positive feedback; we will address your comments in our revision. We agree that any data pre-processing steps will have downstream consequences on the statistical inference from our method. Note, though, that this would also be the case with standard analysis approaches (e.g., t-tests, correlations) applied to summary measures like AUCs. For that reason, we do not believe that variability in pre-processing is an impediment to widespread adoption of a standard analysis procedure. Rather, we argue that the sensitivity of analysis results to pre-processing choices underscores the need for statistical techniques that reduce the need for pre-processing and properly account for structure in the data arising from experimental designs. The reviewer’s point also gives us an opportunity to elaborate on how our method actually reduces the need for such pre-processing steps. Indeed, our method provides smooth estimation results along the functional domain (i.e., across trial timepoints), can adjust for between-trial and between-animal heterogeneity, and provides a valid statistical inference framework that quantifies the resulting uncertainty. For example, session-to-session variability in signal magnitudes or dynamics could be accounted for, at least in part, by including session-level random effects. This heterogeneity would then be reflected in the width of the confidence intervals. This stands in contrast to “sweeping it under the rug” with a pre-processing step that may have an unknown impact on the final statistical inferences. Similarly, the level of smoothing is selected at least in part as a function of the data, and again is accounted for directly in the equations used to construct confidence intervals.

      In sum, our method provides both a tool to account for challenges in the data and a systematic framework to quantify the additional uncertainty that accompanies accounting for those data characteristics.

      Does the method that the authors propose work equally well when the data are normalized as a running-average dF/F, as described in the cited papers? For example, trace smoothing using running averages (Jeong et al. 2022) may in itself lead to pattern dilution. The same question applies if the z-score is calculated based on various responses or even baselines.

      This is an important question given how common this practice is in the field. Briefly, application of pre-processing steps will change the interpretation of the results from our analysis method. For example, if one subtracts off a pre-trial baseline average from each trial timepoint, then the “definition of 0”, and the interpretation of coefficients and their statistical significance, changes. Similarly, if one scales the signal (e.g., divides the signal magnitude by a trial- or animal-specific baseline), then this changes the interpretation of the FLMM regression coefficients to be in terms of an animal-specific signal unit as opposed to a raw dF/F. This is, however, not specific to our technique, and pre-processing would have a similar influence on, for example, linear regression (and thus t-tests, ANOVAs and Pearson correlations) applied to summary measures. We agree with the reviewer that explicitly discussing this point will strengthen the paper.

      While it is difficult to make general claims about the anticipated performance of the method under all the potential pre-processing steps taken in the field, we believe that most common pre-processing strategies will not negatively influence the method’s performance or validity; they would, instead, change the interpretation of the results. We are releasing a series of vignettes to guide analysts through using our method and, to address your comment, we will add a section on interpretation after pre-processing.

      How reliable is the method if the data are non-stationary and the baselines undergo major changes between separate trials?

      This is an excellent question. We believe the statistical inferences will be valid and will properly quantify the uncertainty from non-stationarities, since our framework does not impose stationarity assumptions on the underlying process. It is worth mentioning that non-stationarity and high trial-to-trial variability may increase variance estimates if the model does not include a rich enough set of covariates to capture the source of the heterogeneity across trial baselines. However, this is a feature of our framework rather than a bug, as it properly conveys to the analyst that high unaccounted-for variability in the signal may result in high model uncertainty. Finally, mixed effects modeling provides a transparent, statistically reasonable, and flexible approach to account for between-session and between-trial variability, which are forms of non-stationarity. We agree with the reviewer that this should be discussed more explicitly in the paper, and will do so.

      Finally, what is the rationale for not using non-linear analysis methods? Following the paper's logic, non-linear analysis can capture more information that is diluted by linear methods.

      Functional data analysis assumes that the function varies smoothly along the functional domain (i.e., across trial timepoints). It is a type of non-linear modeling technique over the functional domain, since we do not assume a linear model (a straight line) across trial time. Therefore, our functional data analysis approach is able to capture information that linear models would dilute. While the basic form of our model assumes a linear change in the signal at a fixed trial timepoint across trials/sessions, our package allows one to easily model changes with non-linear functions of covariates using splines or other basis functions. One must consider, however, the trade-off between flexibility and interpretability when specifying potentially complex models.
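      As a minimal sketch of the general idea of modeling a covariate non-linearly with a basis expansion, the example below uses a truncated-power cubic spline basis and ordinary least squares on synthetic data. This is not the fastFMM interface; the covariate, the knot locations, and the simulated trend are all assumptions for illustration.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=3):
    """Columns: 1, x, ..., x^degree, then (x - k)_+^degree for each knot k."""
    cols = [x ** d for d in range(degree + 1)]
    cols += [np.clip(x - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
session = np.linspace(0, 10, 120)                     # hypothetical covariate (e.g., session index)
signal = np.sin(session) + rng.normal(scale=0.1, size=session.size)  # non-linear trend

knots = [2.5, 5.0, 7.5]                               # assumed knot locations
B = truncated_power_basis(session, knots)
coef, *_ = np.linalg.lstsq(B, signal, rcond=None)
fitted_spline = B @ coef

# Straight-line fit of the same covariate, for comparison.
L = np.column_stack([np.ones_like(session), session])
lin_coef, *_ = np.linalg.lstsq(L, signal, rcond=None)
fitted_line = L @ lin_coef

rss_spline = np.sum((signal - fitted_spline) ** 2)
rss_line = np.sum((signal - fitted_line) ** 2)
print(rss_spline, rss_line)
```

      Because the spline basis contains the intercept and linear columns, its fit can never be worse than the straight line, but it adds coefficients that are harder to interpret individually, which is the flexibility/interpretability trade-off mentioned above.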

      Reviewer #2

      Strengths:

      The open-source package in R, which uses a syntax similar to that of the lme4 package to implement this framework for photometry data, enhances accessibility and usage by other researchers. Moreover, the shorter fitting time of the model compared with a similar package on simulated data means it has the potential to be more easily adopted.

      The reanalysis of two studies using summary statistics on photometry data (Jeong et al., 2022; Coddington et al., 2023) highlights how trial-by-trial analysis at each timepoint in the trial can reveal information obscured by averaging across trials. Furthermore, this work also exemplifies how session and subject variability can lead to opposite conclusions when not considered.

      Thank you for the positive assessment of our work!

      Weaknesses:

      Although this work has reanalyzed previous work that used summary statistics, it does not compare with other studies that use trial-by-trial photometry data across time-points in a trial.

      As described by the authors, fitting pointwise linear mixed models and performing t-tests with Benjamini-Hochberg correction, as in Lee et al. (2019), has some caveats. Using joint confidence intervals has the potential to improve statistical robustness; however, this is not directly shown with temporal data in this work. Furthermore, it is unclear how FLMM differs from the pointwise linear mixed modeling used in this work.

      We agree with the reviewers that providing more detail about the drawbacks of the approach applied in Lee et al., 2019 will strengthen the paper. We will add an example analysis applying the method proposed by Lee et al., 2019 to show how the set of timepoints at which coefficient estimates reach statistical significance can vary dramatically depending on the rate at which one subsamples the data, a highly undesirable property of this strategy. Our approach is robust to this and still provides a multiple comparisons correction through the joint confidence intervals.
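      For readers unfamiliar with the pointwise strategy, below is a minimal sketch of the Benjamini-Hochberg step-up procedure applied to per-timepoint p-values. The p-values are hand-picked, not taken from any dataset, and serve only to show the mechanism: because the BH thresholds depend on the number of tests, subsampling the same trace changes which timepoints are declared significant.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses under the BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m       # alpha * i / m for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])               # largest rank with p_(i) <= alpha * i / m
        reject[order[: k + 1]] = True                  # reject all hypotheses up to that rank
    return reject

# Hypothetical pointwise p-values along a trial at the full sampling rate.
full = np.array([0.004, 0.5, 0.5, 0.5, 0.02, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
sub = full[::4]                                        # keep every 4th timepoint

full_hits = np.nonzero(benjamini_hochberg(full))[0].tolist()
sub_hits = np.nonzero(benjamini_hochberg(sub))[0].tolist()
print(full_hits, sub_hits)
```

      Here the timepoint with p = 0.02 is significant only after subsampling (index 1 of `sub` is original timepoint 4), even though nothing about the underlying signal changed.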

      In this work, FLMM usages included only one or two covariates. However, in complex behavioral experiments, where variables are correlated, more than two may be needed (see Simpson et al. (2023), Engelhard et al. (2019); Blanco-Pozo et al. (2024)). It is not clear from this work how computationally feasible it would be to fit such complex models, which would also include more complex random effects.

      This is a good point. In our experience, the code is still quite fast (often taking seconds to tens of seconds) on a standard laptop when fitting complex models that include, for example, 10 covariates, or complex random effect specifications on dataset sizes common in fiber photometry. In the manuscript, we included results from simpler models with few covariates in an attempt to show results from the FLMM versions of the standard analyses (e.g., correlations, t-tests) applied in Jeong et al., 2022. Our goal was to show that our method reveals effects obscured by standard analyses even in simple cases. Some of our models did, however, include complex nested random effects (e.g., the models described in Section 4.5.2).

      Like other mixed-model-based analyses, our method becomes slower when the dataset contains on the order of tens of thousands of observations. However, we coded the methods to be memory efficient so that even these larger analyses can be run on standard laptops. We thank the reviewer for this point, as we worked extremely hard to scale the method to be able to efficiently fit models commonly applied in neuroscience. Indeed, challenges with scalability were one of the main motivations for applying the estimation procedure that we did; in the appendix we show that our approach fits much faster than existing FLMM software such as the refund package function pffr(), especially for large sample sizes. While pffr() appears to scale exponentially with the number of clusters (e.g., animals), our method appears to scale linearly. We will more explicitly emphasize the scalability in the revision, since we agree this will strengthen the final manuscript.

      Reviewer #3

      Strengths:

      The statistical framework described provides a powerful way to analyze photometry data and potentially other similar signals. The provided package makes this methodology easy to implement and the extensively worked examples of reanalysis provide a useful guide to others on how to correctly specify models.

      Modeling the entire trial (functional regression) removes the need to choose appropriate summary statistics, removing the opportunity to introduce bias, for example in searching for optimal windows in which to calculate the AUC. This is demonstrated in the re-analysis of Jeong et al., 2022, in which the AUC measures presented masked important details about how the photometry signal was changing.

      Meanwhile, using linear mixed models allows for the estimation of random effects, which are an important consideration given the repeated-measures design of most photometry studies.

      Thank you for the positive assessment of our work!

      Weaknesses:

      While the availability of the software package (fastFMM), the provided code, and worked examples used in the paper are undoubtedly helpful to those wanting to use these methods, some concepts could be explained more thoroughly for a general neuroscience audience.

      We appreciate this and, to address your and other reviewers’ comments, we are creating a series of vignettes walking users through how to analyze photometry data with our package. We will include algebraic illustrations to help users gain familiarity with the regression modeling here.

      While the methodology is sound and the discussion of its benefits is good, the interpretation and discussion of the re-analyzed results are poor:

      In section 2.3, the authors use FLMM to identify an instance of Simpson's Paradox in the analysis of Jeong et al. (2022). While this phenomenon is evident in the original authors' metrics (replotted in Figure 5A), FLMM provides a convenient method to identify these effects while illustrating the deficiencies of the original authors' approach of concatenating a different number of sessions for each animal and ignoring potential within-session effects. The discussion of this result is muddled. Having identified the paradox, the authors offer some appropriate speculation as to what is causing these opposing effects, particularly the decrease within sessions. In the discussion and appendices, the authors identify (1) changes in satiation/habituation/motivation, (2) the predictability of the rewards (presumably by the click of a solenoid valve) and (3) photobleaching as potential explanations of the decrease within days. Given these candidate effects, and without strong evidence to rule all three out, the discussion of whether RPE or ANCCR matches these results is probably moot. In particular, the hypotheses developed by Jeong et al. were for a random (unpredictable) rewards experiment, whereas the evidence points to the rewards being sometimes predictable. The learning of that predictability (e.g. over sessions) and variation in predictability (e.g. by each mouse's level of attention to sounds) significantly complicate the analysis. The FLMM analysis reveals the complexity of analyzing what is apparently a straightforward task design.

      While we are disappointed to hear the reviewer felt our initial interpretations and discussion were poor, the reviewer brings up an excellent point that we had not considered. They have convinced us that acknowledging and elaborating on this alternative perspective will strengthen the paper. We agree that the ANCCR/RPE model predictions were made for unpredictable rewards and, as the reviewer rightly points out, there is evidence that the animals sense the reward delivery. Regardless of the learning theory one adopts (RPE, ANCCR or others), we agree that this (potentially) learned predictability alone could account for the increase in signal magnitude across sessions.

      After reading the reviewer’s comments, we consulted with a number of researchers in this area, and several felt that a CS+ can serve as a reward in itself. From this perspective, the rewards in the Jeong et al., 2022 experiment might still be considered unexpected. After discussing extensively with the authors of Jeong et al., 2022, it is clear that they went to enormous trouble to prevent the inadvertent generation of a CS+, and it is likely that changes in pressure from the solenoid (rather than a sound) served as a cue. This underscores the difficulty of preventing perception of reward delivery in practice. As this paper is focused on analysis approaches, we feel that we can contribute most thoughtfully to the dopamine–learning theory conversation by presenting both sides.

      Overall, we agree with the reviewer that future experiments will be needed for testing the accuracy of the models’ predictions for random (unpredicted) rewards. While we understand that our attempt to document our conversations with the Jeong et al., 2022 authors may have room for improvement, we hope the reviewer can appreciate that this was done with the best of intentions. We wish to emphasize that we also consulted with several other researchers in the field when crafting the discussion. The Jeong et al., 2022 authors could easily have avoided acknowledging the potential incompleteness of their theory, by claiming that our results do not invalidate their predictions for a random reward, as the reward was not unpredicted in the experiment (as a result of the inadvertent solenoid CS+). Instead, they went out of their way to emphasize that their experiment did test a random reward, and that our results do present problems for their theory. We think that engagement with re-analyses of one’s data, even when findings are inconvenient, is a good demonstration of open science practice. For that reason as well, we feel providing readers with a perspective on the entire discussion will contribute to the scientific discourse in this area.

      Finally, we would like to reiterate that this conversation is happening because our method, by analyzing the signal at every trial timepoint, revealed a neural signal that appears to indicate that the animals sense reward delivery. Ultimately, this was what we set out to do: help researchers ask questions of their data that they could not ask before. We believe that having a demonstration that we can indeed do this for a “live” issue is the most appropriate way of demonstrating the usefulness of the method.

      It is clear the reviewer put a lot of time into understanding what we did, and was very thoughtful about the feedback. We would like to thank the reviewer again for taking such care in reviewing our paper.

      If this paper is not trying to arbitrate between RPE and ANCCR, as stated in the text, the post hoc reasoning of the authors of Jeong et al., 2022 provided in the discussion is not germane.

      While we appreciate that the post hoc reasoning of the authors of Jeong et al., 2022 may not seem germane, we would like to provide some context for its inclusion. As statisticians and computer scientists, our role is to create methods, and this often requires using open source data and recreating past analyses. This usually involves extensive conversation with authors about their data and analysis choices because, if we cannot reproduce their findings using their analysis methods, we cannot verify that results from our own methods are valid. As such, we prefer to conduct method development in a collaborative fashion, and we strive to constructively, and respectfully, discuss our results with the original authors. We feel that giving them the opportunity to suggest analyses, and express their point of view if our results conflict with their original conclusions, is important, and we do not want to discourage authors from making their datasets public. As such, we conducted numerous analyses at the suggestion of Jeong et al., 2022 and discussed the results over the course of many months. Indeed the analyses in the Appendix that the reviewer is referring to were conducted at the suggestion of the authors of Jeong et al., 2022, in an attempt to rule out alternative explanations. We nevertheless appreciate that our interpretations of these results can include some of the caveats suggested by the reviewer, and we will strive to improve these sections.

      Arbitrating between the models likely requires new experimental designs (removing the sound of the solenoid, satiety controls) or more complex models (e.g. with session effects, measures of predictability) that address the identified issues.

      We agree with the reviewer that the results suggest that new experimental designs will likely be necessary to adjudicate between models. It is our hope that, by weighing the different issues and interpretations, our paper might provide useful suggestions into what experimental designs would be most beneficial to rule out competing hypotheses in future data collection efforts. We believe that our methodology will strengthen our capacity to design new experiments and analyses. We will make the reviewer’s suggestions more explicit in the discussion by emphasizing the limitations of the original data.

      Of the three potential causes of within-session decreases, the photobleaching arguments advanced in the discussion and expanded greatly in the appendices are not convincing. The data being modeled is a processed signal (ΔF/F) with smoothing and baseline correction and this does not seem to have been considered in the argument.

      We are disappointed to hear that this extensive set of analyses, much of which was conducted at the suggestion of Jeong et al., 2022, was not convincing. We agree that acknowledging any pre-processing would provide useful context for the reader. We do wish to clarify that we analyzed the data that were made available online (raw data were not available). Moreover, for comparison with the authors’ results, we felt it was important to maintain the same pre-processing steps as they did. These conditions were held constant across analysis approaches; therefore, we think that the within-trial changes are likely not influenced substantially by these pre-processing choices. While we cannot speak definitively to the impact any of the processing conducted by the authors had on the results, we believe that it was likely minor, given that the timing of signals at other points in the trial, and in other experiments, was as expected (e.g., the signal rose rapidly after cue onset in Pavlovian tasks).

      Furthermore, the photometry readout is also a convolution of the actual concentration changes over time, influenced by the on-off kinetics of the sensor, which makes the interpretation of timing effects of photobleaching less obvious than presented here and more complex than the dyes considered in the cited reference used as a foundation for this line of reasoning.

      We appreciate the nuance of this point, and we will add it to our discussion. In response to your criticism, we have consulted with more experts in the field regarding the potential for bleaching in these data, and it is not clear to us why photobleaching would be visible in one time-window of a trial but not in another (less than a second away), despite high dF/F magnitudes in both time-windows. We do wish to point out that, at the request of the authors, we analyzed many experiments from the same animals and in most cases did not observe other indications of photobleaching. Hence, it is not clear to us why this particular set of experiments would garner additional skepticism regarding the potential for photobleaching to invalidate results. While the role of photobleaching may be more complicated with this sensor than others in the references, that citation was included, at the suggestion of Jeong et al., 2022, simply as a way of acknowledging that non-linearities in photobleaching can occur.

      Within this discussion of photobleaching, the characterization of the background reward experiments used in part to consider photobleaching (appendix 7.3.2) is incorrect. In this experiment (Jeong et al., 2022), background rewards were only delivered in the inter-trial-interval (i.e. not between the CS+ and predicted reward as stated in the text). Both in the authors' description and in the data, there is a 6s before cue onset where rewards are not delivered and while not described in the text, the data suggests there is a period after a predicted reward when background rewards are not delivered. This complicates the comparison of this data to the random reward experiment.

      Thank you for pointing this out! We will remove the parenthetical on page 18 of the appendix that incorrectly stated that rewards can occur between the CS+ and the predicted reward.

      The discussion of the lack of evidence for backpropagation, taken as evidence for ANCCR over RPE, is also weak.

      This point was meant to acknowledge that, although our method yields results that conflict with the conclusions described by Jeong et al., 2022 on data from some experiments, on other experiments our method supports their results. Again, we believe that a critical part of open science is acknowledging both areas where analyses support and conflict with those of the original authors. We agree with the reviewer that qualifying our results so as not to emphasize support for/against RPE/ANCCR will strengthen our paper, and we will make these changes.

      A more useful exercise than comparing FLMM to the methods and data of Jeong et al., 2022, would be to compare against the approach of Amo et al., 2022, which identifies backpropagation (data publicly available: DOI: 10.5061/dryad.hhmgqnkjw). The replication of a positive result would be more convincing of the sensitivity of the methodology than the replication of a negative result, which could be a result of many factors in the experimental design. Given that the Amo et al. analysis relies on identifying systematic changes in the timing of a signal over time, this would be particularly useful in understanding if the smoothing steps in FLMM obscure such changes.

      Thank you for this suggestion, and we agree this could be a useful analysis for the field. Your thoughtful review has convinced us that focusing on our statistical contribution will strengthen the paper, and we will make changes to further emphasize that we are not seeking to adjudicate between RPE/ANCCR. We only had space in the manuscript to include a subset of the analyses conducted on Jeong et al., 2022, and had to relegate the results from the Coddington et al., 2023 data to an appendix. Realistically, it would be hard for us to justify analyzing a third dataset. As you may surmise from the one we presented, reanalyzing a new dataset is usually very time consuming, and invariably requires extensive communication with the original authors. We did include numerous examples in our manuscript where we already replicated positive results, in a way that we believe demonstrates the sensitivity of the methodology. We have also been working with five groups at NIH and elsewhere using our approach, in experiments targeting different scientific questions. In fact, one paper that extensively applies our method and compares the results with those yielded by standard analysis of AUCs is already accepted and in press. Hence there should soon be additional demonstrations of what the method can do in less controversial settings. Finally, our forthcoming vignettes include additional analyses, not included in the manuscript, that replicate positive results. We take your point that our description of the data supporting one theory or the other should be qualified, and we will correct that. Again, your review was very thorough, and we appreciate your taking so much time to help us improve our work.

      Reviewer #2 (Recommendations For The Authors):

      First, I would like to commend the authors for the clarity of the paper, and for creating an open-source package that will help researchers more easily adopt this type of analysis.

      Thank you!

      I would suggest the authors consider adding to the manuscript either some evidence or some intuition on how feasible it would be to use FLMM for very complex model specifications, in terms of computational cost and model convergence.

      This is an excellent point, and we will make this suggested change in the Methods and Discussion sections in the next draft.

      From my understanding, this package might potentially be useful not just for photometry data but also for two-photon recordings for example. If so, I would also suggest the authors add to the discussion this potential use.

      We appreciate your thinking on this point, as it would definitely help expand use of the method. We included a brief point in the Discussion that this package would be useful for other techniques, but we will expand upon this.

      Reviewer #3 (Recommendations For The Authors):

      The authors should define 'function' in context, as well as provide greater detail of the alternate tests that FLMM is compared to in Figure 7. Given the novelty of estimating joint CIs, the authors should be clearer about how this should be reported and how this differs from pointwise CIs (and how this has been done in the past).

      Thank you, this is a very good point and will be critical for helping analysts describe and interpret results. We will add more detail to the Methods section on this point.
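      To illustrate the distinction (this is one common construction, not necessarily the one fastFMM uses), a pointwise 95% interval at each timepoint uses the normal quantile 1.96, whereas a joint (simultaneous) band over all T timepoints replaces 1.96 with the 95th percentile of max_t |Z_t| / sd_t, where Z is drawn from a multivariate normal with the estimated covariance of the coefficient function. The smooth squared-exponential covariance and the coefficient function below are synthetic assumptions.

```python
import numpy as np

def joint_critical_value(cov, level=0.95, n_draws=20000, seed=0):
    """Monte Carlo quantile of max_t |Z_t| / sd_t, with Z ~ N(0, cov)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(np.diag(cov))
    chol = np.linalg.cholesky(cov)                       # cov = chol @ chol.T
    z = rng.standard_normal((n_draws, cov.shape[0])) @ chol.T
    max_abs = np.max(np.abs(z / sd), axis=1)             # largest standardized deviation per draw
    return np.quantile(max_abs, level)

T = 50
t = np.linspace(0, 1, T)                                 # trial time rescaled to [0, 1]
# Assumed smooth covariance of the estimated coefficient function (small jitter for stability).
cov = 0.04 * np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.1 ** 2)) + 1e-8 * np.eye(T)
beta_hat = np.sin(2 * np.pi * t)                         # hypothetical estimated coefficient function
sd = np.sqrt(np.diag(cov))

q_point = 1.96
q_joint = joint_critical_value(cov)

pointwise_band = (beta_hat - q_point * sd, beta_hat + q_point * sd)
joint_band = (beta_hat - q_joint * sd, beta_hat + q_joint * sd)
print(q_point, q_joint)
```

      The joint band is wider than the pointwise one, and a timepoint is reported as significant only where the joint band excludes zero; the strong correlation between neighbouring timepoints keeps the joint critical value well below a Bonferroni correction over all T points.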

      The authors identify that many photometry studies are complex nested longitudinal designs, using the cohort of 8 animals used in five task designs of Jeong et al. 2022 as an example. The authors miss the opportunity to illustrate how FLMM might be useful in identifying the effects of subject characteristics (e.g. sex, CS+ cue identity).

      This is a great suggestion and we will add this important point to the discussion, especially in light of the factorial designs common in neuroscience experiments.

      In discussing the delay-length change experiment, it would be more accurate to say that proposed versions of RPE and ANCCR do not predict the specific change.

      We will make this change and agree this is a better phrasing.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides valuable insights into how the brain parses the syntactic structure of a spoken sentence. A unique contribution of the work is to use a large language model to quantify how the mental representation of syntactic structure updates as a sentence unfolds in time. Solid evidence is provided that distributive cortical networks are engaged for incremental parsing of a sentence, although the contribution could be further strengthened if the authors would further highlight the main results and clarify the benefit of using a large language model.

      We thank the editors for the overall positive assessment. We have revised our manuscript to further emphasize our main findings and highlight the advantages of using a large language model (LLM) over traditional behavioural and corpus-based data.

      This study aims to investigate the neural dynamics underlying the incremental construction of structured interpretation during speech comprehension. While syntactic cues play an important role, they alone do not define the essence of this parsing process. Instead, this incremental process is jointly determined by the interplay of syntax, semantics, and non-linguistic world knowledge, evoked by the specific words heard sequentially by listeners. To better capture these multifaceted constraints, we derived structural measures from BERT, which dynamically represent the evolving structured interpretation as a sentence unfolds word-by-word.

      Typically, the syntactic structure of a sentence can be represented by a context-free parse tree, such as a dependency parse tree or a constituency-based parse tree, which abstracts away from specific content, assigning a discrete parse depth to each word regardless of its semantics. However, this context-free parse tree merely represents the result rather than the process of sentence parsing and does not elucidate how a coherent structured interpretation is concurrently determined by multifaceted constraints. In contrast, BERT parse depth, trained to approach the context-free discrete dependency parse depth, is a continuous variable. Crucially, its deviation from the corresponding discrete parse depth indicates the preference for the syntactic structure represented by this context-free parse. As BERT processes a sentence delivered word-by-word, the dynamic change of BERT parse depth reflects the incremental nature of online speech comprehension.
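      The contrast between discrete and continuous parse depth can be sketched as follows. Discrete dependency depth counts the arcs from a word up to the root, whereas a depth probe in the style of Hewitt and Manning (2019) predicts depth as the squared norm of a linear projection of a contextual embedding, ||B h_i||^2, which is continuous. The parse, the embeddings, and the probe matrix below are hypothetical placeholders, not BERT outputs or the trained probe used in the study.

```python
import numpy as np

def tree_depth(heads):
    """Depth of each word; heads[i] is the index of word i's head, or -1 for the root."""
    depths = []
    for i in range(len(heads)):
        d, node = 0, i
        while heads[node] != -1:       # climb towards the root, counting arcs
            node = heads[node]
            d += 1
        depths.append(d)
    return np.array(depths)

def probe_depth(H, B):
    """Continuous depth estimate ||B h_i||^2 for each row h_i of H."""
    projected = H @ B.T
    return np.sum(projected ** 2, axis=1)

# Hypothetical parse of "The dog chased the cat", with "chased" as the root.
heads = [1, 2, -1, 4, 2]
discrete = tree_depth(heads)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 768))                      # placeholder contextual embeddings
B = rng.normal(size=(64, 768)) / np.sqrt(768)      # placeholder (untrained) probe matrix
continuous = probe_depth(H, B)
deviation = continuous - discrete                  # the quantity compared with the discrete parse
print(discrete, continuous.round(2))
```

      With a trained probe, B is fitted so that the continuous depths approximate the discrete ones, and the deviation carries the content-dependent preference described above; to mimic incremental processing, one would recompute the embeddings for each sentence prefix and re-apply the probe.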

      Our results reveal a behavioural alignment between BERT parse depth and human interpretative preference for the same set of sentences. In other words, BERT parse depth could represent a probabilistic interpretation of a sentence’s structure based on its specific contents, making it possible to quantify the preference for each grammatically correct syntactic structure during incremental speech comprehension. Furthermore, both BERT and human interpretations show correlations with linguistic knowledge, such as verb transitivity, and non-linguistic knowledge, like subject noun thematic role preference. Both types of knowledge are essential for achieving a coherent interpretation, in accordance with the “constraint-based hypothesis” of sentence processing.

      Motivated by the observed behavioural alignment between BERT and human listeners, we further investigated BERT structural measures in source-localized EEG/MEG using representational similarity analyses (RSA). This approach revealed the neural dynamics underlying incremental speech comprehension on millisecond scales. Our main findings include: (1) a shift from bi-hemispheric lateral frontal-temporal regions to left-lateralized regions in representing the current structured interpretation as a sentence unfolds, (2) a pattern of sequential activations in the left lateral temporal regions, updating the structured interpretation as syntactic ambiguity is resolved, and (3) the influence of lexical interpretative coherence activated in the right hemisphere over the resolved sentence structure represented in the left hemisphere.
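      For readers less familiar with RSA, below is a minimal sketch of its core step under simple assumptions: one condition per sentence, a model representational dissimilarity matrix (RDM) built from a scalar model measure, a neural RDM built from response patterns at one time window, and a Spearman correlation between their upper triangles. The data are random placeholders, and the dissimilarity metrics (absolute difference, correlation distance) are common choices rather than necessarily those used in the study.

```python
import numpy as np

def rdm_from_scalar(values):
    """Model RDM: absolute difference between per-condition scalar measures."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])

def rdm_from_patterns(patterns):
    """Neural RDM: 1 - Pearson correlation between condition response patterns (rows)."""
    return 1.0 - np.corrcoef(patterns)

def spearman(a, b):
    """Spearman correlation computed from ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

def rsa_score(model_rdm, neural_rdm):
    """Correlate the off-diagonal upper triangles of two RDMs."""
    iu = np.triu_indices_from(model_rdm, k=1)
    return spearman(model_rdm[iu], neural_rdm[iu])

rng = np.random.default_rng(0)
n_sentences, n_vertices = 20, 50
model_measure = rng.normal(size=n_sentences)            # placeholder, e.g. one parse-depth value per sentence
neural = rng.normal(size=(n_sentences, n_vertices))     # placeholder source-space patterns in one window
M = rdm_from_scalar(model_measure)
score = rsa_score(M, rdm_from_patterns(neural))
print(score)
```

      In a time-resolved analysis, the neural RDM would be recomputed in sliding windows and searchlight regions, yielding a map of where and when the model measure is reflected in brain activity.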

      From our perspective, the advantages of using an LLM (or deep language model) like BERT are twofold. Conceptually, BERT structural measures offer a deep contextualized structural representation for any given sentence by integrating the multifaceted constraints unique to the specific contents described by the words within that sentence. Modelling this process on a word-by-word basis is challenging to achieve with behavioural or corpus-based metrics. Empirically, as demonstrated in our responses to the reviewers below, BERT measures show better performance compared to behavioural and corpus-based metrics in aligning with listeners’ neural activity. Moreover, when it comes to integrating multiple sources of constraints for achieving a coherent interpretation, BERT measures also show a better fit with the behavioural data of human listeners than corpus-based metrics.

      Taken together, we propose that LLMs, akin to other artificial neural networks (ANNs), can be considered as computational models for formulating and testing specific neuroscientific hypotheses, such as the “constraint-based hypothesis” of sentence processing in this study. However, we by no means overlook the importance of corpus-based and behavioural metrics. These metrics play a crucial role in interpreting and assessing whether and how ANNs simulate human cognitive processes, a fundamental step in employing ANNs to gain new insights into the neural mechanisms of human cognition.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors investigate where and when brain activity is modulated by incoming linguistic cues during sentence comprehension. Sentence stimuli were designed such that incoming words had varying degrees of constraint on the sentence's structural interpretation as participants listened to them unfolding, i.e. due to varying degrees of verb transitivity and the noun's likelihood of assuming a specific thematic role. Word-by-word "online" structural interpretations for each sentence were extracted from a deep neural network model trained to reproduce language statistics. The authors relate the various metrics of word-by-word predicted sentence structure to brain data through a standard RSA approach at three distinct points of time throughout sentence presentation. The data provide convincing evidence that brain activity reflects preceding linguistic constraints as well as integration difficulty immediately after word onset of disambiguating material.

      We thank Reviewer #1 (hereinafter referred to as R1) for their recognition of the objectives of our study and the analytical approaches we have employed in this study.

      The authors confirm that their sentence stimuli vary in degree of constraint on sentence structure through independent behavioral data from a sentence continuation task. They also show a compelling correlation of these behavioral data with the online structure metric extracted from the deep neural network, which seems to pick up on the variation in constraints. In the introduction, the authors argue for the potential benefits of using deep neural network-derived metrics given that it has "historically been challenging to model the dynamic interplay between various types of linguistic and nonlinguistic information". Similarly, they later conclude that "future DLMs (...) may provide new insights into the neural implementation of the various incremental processing operations(...)".

      We appreciate R1’s positive comments on the design, quantitative modelling and behavioural validation of the sentence stimuli used in this experiment.

      By incorporating structural probing of a deep neural network, a technique developed in the field of natural language processing, into the analysis pipeline for investigating brain data, the authors indeed take an important step towards establishing advanced machine learning techniques for researching the neurobiology of language. However, given the popularity of deep neural networks, an argument for their utility should be carefully evidenced.

      We fully concur with R1 regarding the need for cautious evaluation and interpretation of deep neural networks’ utility. In fact, this perspective underpinned our decision to conduct extensive correlation analyses using both behavioural and corpus-based metrics to make sense of BERT metrics. These analyses were essential to interpret and validate BERT metrics before employing them to investigate listeners’ neural activity during speech comprehension. We do not in any way downplay the importance of behavioural or corpus-based data in studying language processing in the brain. On the contrary, as evidenced by our findings, these traditional metrics are instrumental in interpreting and guiding the use of metrics derived from LLMs.

      However, the data presented here don't directly test how large the benefit provided by this tool really is. In fact, the authors show compelling correlations of the neural network-derived metrics with both the behavioral cloze-test data as well as several (corpus-)derived metrics. While this is a convincing illustration of how deep language models can be made more interpretable, it is in itself not novel. The correlation with behavioral data and corpus statistics also raises the question of what the additional benefit of the computational model is. Is it simply saving us the step of having to collect the behavioral data or compute the corpus statistics, or does the model potentially uncover a more nuanced representation of the online comprehension process? This remains unclear because we are lacking a direct comparison of how much variance in the neural data is explained by the neural network-derived metrics beyond those other metrics (for example the main verb probability or the corpus-derived "active index" following the prepositional phrase).

      From our perspective, a primary advantage of using the neural network-derived metrics (or LLMs as computational models of language processing), compared to traditional behavioural and corpus-based metrics, lies in their ability to offer more nuanced, contextualized representations of natural language inputs. There appeared to be no effective way to computationally capture the distributed and multifaceted constraints within specific contexts before the current generation of LLMs. While it is feasible to quantify lexical properties or contextual effects based on the usage of specific words via corpora or behavioural tests, this approach appears less effective in modelling the composition of meaning across multiple words at the sentence level. More critically, it struggles to capture how various lexical constraints collectively yield a coherent structured interpretation.

      Accumulating evidence suggests that models designed for context prediction or next-word prediction, such as word2vec and LLMs, outperform classic count-based distributional semantic models (Baroni et al. 2014) in aligning with neural activity during language comprehension (Schrimpf et al. 2021; Caucheteux and King 2022). Relevant to this, we have conducted additional analyses to directly assess the additional variance of neural data explained by BERT metrics, over and above what traditional metrics account for. Specifically, using RSA, we re-tested model RDMs based on BERT metrics while controlling for the contribution from traditional metrics (via partial correlation).
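      To make the partial-correlation logic concrete, the sketch below shows one way to partial out a traditional metric when testing a BERT-based model RDM against a data RDM. It assumes that RDMs are compared on their upper-triangle entries and that Spearman's partial correlation is computed as a Pearson partial correlation on ranks; the function names are hypothetical, and this is an illustration rather than the analysis code used in the study.

```python
import numpy as np

def upper_triangle(rdm):
    """Return the off-diagonal upper-triangle entries of a square RDM as a 1-D vector."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def average_ranks(x):
    """Rank-transform x, giving tied values the mean of their ranks (as in Spearman's rho)."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]

def partial_spearman(data_rdm, model_rdm, control_rdm):
    """Spearman correlation between data and model RDMs after partialling out a control RDM."""
    x, y, z = (average_ranks(upper_triangle(m)) for m in (data_rdm, model_rdm, control_rdm))
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    # First-order partial correlation formula applied to the ranked vectors.
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

      If the partial rho for a BERT model RDM stays close to its zero-order rho, the control metric (e.g., PP or MV continuation probability) accounts for little of the variance that the BERT metric shares with the neural data.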

      During the first verb (V1) epoch, we tested model RDMs of V1 transitivity based on data from either the behavioural pre-test (i.e., continuations following V1) or massive corpora. Contrasting sharply with the significant model fits observed for BERT V1 parse depth in bilateral frontal and temporal regions, the two metrics of V1 transitivity did not exhibit any significant effects (see Author response image 1).

      Author response image 1

      RSA model fits of BERT structural metrics and behavioural/corpus-based metrics in the V1 epoch. (upper) Model fits of BERT V1 parse depth (relevant to Appendix 1-figure 10A); (middle) Model fits of the V1 transitivity based on the continuation pre-test conducted at the end of V1 (e.g., completing “The dog found …”); (bottom) Model fits of the V1 transitivity based on the corpus data (as described in Methods). Note that verb transitivity is quantified as the proportion of its transitive uses (i.e., followed by a direct object) relative to its intransitive uses.

      In the PP1 epoch, which was aligned to the onset of the preposition in the prepositional phrase (PP), we tested the probability of a PP continuation following V1 (e.g., the probability of a PP after “The dog found…”). While no significant results were found for PP probability, we have plotted the uncorrected results for PP probability (Author response image 2). These model fits have very limited overlap with those of BERT parse depth vector (up to PP1) in the left inferior frontal gyrus (approximately at 360 ms) and the left temporal regions (around 600 ms). It is noteworthy that the model fits of the BERT parse depth vector (up to PP1) remained largely unchanged even when PP probability was controlled for, indicating that the variance explained by BERT metrics cannot be effectively accounted for by the PP probability obtained from the human continuation pre-test.

      Author response image 2

      Comparison between the RSA model fits of BERT structural metrics and behavioural / corpus-based metrics in the PP1 epoch. (upper) Model fits of BERT parse depth vector up to PP1 (relevant to Figure 6B in the main text); (middle) Model fits of the probability of a PP continuation in the pre-test conducted at the end of the first verb; (bottom) Model fits of BERT parse depth vector up to PP1 after partialling out the variance explained by PP probability.

      Finally, in the main verb (MV) epoch, we tested the model RDM based on the probability of a MV continuation following the PP (e.g., the probability after “The dog found in the park…”). When compared with the BERT parse depth vector (up to MV), we observed a similar effect in the left dorsal frontal regions (see Author response image 3). However, this effect did not survive after the whole-brain multiple comparison correction. Subsequent partial correlation analyses revealed that the MV probability accounted for only a small portion of the variance in neural data explained by the BERT metric, primarily the effect observed in the left dorsal frontal regions around 380 ms post MV onset. Meanwhile, the majority of the model fits of the BERT parse depth vector remained largely unchanged after controlling for the MV probability.

      Note that the probability of a PP/MV continuation reflects participants’ predictions based on speech input preceding the preposition (e.g., “The dog found…”) or the main verb (e.g., “The dog found in the park…”), respectively. In contrast, the BERT parse depth vector is designed to represent the structure of the (partial) sentence already delivered to listeners, rather than to predict its continuation. Therefore, in the PP1 and MV epochs, we separately tested BERT parse depth vectors that included the preposition (e.g., “The dog found in…”) and the main verb (e.g., “The dog found in the park was…”) to accurately capture the sentence structure at these specific points in a sentence. Despite these differences in the nature of the information captured by the two types of metrics, the behavioural metrics themselves did not exhibit significant model fits when tested against listeners’ neural activity.

      Author response image 3

      Comparison between the RSA model fits of BERT structural metrics and behavioural / corpus-based metrics in the MV epoch. (upper) Model fits of BERT parse depth vector up to MV (relevant to Figure 6C in the main text); (middle) Model fits of the probability of an MV continuation in the pre-test conducted at the end of the prepositional phrase (e.g., “The dog found in the park …”); (bottom) Model fits of BERT parse depth vector up to MV after partialling out the variance explained by MV probability.

      Regarding the corpus-derived interpretative preference, we observed that neither the Active index nor the Passive index showed significant effects in the PP1 epoch. In the MV epoch, although the Passive index showed significant model fits that temporally overlapped with those of the BERT parse depth vector (up to MV) after the recognition point of the MV, the effects of these two model RDMs emerged in different hemispheres, as illustrated in Figures 6C and 8D in the main text. Consequently, we opted not to pursue further partial correlation analysis with the corpus-derived interpretative preference. In addition, as shown in Figures 8A, 8B and 8C, the subject noun thematic role preference and the non-directional index exhibit significant model fits in the PP1 or the MV epoch. Interestingly, these effects precede the corresponding effects of BERT metrics in the same epoch (see Figures 6B and 6C), suggesting that the overall structured interpretation emerges after the evaluation and integration of multifaceted lexical constraints.

      In summary, our findings indicate that, in comparison to corpus-derived or behavioural metrics, BERT structural metrics are more effective in explaining neural data, in terms of modelling both the unfolding sentence input (i.e., the incremental BERT parse depth vector) and individual words (i.e., V1) within specific sentential contexts. This advantage of BERT metrics might be due to the hypothesized capacity of LLMs to capture more contextually rich representations. Such representations effectively integrate the diverse constraints present in a given sentence, thereby outperforming corpus-based or behavioural metrics in this respect. At the same time, it is important to recognize the significant role of corpus-based and behavioural metrics as explanatory variables. They are instrumental not only in interpreting BERT metrics but also in understanding their fit to listeners’ neural activity (by examining the temporal sequence and spatial distribution of the model fits of these two types of metrics). Such an integrative approach allows for a more comprehensive understanding of the complex neural processes underpinning speech comprehension.

      With regard to the neural data, the authors show convincing evidence for early modulations of brain activity by linguistic constraints on sentence structure and importantly early modulation by the coherence between multiple constraints to be integrated. Those modulations can be observed across bilateral frontal and temporal areas as well as parts of the default mode network. The methods used are clear and rigorous and allow for a detailed exploration of how multiple linguistic cues are neurally encoded and dynamically shape the final representation of a sentence in the brain. However, at times the consequences of the RSA results remain somewhat vague with regard to the motivation behind different metrics and how they differ from each other. Therefore, some results seem surprising and warrant further discussion, for example: Why does the neural network-derived parse depth metric fit neural data before the V1 uniqueness point if the sentence pairs begin with the same noun phrase? This suggests that the lexical information preceding V1 is driving the results. However, given the additional results, we can already exclude an influence of subject likelihood for a specific thematic role as this did not model the neural data in the V1 epoch to a significant degree.

      As pointed out by R1, model fits of BERT parse depth vector (up to V1) and its mismatch for the active interpretation were observed before the V1 uniqueness point (Figures 6A and 6D). These early effects could be attributed to the inclusion of different subject nouns in the BERT parse depth vectors. In our MEG data analyses, RSA was performed using all LoTrans and HiTrans sentences. Each of the 60 sentence sets contained one LoTrans sentence and one HiTrans sentence, which resulted in a 120 x 120 neural data RDM for each searchlight ROI across the brain within each sliding time window. Although LoTrans and HiTrans sentences within the same sentence set shared the same subject noun, subject nouns varied across sentence sets. This variation was expected to be reflected in both the model RDM of BERT metrics and the data RDM, a point further clarified in the revised manuscript.
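      As an illustration of how such a 120 x 120 model RDM could be assembled from BERT parse depth vectors, the sketch below assumes that the dissimilarity between two sentences is the Euclidean distance between their depth vectors (e.g., the three-dimensional vectors for prefixes such as “The dog found” in the V1 epoch). The distance measure, the function name and the random depth values are assumptions for illustration only.

```python
import numpy as np

def model_rdm(depth_vectors):
    """Pairwise Euclidean distances between equal-length parse depth vectors (one per row)."""
    v = np.asarray(depth_vectors, dtype=float)
    diff = v[:, None, :] - v[None, :, :]      # (n, n, d) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))  # (n, n) distance matrix

# Hypothetical depths for "The <noun> <V1>": 60 sentence sets, each contributing
# one LoTrans and one HiTrans sentence. Because the subject noun differs across
# sets, rows from different sets can differ even at the first two positions.
rng = np.random.default_rng(1)
n_sets = 60
vectors = rng.random((2 * n_sets, 3))
rdm = model_rdm(vectors)
print(rdm.shape)  # (120, 120)
```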

      In contrast, when employing a model RDM constructed solely from the BERT V1 parse depth, we observed model fits peaking precisely at the uniqueness point of V1 (see Appendix 1-figure 10). It is important to note that BERT V1 parse depth is a contextualized metric influenced by the preceding subject noun, which could account for the effects of BERT V1 parse depth observed before the uniqueness point of V1.

      Relatedly, in Fig 2C it seems there are systematic differences between HiTrans and LoTrans sentences regarding the parse depth of determiner and subject noun according to the neural network model, while this is not expected according to the context-free parse.

      We thank R1 for pointing out this issue. Relevant to Figure 3D (Figure 2C in the original manuscript), we presented the distributions of BERT parse depth for individual words as the sentence unfolds in Appendix 1-figure 2. Our analysis revealed that the parse depth of the subject noun in high transitivity (HiTrans) and low transitivity (LoTrans) sentences did not significantly differ, except for the point at which the sentence reached V1 (two-tailed two-sample t-test, P = 0.05).

      However, we observed a significant difference in the parse depth of the determiner between HiTrans and LoTrans sentences (two-tailed two-sample t-test, P < 0.05 for all results in Appendix 1-figure 2). Additionally, the parse depth of the determiner was found to covary with that of V1 as the input unfolded to different sentence positions (Pearson correlation, P < 0.05 for all plots in Appendix 1-figure 2). This difference, unexpected in terms of the context-free (dependency) parse used for training the BERT structural probing model, might be indicative of a “leakage” of contextual information during the training of the structural probing model, given the co-variation between the determiner and V1 which was designed to be different in their transitivity in the two types of sentences.

      Despite such unexpected differences observed in the BERT parse depths of the determiner, we considered the two sentence types as one group with distributed features (e.g., V1 transitivity) in the RSA, and used the BERT parse depth vector including all words in the sentence input to construct the model RDMs. Moreover, as indicated in Appendix 1-figure 3, compared to the content words, the determiner contributed minimally to the incremental BERT parse depth vector. Consequently, the noted discrepancies in BERT parse depth of the determiner between HiTrans and LoTrans sentences are unlikely to significantly bias our RSA results.

      "The degree of this mismatch is proportional to the evidence for or against the two interpretations (...). Besides these two measures based on the entire incremental input, we also focused on Verb1 since the potential structural ambiguity lies in whether Verb1 is interpreted as a passive verb or the main verb." The neural data fits in V1 epoch differ in their temporal profile for the mismatch metrics and the Verb 1 depth respectively. I understand the "degree of mismatch" to be a measure of how strongly the neural network's hidden representations align with the parse depth of an active or passive sentence structure. If this is correct, then it is not clear from the text how far this measure differs from the Verb 1 depth alone, which is also indicating either an active or passive structure.

      Within the V1 epoch, we tested three distinct types of model RDMs based on BERT metrics: (1) The BERT parse depth vector, representing the neural network’s hidden representation of the incremental sentence structure including all words up to V1. (2) The mismatch metric for either the Active or Passive interpretation, calculated as the distance between the BERT parse depth vector and the context-free parse depth vector for each interpretation. (3) The BERT parse depth of V1, crucial in representing the preferred structural interpretation of the unfolding sentence given its syntactic role as either a passive verb or the main verb.
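      The mismatch metric in (2) can be sketched as follows. The context-free depth vectors below are hypothetical values chosen only to illustrate the computation for the prefix “The dog found” (root at depth 0), and Euclidean distance is an assumed choice of distance; neither is taken from the manuscript.

```python
import numpy as np

# Hypothetical context-free dependency depths (root = depth 0) for "The dog found":
#   Active:  "found" is the root, "dog" its subject, "The" the determiner of "dog".
#   Passive: "found" is a reduced relative modifying "dog", which depends on a
#            main verb that has not yet been heard (assumed to sit at depth 0).
CONTEXT_FREE_DEPTHS = {
    "active": np.array([2.0, 1.0, 0.0]),
    "passive": np.array([2.0, 1.0, 2.0]),
}

def interpretive_mismatch(bert_depths, interpretation):
    """Euclidean distance (assumed) between a BERT parse depth vector and the
    context-free depth vector of one candidate interpretation."""
    reference = CONTEXT_FREE_DEPTHS[interpretation]
    return float(np.linalg.norm(np.asarray(bert_depths, dtype=float) - reference))

# Hypothetical BERT depths for one sentence up to V1.
bert = [1.9, 1.1, 0.4]
mismatch = {k: interpretive_mismatch(bert, k) for k in CONTEXT_FREE_DEPTHS}
print(mismatch)
```

      In this toy case the mismatch with the Active vector is smaller than with the Passive vector, i.e., the BERT depths lean towards the Active reading.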

      While the BERT parse depth vector per se does not directly indicate a preferred interpretation, its mismatch with the context-free parse depth vectors of the two possible interpretations reveals the favoured interpretation, as significant neural fit is only anticipated for the mismatch with the interpretation being considered. The contextualized BERT depth of V1 is also indicative of the preferred structure, given the context-free V1 parse depth corresponding to its different syntactic roles; however, compared to the interpretative mismatch, it does not fully capture contributions from other words in the input. Consequently, we expected the interpretative mismatch and the BERT V1 depth to yield different results. Indeed, our analysis revealed that, although both metrics extracted from the same BERT layer (i.e., layer 13) demonstrated early RSA fits in the left fronto-temporal regions, the V1 depth showed relatively more prolonged effects with a notable peak occurring precisely at the uniqueness point of V1 (compare Figure 6C and Appendix 1-figure 10). These complementary results underscore the capability of BERT metrics to align with neural responses, in terms of both an incrementally unfolding sentence and a specific word within it.

      In previous studies, differences in neural activity related to distinct amounts of open nodes in the parse tree have been interpreted in terms of distinct working memory demands (Nelson et al. pnas 2017, Udden et al tics 2020). It seems that some of the metrics, for example the neural network-derived parse depth or the V1 depth may be similarly interpreted in the light of working memory demands. After all, during V1 epoch, the sentences do not only differ with respect to predicted sentence structure, but also in the amount of open nodes that need to be maintained. In the discussion, however, the authors interpret these results as "neural representations of an unfolding sentence's structure".

      We agree with the reviewer that the Active and Passive interpretations differ in terms of the number of open nodes before the actual main verb is heard. Given the syntactic ambiguity in our sentence stimuli (i.e., LoTrans and HiTrans sentences), it is infeasible to determine the exact number of open nodes in each sentence as it unfolds. Nevertheless, the RSA fits observed in the dorsal lateral frontal regions could be indicative of the varying working memory demands involved in building the structured interpretations across sentences. We have added this perspective in the revised manuscript.

      Reviewer #2 (Public Review):

      This article is focused on investigating incremental speech processing, as it pertains to building higher-order syntactic structure. This is an important question because speech processing in general is lesser studied as compared to reading, and syntactic processes are lesser studied than lower-level sensory processes. The authors claim to shed light on the neural processes that build structured linguistic interpretations. The authors apply modern analysis techniques, and use state-of-the-art large language models in order to facilitate this investigation. They apply this to a cleverly designed experimental paradigm of EMEG data, and compare neural responses of human participants to the activation profiles in different layers of the BERT language model.

      We thank Reviewer #2 (hereinafter referred to as R2) for the overall positive remarks on our study.

      Strengths:

      (1) The study aims to investigate an under-explored aspect of language processing, namely syntactic operations during speech processing

      (2) The study is taking advantage of technological advancements in large language models, while also taking linguistic theory into account in building the hypothesis space

      (3) The data combine EEG and MEG, which provides a valuable spatio-temporally resolved dataset

      (4) The use of behavioural validation of high/low transitive was an elegant demonstration of the validity of their stimuli

      We thank R2 for recognizing and appreciating the motivation and the methodology employed in this study.

      Weaknesses:

      (1) The manuscript is quite hard to understand, even for someone well-versed in both linguistic theory and LLMs. The questions, design, analysis approach, and conclusions are all quite dense and not easy to follow.

      To address this issue, we have made dedicated efforts to clarify the key points in our study. We also added figures to visualize our experimental design and methods (see Figure 1, Figure 3C and Figure 5 in the revised main text). We hope that these revisions have made the manuscript more comprehensible and straightforward for the readers.

      (2) The analyses end up seeming overly complicated when the underlying difference between sentence types is a simple categorical distinction between high and low transitivity. I am not sure why tree depth and BERT are being used to evaluate the degree to which a sentence is being processed as active or passive. If this is necessary, it would be helpful for the authors to motivate this more clearly.

      Indeed, as pointed out by R2, the only difference between LoTrans and HiTrans sentences is the first verb (V1), whose transitivity is crucial for establishing an initial preference for either an Active or a Passive interpretation as the sentence unfolds. Nonetheless, in line with the constraint-based approach to sentence processing and supported by previous research findings, a coherent structured interpretation of a sentence is determined by the combined constraints imposed by all words within that sentence. In our study, the transitivity of V1 alone is insufficient to fully explain the interpretative preference for the sentence structure. The overall sentence-level interpretation also depends on the thematic role preference of the subject noun – its likelihood of being an agent performing an action or a patient receiving the action.

      This was evident in our findings, as shown in Author response image 1 above, where the V1 transitivity based on corpus or behavioural data did not fit the neural data during the V1 epoch. In contrast, BERT structural measures [e.g., BERT parse depth vector (up to V1) and BERT V1 parse depth] offered contextualized representations that are presumed to integrate various lexical constraints present in each sentence. These BERT metrics exhibited significant model fits for the same neural data in the V1 epoch. In addition, a notable feature of BERT is its bi-directional attention mechanism, which allows an earlier word’s representation to be dynamically updated as more of the sentence is heard, something that is also challenging to achieve with corpus or behavioural metrics. For instance, the parse depth of the word “found” in the BERT parse depth vector for “The dog found…” differs from its parse depth in the vector for “The dog found in…”. This feature of BERT is particularly advantageous for investigating the dynamic nature of structured interpretation during speech comprehension, as it simulates the continual updating of the interpretation that occurs as a sentence unfolds (as shown by Figure 7 in the main text). We have elaborated on the rationale for employing BERT parse depth in this regard in the revised manuscript.

      (3) The main data result figures comparing BERT and the EMEG brain data are hard to evaluate because only t-values are provided, and those, only for significant clusters. It would be helpful to see the full 600 ms time course of rho values, with error bars across subjects, to really be able to evaluate it visually. This is a summary statistic that is very far away from the input data.

      We appreciate this suggestion from R2. In Appendix 1 of the revised manuscript, we have provided individual participants’ Spearman’s rho time courses for every model RDM tested in all three epochs (see Appendix 1-figures 8-10 & 14-15). Because RSA was conducted on source-localized E/MEG data, it is infeasible to plot the rho time course for the searchlight centred on each of the 8196 vertices of the cortical surface mesh. Instead, we plotted the rho time course of each ROI reported in the original manuscript. These plots complement the time-resolved heatmaps of peak t-values in Figures 6-8 in the main text.
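      For readers who wish to summarize such per-participant time courses themselves, a minimal sketch of the across-participant mean and standard error of the mean (SEM) for one ROI is given below. The array layout (participants x time points) and the function name are assumptions for illustration.

```python
import numpy as np

def group_time_course(rho):
    """rho: array of shape (n_participants, n_timepoints) holding Spearman's rho
    for one ROI and one model RDM. Returns the across-participant mean and SEM
    at each time point."""
    rho = np.asarray(rho, dtype=float)
    mean = rho.mean(axis=0)
    sem = rho.std(axis=0, ddof=1) / np.sqrt(rho.shape[0])
    return mean, sem
```

      The resulting mean ± SEM band could then be drawn over the epoch, for example with matplotlib's `fill_between`.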

      (4) Some details are omitted or not explained clearly. For example, how was BERT masked to give word-by-word predictions? In its default form, I believe that BERT takes in a set of words before and after the keyword that it is predicting. But I assume that here the model is not allowed to see linguistic information in the future.

      In our analyses, we utilized the pre-trained version of BERT (Devlin et al. 2019) as released by Hugging Face (https://github.com/huggingface). It is noteworthy that BERT, as described in the original paper, was initially trained using the Cloze task, involving the prediction of masked words within an input. In our study, however, we neither retrained nor fine-tuned the pre-trained BERT model, nor did we employ it for word-by-word prediction tasks. We used BERT to derive the incremental representation of a sentence’s structure as it unfolded word-by-word.

      Specifically, we sequentially input the text of each sentence into BERT, akin to how a listener would receive the spoken words in a sentence (see Figure 3C in the main text). For each incremental input (such as “The dog found”), we extracted the hidden representations of each word from BERT. These representations were then transformed into their respective BERT parse depths using a structural probing model (which was trained using sentences with annotated dependency parse trees from the Penn Treebank Dataset). The resulting BERT parse depths were subsequently used to create model RDMs, which were then tested against neural data via RSA.

      Crucially, in our approach, BERT was not exposed to any future linguistic information in the sentence. We never tested BERT parse depth of a word in an epoch where this word had not been heard by the listener. For example, the three-dimensional BERT parse depth vector for “The dog found” was tested in the V1 epoch corresponding to “found”, while the four-dimensional BERT parse depth vector for “The dog found in” was tested in the PP1 epoch of “in”.
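      The incremental procedure can be sketched as below. The depth computation follows the general form of a structural depth probe (predicted depth = squared L2 norm of a learned linear projection of each word's hidden state); the probe matrix, the toy encoder and all names here are hypothetical stand-ins. In practice, `encode` might run a pre-trained BERT (e.g., via Hugging Face `transformers` with `output_hidden_states=True`), select one layer and pool subword tokens into words; those steps are assumptions and are not shown.

```python
import numpy as np

def probe_depths(hidden_states, probe_matrix):
    """Depth probe: the predicted depth of word i is the squared L2 norm of
    probe_matrix @ h_i."""
    transformed = np.asarray(hidden_states, dtype=float) @ probe_matrix.T
    return (transformed ** 2).sum(axis=1)

def incremental_depth_vectors(words, encode, probe_matrix):
    """Encode each prefix words[:k] afresh, so no word after position k is ever
    seen, and return one k-dimensional depth vector per prefix."""
    return [probe_depths(encode(words[:k]), probe_matrix)
            for k in range(1, len(words) + 1)]

# --- toy usage with hypothetical stand-ins ---
rng = np.random.default_rng(0)
HIDDEN, RANK = 8, 4
B = rng.normal(size=(RANK, HIDDEN))  # stand-in for a trained probe matrix

def toy_encode(prefix):
    """Hypothetical encoder: each word's vector depends on the whole prefix,
    mimicking how a bidirectional model re-represents earlier words."""
    return np.stack([np.full(HIDDEN, len(w) + 0.1 * len(prefix)) for w in prefix])

words = "The dog found in the park was".split()
vectors = incremental_depth_vectors(words, toy_encode, B)
v1_vector = vectors[2]   # "The dog found": would be tested in the V1 epoch
pp1_vector = vectors[3]  # "The dog found in": would be tested in the PP1 epoch
```

      Re-encoding each prefix from scratch is what guarantees that no future word leaks into a vector, and it also lets the depth of an earlier word such as “found” change once “in” is added.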

      How were the auditory stimuli recorded? Was it continuous speech or silences between each word? How was prosody controlled? Was it a natural speaker or a speech synthesiser?

      Consistent with our previous studies (Kocagoncu et al. 2017; Klimovich-Gray et al. 2019; Lyu et al. 2019; Choi et al. 2021), all auditory stimuli in this study were recorded by a female native British English speaker, ensuring a neutral intonation throughout. We have incorporated this detail into the revised version of our manuscript for clarity.

      It is difficult for me to fully assess the extent to which the authors achieved their aims, because I am missing important information about the setup of the experiment and the distribution of test statistics across subjects.

      We are sorry for the previously omitted details regarding the experimental setup and the results of individual participants. As detailed in our responses above, we have now included the necessary information in the revised manuscript.

      Reviewer #3 (Public Review):

      Syntactic parsing is a highly dynamic process: When an incoming word is inconsistent with the presumed syntactic structure, the brain has to reanalyze the sentence and construct an alternative syntactic structure. Since syntactic parsing is a hidden process, it is challenging to describe the syntactic structure a listener internally constructs at each time moment. Here, the authors overcome this problem by (1) asking listeners to complete a sentence at some break point to probe the syntactic structure mentally constructed at the break point, and (2) using a DNN model to extract the most likely structure a listener may extract at a time moment. After obtaining incremental syntactic features using the DNN model, the authors analyze how these syntactic features are represented in the brain using MEG.

      We extend our thanks to Reviewer #3 (referred to as R3 below) for recognizing the methods we used in this study.

      Although the analyses are detailed, the current conclusion needs to be further specified. For example, in the abstract, it is concluded that "Our results reveal a detailed picture of the neurobiological processes involved in building structured interpretations through the integration across multifaceted constraints". The readers may remain puzzled after reading this conclusion.

      Following R3’s suggestion, we have revised the abstract and refined our conclusions in the main text to explicitly highlight our principal findings. These include: (1) a shift from bihemispheric lateral frontal-temporal regions to left-lateralized regions in representing the current structured interpretation as a sentence unfolds, (2) a pattern of sequential activations in the left lateral temporal regions, updating the structured interpretation as syntactic ambiguity is resolved, and (3) the influence of lexical interpretative coherence activated in the right hemisphere over the resolved sentence structure represented in the left hemisphere.

      Similarly, for the second part of the conclusion, i.e., "including an extensive set of bilateral brain regions beyond the classical fronto-temporal language system, which sheds light on the distributed nature of language processing in the brain." The more extensive cortical activation may be attributed to the spatial resolution of MEG, and it is widely acknowledged that language processing is distributed across the brain.

      We fully agree with R3 on the relatively low spatial resolution of MEG. Our emphasis was on the observed peak activations in specific regions outside the classical brain areas related to language processing, such as the precuneus in the default mode network, which are unlikely to be artifacts due to the spatial resolution of MEG. We have revised the relevant contents in the Abstract.

      The authors should also discuss:

      (1) individual differences (whether the BERT representation is a good enough approximation of the mental representation of individual listeners).

      To address the issue of individual differences, which was also raised by R2, we added individual participants’ model fits in the ROIs with significant effects of BERT representations to Appendix 1 of the revised manuscript (see Appendix 1-figures 8-10 & 14-15).

      (2) parallel parsing (I think the framework here should allow the brain to maintain parallel representations of different syntactic structures but the analysis does not consider parallel representations).

      In the original manuscript, we did not discuss parallel parsing because the methods we used do not support a direct test of this hypothesis. In our analyses, we assessed the preference for one of two plausible syntactic structures (i.e., Active and Passive interpretations) based on the BERT parse depth vector of an incremental sentence input. This assessment was accomplished by calculating the mismatch between the BERT parse depth vector and the context-free dependency parse depth vector representing each of the two structures. However, we only observed one preferred interpretation in each epoch (see Figures 6D-6F) and did not find evidence supporting the maintenance of parallel representations of different syntactic structures in the brain. Nevertheless, in the revised manuscript, we have mentioned this possibility, which could be properly explored in future studies.
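      The mismatch computation described above can be sketched as follows. This is a minimal Python illustration, not the analysis code used in the study: it assumes Euclidean distance (the distance metric is not specified here), uses hypothetical BERT depths, and assigns the determiner and subject noun depths (2 and 1) by standard dependency conventions; only the V1 depths (0 for Active, 2 for Passive, as in Figure 3A) come from the manuscript.

```python
import math


def interpretative_mismatch(bert_depths, reference_depths):
    """Euclidean distance between a BERT parse depth vector and a context-free one.

    Euclidean distance is an assumption made for illustration only.
    """
    if len(bert_depths) != len(reference_depths):
        raise ValueError("both vectors must cover the same words")
    return math.sqrt(sum((b - r) ** 2 for b, r in zip(bert_depths, reference_depths)))


# Context-free dependency parse depths for the words "The dog found".
ACTIVE = [2, 1, 0]   # Active reading: "found" is the root (depth 0).
PASSIVE = [2, 1, 2]  # Passive reading: "found" modifies "dog" (depth 2).

bert = [1.8, 1.1, 0.6]  # Hypothetical BERT parse depths for the same words.

m_active = interpretative_mismatch(bert, ACTIVE)
m_passive = interpretative_mismatch(bert, PASSIVE)
# A smaller mismatch signals a stronger preference for that interpretation.
preferred = "Active" if m_active < m_passive else "Passive"
print(f"Active {m_active:.3f}, Passive {m_passive:.3f} -> {preferred}")
# prints: Active 0.640, Passive 1.418 -> Active
```

      Because both reference vectors share the determiner and noun depths, the comparison is driven here by the V1 position, which mirrors the point that V1 depth distinguishes the two readings.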

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Consider fitting the behavioral data from the continuation pre-test to the brain data in order to illustrate the claimed advantage of using a computational model beyond more traditional methods.

      Following R1’s suggestion, we conducted additional RSA using more behavioural and corpus-based metrics. We then directly compared the fits of these traditional metrics to brain data with those of BERT metrics in the same epoch to provide empirical evidence for the advantage of using a computational model like BERT to explain listeners’ neural data (see Appendix 1-figures 11-13).
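      For readers less familiar with RSA, the core comparison between a model RDM and a neural RDM can be sketched as a rank correlation between their off-diagonal upper triangles. This minimal Python sketch assumes Spearman's rho (consistent with the rho values mentioned by R2) and uses small hypothetical RDMs; it is not the study's pipeline, which is time-resolved and includes statistical testing across participants that is omitted here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def upper_triangle(rdm):
    """Off-diagonal upper-triangle entries of a square, symmetric RDM."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]


def rsa_fit(model_rdm, neural_rdm):
    """Spearman correlation (Pearson on ranks) between the upper triangles of two RDMs."""
    return pearson(average_ranks(upper_triangle(model_rdm)),
                   average_ranks(upper_triangle(neural_rdm)))


def abs_diff_rdm(values):
    """Model RDM for a scalar metric: pairwise absolute differences (an assumption)."""
    return [[abs(a - b) for b in values] for a in values]


# Hypothetical 4-item example: two candidate model RDMs compared with one neural RDM.
positions = [0.0, 1.0, 2.0, 3.0]
neural = [[(a - b) ** 2 for b in positions] for a in positions]
bert_rdm = abs_diff_rdm([0.0, 1.0, 2.0, 3.0])   # hypothetical BERT-derived metric
behav_rdm = abs_diff_rdm([0.0, 3.0, 1.0, 2.0])  # hypothetical behavioural metric

fit_bert = rsa_fit(bert_rdm, neural)
fit_behav = rsa_fit(behav_rdm, neural)
print(round(fit_bert, 3), round(fit_behav, 3))
```

      Using ranks makes the fit insensitive to monotonic rescaling of either RDM, which is why the squared-distance neural RDM still correlates perfectly with the absolute-difference model in this toy case.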

      Clarify the use of "neural representations": For a clearer assessment of the results, please discuss your results (especially the fits with BERT parse depth) in terms of the potential effects of distinct sentence structure expectations on working memory demands and make clear where these can be disentangled from neural representations of an unfolding sentence's structure.

      In the revised manuscript, we have noted the working memory demands associated with the online construction of a structured interpretation during incremental speech comprehension. As mentioned in our response to the relevant comment by R1 above, our experimental paradigm is not suitable for quantitatively assessing working memory demands, since it is difficult to determine the exact number of open nodes for our stimuli with syntactic ambiguity before the disambiguating point (i.e., the main verb) is reached. Therefore, while we can speculate about the potential contribution of varying working memory demands (which might correlate with BERT V1 parse depth) to RSA model fits, we do not think it is possible to disentangle their effects from the neural representation of an unfolding sentence’s structure modelled by BERT parse depths in our current study.

      Please add in methods a description of how the uniqueness point was determined.

      In this study, we defined the uniqueness point of a word as the earliest point in time at which the word can be fully recognized, i.e., once all of its phonological competitors have been ruled out. To determine the uniqueness point for each word of interest, we first identified the phoneme at which this word can be uniquely recognized according to CELEX (Baayen et al. 1993). Then, we manually labelled the offset of this phoneme in the audio file of the spoken sentence in which this word occurred. We have added a description of how the uniqueness point was determined to the Methods section of the revised manuscript.
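      The first step (finding the phoneme at which a word loses all competitors) can be sketched in Python as below. This is an illustration under assumptions: a toy lexicon with hypothetical ASCII phoneme symbols stands in for CELEX, competitors are any other entries sharing the word's initial phonemes, and the second step (manually labelling the phoneme offset in the audio) is not modelled.

```python
def uniqueness_point(word, lexicon):
    """Return the 1-based position of the phoneme at which `word` loses all competitors.

    `lexicon` maps each word to a tuple of phoneme symbols. A competitor survives
    position i only if its first i phonemes match those of `word`. Returns None when
    `word` never becomes unique (for example, when it is a prefix of another entry).
    """
    target = lexicon[word]
    competitors = [phones for other, phones in lexicon.items() if other != word]
    for i in range(1, len(target) + 1):
        prefix = target[:i]
        if not any(phones[:i] == prefix for phones in competitors):
            return i
    return None


# Toy lexicon with ASCII phoneme symbols (hypothetical, not CELEX transcriptions).
LEXICON = {
    "found": ("f", "aU", "n", "d"),
    "fountain": ("f", "aU", "n", "t", "I", "n"),
    "foul": ("f", "aU", "l"),
    "dog": ("d", "Q", "g"),
    "doll": ("d", "Q", "l"),
}

for w in ("found", "foul", "dog"):
    up = uniqueness_point(w, LEXICON)
    print(w, "becomes unique at phoneme", up, repr(LEXICON[w][up - 1]))
```

      In this toy lexicon, "found" only becomes unique at its final /d/, because "fountain" shares its first three phonemes.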

      I found the name "interpretative mismatch" very opaque. Maybe instead consider "preference".

      We chose to use the term “interpretative mismatch” rather than “preference” based on the operational definition of this metric, which is the distance between a BERT parse depth vector and one of the two context-free parse depth vectors representing the two possible syntactic structures, so that a smaller distance value (or mismatch) signifies a stronger preference for the corresponding interpretation.

      In the abstract, the authors describe the cognitive process under investigation as one of incremental combination subject to "multi-dimensional probabilistic constraint, including both linguistic and non-linguistic knowledge". The non-linguistic knowledge is later also referred to as "broad world knowledge". These terms lack specificity and across studies have been operationalized in distinct ways. In the current study, this "world knowledge" is operationalized as the likelihood of a subject noun being an agent or patient and the probability for a verb to be transitive, so here a more specific term may have been the "knowledge about statistical regularities in language".

      In this study, we specifically define “non-linguistic world knowledge” as the likelihood of a subject noun assuming the role of an agent or patient, which relates to its thematic role preference. This type of knowledge is primarily non-linguistic in nature, as exemplified by comparing nouns like “king” and “desk”. Although it could be reflected by statistical regularities in language, thematic role preference hinges more on world knowledge, plausibility, or real-world statistics. In contrast, “linguistic knowledge” in our study refers to verb transitivity, which focuses on the grammatically correct usage of a verb and is tied to statistical regularities within language itself. In the revised manuscript, we have provided clearer operational definitions for these two concepts and have ensured consistent usage throughout the text.

      Please spell out what exactly the "constraint-based hypothesis" is (even better, include an explicit description of the alternative hypothesis?).

      The “constraint-based hypothesis”, as summarized in a review (McRae and Matsuki 2013), posits that various sources of information, referred to as “constraints”, are simultaneously considered by listeners during incremental speech comprehension. These constraints encompass syntax, semantics, knowledge of common events, contextual pragmatic biases, and other forms of information gathered from both intra-sentential and extra-sentential context. Notably, there is no delay in the utilization of these multifaceted constraints, nor is a fixed priority assigned to one type of constraint over another. Instead, a diverse set of constraints is brought into play for comprehension as soon as each becomes available, i.e., as soon as the relevant spoken word is recognized.

      An alternative hypothesis, proposed earlier, is the two-stage garden path model (Frazier and Rayner 1982; Frazier 1987). According to this model, there is an initial parsing stage that relies solely on syntax. This is followed by a second stage where all available information, including semantics and other knowledge, is used to assess the plausibility of the results obtained in the first-stage analysis and to conduct re-analysis if necessary (McRae and Matsuki 2013). In the Introduction of our revised manuscript, we have elaborated on the “constraint-based hypothesis” and mentioned this two-stage garden path model as its alternative.

      Fig1 B&C: In order to make the data more interpretable, could you estimate how many possible grammatical structural configurations there are / how many different grammatical structures were offered in the pretest, and based on this what would be the "chance probability" of choosing a random structure or for example show how many responded with a punctuation vs alternative continuations?

      In our analysis of the behavioural results, we categorized the continuations provided by participants in the pre-test at the offset of Verb1 (e.g., “The dog found/walked …”) into 6 categories, including DO (direct object), INTRANS (intransitive), PP (prepositional phrase), INF (infinitival complement), SC (sentential complement) and OTHER (gerund, phrasal verb, etc.).

      Author response table 1.

      Similarly, we categorized the continuations that followed the offset of the prepositional phrase (e.g., “The dog found/walked in the park …”) into 7 categories, including MV (main verb), END (i.e., full stop), PP (prepositional phrase), INF (infinitival complement), CONJ (conjunction), ADV (adverb) and OTHER (gerund, sentential complement, etc.).

      Author response table 2.

      It is important to note that the results of these two pre-tests, including the types of continuations and their probabilities, exhibited considerable variability between and within each sentence type (see also Figures 2B and 2C).

      Typo: "In addition, we found that BERT structural interpretations were also a correlation with the main verb probability" >> correlated instead of correlation.

      We apologize for this typo. We have conducted a thorough proofreading to identify and correct any other typos present in the revised manuscript.

      "In this regard, DLMs excel in a flexible combination of different types of features embedded in their rich internal representations". What are the "different types", spell out at least some examples for illustration.

      We have rephrased this sentence to give a more detailed description.

      Fig 2 caption: "Same color scheme as in (A)" >> should be 'as in (B)'?, and later A instead of B.

      We are sorry for this typo. We have corrected it in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      My biggest recommendation is to make the paper clearer in two ways: (i) writing style, by hand-holding the reader through each section, and the motivation for each step, in both simple and technical language; (ii) schematic visuals, of the experimental design and the analysis. A schematic of the main experimental manipulation would be helpful, rather than just listing two example sentences. It would also be helpful to provide a schematic of the experimental setup and the analysis approach, so that people can refer to a visual aid in addition to the written explanation. For example, it is not immediately clear what is being correlated with what - I needed to go to the methods to understand that you are doing RSA across all of the trials. Make sure that all of the relevant details are explained, and that you motivate each decision.

      We thank R2 for these suggestions. In the revised manuscript, we have enhanced the clarity of the main text by providing a more detailed explanation of the motivation behind each analysis and the interpretation of the corresponding results. Additionally, in response to R2’s recommendation, we have added a few figures, including the illustration of the experimental design (Figure 1) and methods (see Figure 3C and Figure 5).

      Different visualisation of neural results - The main data result figures comparing BERT and the EMEG brain data are hard to evaluate because only t-values are provided, and those only for significant clusters. It would be helpful to see the full 600 ms time course of rho values, with error bars across subjects, to really be able to evaluate it visually.

      In the original manuscript, we opted to present t-value time courses for the sake of simplicity in illustrating the fits of the 12 model RDMs tested in 3 epochs. Following R2’s suggestion, we have included the ROI model fit time courses of each model RDM for all individual participants, as well as the mean model fit time course with standard error, in Appendix 1-figures 8-10 & 14-15.
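      The group-level time course with error bars can be sketched as a per-time-point mean and standard error of the mean (SEM = sample standard deviation / sqrt(n)) across participants. This minimal Python sketch uses hypothetical rho values; whether the appendix figures use exactly this SEM definition is an assumption.

```python
import math
import statistics


def mean_and_sem(fits_by_participant):
    """Per-time-point mean and standard error of the mean across participants.

    `fits_by_participant` is a list of equal-length model fit time courses,
    one per participant (e.g. rho values at successive time points).
    """
    n = len(fits_by_participant)
    if n < 2:
        raise ValueError("need at least two participants for a standard error")
    means, sems = [], []
    for values in zip(*fits_by_participant):  # iterate over time points
        means.append(statistics.fmean(values))
        sems.append(statistics.stdev(values) / math.sqrt(n))  # stdev uses n - 1
    return means, sems


# Hypothetical rho time courses (two time points) for three participants.
fits = [[0.01, 0.03], [0.02, 0.05], [0.03, 0.04]]
means, sems = mean_and_sem(fits)
print([round(m, 3) for m in means], [round(s, 4) for s in sems])
```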

      How are the authors dealing with prosody differences that disambiguate syntactic structures, that BERT does not have access to?

      All spoken sentence stimuli were recorded by a female native British English speaker, ensuring a neutral intonation throughout. Therefore, prosody is unlikely to vary systematically between different sentence types or be utilized to disambiguate syntactic structures. Sample speech stimuli have been made available in the following repository: https://osf.io/7u8jp/.

      A few writing errors: "was kept updated every time"

      We are sorry for the typos. We have carefully proofread the revised manuscript to identify and correct typos throughout.

      Explain why the syntactic trees have "in park the" rather than "in the park"?

      The dependency parse trees (e.g., Figure 3A) were generated according to the conventions of dependency parsing (de Marneffe et al. 2006).

      Why are there mentions of the multiple demand network in the results? I'm not sure where this comes from.

      The multiple demand network was mentioned because of the significant RSA fits observed in the dorsolateral prefrontal regions and the superior parietal regions, which are parts of the multiple demand network. This observation was particularly notable for the BERT parse depth vector in the main verb epoch, when the potential syntactic ambiguity was being resolved. It is plausible that these observed effects are partly attributable to the varying working memory demands required to maintain the “opening nodes” in the different syntactic structures being considered by listeners at this point in the sentence.

      Reviewer #3 (Recommendations For The Authors):

      The study first asked human listeners to complete partial sentences, and incremental parsing of the partial sentences can be captured based on the completed sentences. This analysis is helpful and I wonder if the behavioral data here are enough to model the E/MEG responses. For example, if I understood it correctly, the parse depth up to V1 can be extracted based on the completed sentences and used for the E/MEG analysis.

      The behavioural data alone do not suffice to model the E/MEG data. As we elucidated in our responses to R1, we employed three behavioural metrics derived from the continuation pretests. These metrics include the V1 transitivity and the PP probability, given the continuations after V1 (e.g., after “The dog found…”), as well as the MV probability, given the continuations after the prepositional phrase (e.g., after “The dog found in the park…”). These metrics aimed to capture participants’ predictions based on their structured interpretations at various positions in the sentence. However, none of these behavioural metrics yielded significant model fits to the listeners’ neural activity, which sharply contrasts with the substantial model fits of the BERT metrics in the same epochs. We also tried to model V1 parse depth as a weighted average based on participants’ continuations. As shown in Figure 3A, V1 parse depth is 0 in the active interpretation and 2 in the passive interpretation, while the parse depths of the determiner and the subject noun do not differ. However, this continuation-based V1 parse depth [i.e., 0 × Probability(active interpretation) + 2 × Probability(passive interpretation)] did not show significant model fits.
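      The continuation-based V1 parse depth is an expected value over the two interpretations. A minimal Python sketch of computing it per sentence and turning it into a model RDM follows; the pre-test proportions are hypothetical, and building the RDM from pairwise absolute differences is an assumption for illustration, not necessarily how the study constructed its model RDMs.

```python
def continuation_v1_depth(p_active, p_passive):
    """Continuation-weighted V1 parse depth: 0 x P(active) + 2 x P(passive)."""
    for p in (p_active, p_passive):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 0 * p_active + 2 * p_passive


def model_rdm(values):
    """Model RDM for a scalar metric: pairwise absolute differences (an assumption)."""
    return [[abs(a - b) for b in values] for a in values]


# Hypothetical pre-test proportions (P(active), P(passive)) for three sentences.
pretest = [(0.9, 0.1), (0.6, 0.4), (0.3, 0.7)]
depths = [continuation_v1_depth(pa, pp) for pa, pp in pretest]
rdm = model_rdm(depths)
print([round(d, 2) for d in depths])
```

      The upper triangle of such a model RDM is what would then be compared with the neural RDMs in each epoch.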

      Related to this point, I wonder if the incremental parse extracted using BERT is consistent with the human results (i.e., parsing extracted based on the completed sentences) on a sentence-by-sentence basis.

      In fact, we did provide evidence showing the alignment between the incremental parse extracted using BERT and the human interpretation for the same partial sentence input (see Figure 4 in the main text and Appendix 1-figures 4-6).

      Furthermore, in Fig 1d, is it possible to calculate how much variance of the 3 probabilities is explained by the 4 factors, e.g., using a linear model? If these factors can already explain most of the variance of human parsing, is it possible to just use these 4 factors to explain neural activity?

      Following R3’s suggestion, we have conducted additional linear modelling analyses to compare the extent to which human behavioural data can be explained by corpus metrics and BERT metrics separately. Specifically, for each of the three probabilities obtained in the pretests (i.e., DO, PP, and MV), we constructed two linear models. One model utilized the four corpus-based metrics as regressors (i.e., SN agenthood, V1 transitivity, Passive index, and Active index), while the other model used BERT metrics as regressors (i.e., BERT parse depth of each word up to V1 from layer 13 for DO/PP probability and BERT parse depth of each word up to the end of PP from layer 14 for MV probability, consistent with the BERT layers reported in Figure 6).
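      The logic of this comparison (fit the same behavioural probability with two regressor sets and compare the variance explained) can be sketched with ordinary least squares. The Python below is a self-contained illustration with hypothetical values, not our analysis: the regressor names are placeholders, and the toy outcome is constructed to depend on transitivity alone, so the model that includes transitivity fits it perfectly.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def ols_r_squared(regressors, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares and return R^2.

    `regressors` holds one row of regressor values per sentence.
    """
    design = [[1.0] + list(row) for row in regressors]  # add an intercept column
    k = len(design[0])
    # Normal equations: (A^T A) b = A^T y.
    ata = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(ata, aty)
    fitted = [sum(b * a for b, a in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-sentence values (not the study's data).
transitivity = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
agenthood = [0.3, 0.6, 0.5, 0.9, 0.2, 0.7]
do_prob = [0.5 * t + 0.1 for t in transitivity]  # driven by transitivity alone

r2_with_transitivity = ols_r_squared([[t, a] for t, a in zip(transitivity, agenthood)], do_prob)
r2_without = ols_r_squared([[a] for a in agenthood], do_prob)
print(round(r2_with_transitivity, 3), round(r2_without, 3))
```

      Plain R^2 never decreases when regressors are added, so when two regressor sets differ in size, adjusted R^2 or cross-validated fit gives a fairer comparison.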

      As shown in the table below, corpus metrics fit the DO/PP probability more effectively than BERT metrics. The likelihood of a DO/PP continuation is chiefly influenced by a lexical syntactic property of V1 (i.e., transitivity) and appears to rely less on contextual factors. Because V1 transitivity, which primarily reflects transitive versus intransitive verb usage, is explicitly included as one of the corpus metrics, the corpus model is expected to align more closely with the DO/PP probability than the BERT model.

      Author response table 3.

      In fact, BERT V1 parse depth was not correlated with V1 transitivity when the sentence had only unfolded to V1 (see Appendix 1-figure 6). This lack of correlation may stem from the fact that the BERT probing model was designed to represent the structure of a (partially) unfolded sentence, rather than to generate a continuation or prediction. Moreover, V1 transitivity alone does not conclusively determine the Active or Passive interpretation by the end of V1. For instance, both transitive and intransitive continuations after V1 are compatible with an Active interpretation. Consequently, the initial preference for an Active interpretation (as depicted by the early effects before V1 was recognized in Figure 6D) might be predominantly driven by the animate subject noun (SN) at the beginning of the sentence, a word order cue in languages like English (Mahowald et al. 2023).

      In contrast, when assessing the probability of a MV following the PP (e.g., after “The dog found in the park ...”), BERT metrics significantly outperformed corpus metrics in fitting the MV probability. Although SN thematic role preference and V1 transitivity were designed to be the primary factors constraining the structured interpretation in this experiment, we could only obtain their context-independent estimates from corpora (i.e., estimates pooled over all contexts). Additionally, although the Active and Passive indices (products of these two factors) are correlated with the MV probability, they may oversimplify the task of capturing the specific context of a given sentence. Furthermore, the PP following V1 is also expected to influence the structured interpretation, for instance, depending on whether “in the park” is a more plausible scenario for people to find a dog or for a dog to find something. Thus, this finding suggests that the corpus-based metrics are not as effective as BERT in representing contextualized structured interpretations (for a longer sentence input), which might require the integration of constraints from every word in the input.

      In summary, corpus-based metrics excel in explaining human language behaviour when it primarily relies on specific lexical properties. However, they significantly lag behind BERT metrics when more complex contextual factors come into play at the same time. Regarding their performance in fitting neural data, among the four corpus-based metrics, we only observed significant model fits for the Passive index in the MV epoch when the intended structure for a Passive interpretation was finally resolved, while the other three metrics did not exhibit significant model fits in any epoch. Note that subject noun thematic role preference did fit neural data in the PP and MV epochs (Figure 8A and 8B). In contrast, the incremental BERT parse depth vector exhibited significant model fits in all three epochs we tested (i.e., V1, PP1, and MV).

      To summarize, I am not sure whether the structural information BERT extracts reflects the human parsing of the sentences, especially when the known influencing factors are removed.

      Based on the results presented above and in the manuscript, BERT metrics align closely with human structured interpretations in terms of both behavioural and neural data. Furthermore, they outperform corpus-based metrics when it comes to integrating multiple constraints within the context of a specific sentence as it unfolds.

      Minor issues:

      Six types of sentences were presented. Three types were not analyzed, but the results for the UNA sentences are not reported either.

      In this study, we only analysed two of the six types of sentences, i.e., HiTrans and LoTrans sentences. The remaining four types of sentences were included to ensure a diverse range of sentence structures and to avoid potential adaptation to the same syntactic structure.

      Fig 1b, If I understood it correctly, each count is a sentence. Providing examples of the sentences may help. Listing the sentences with the corresponding probabilities in the supplementary materials can also help.

      Yes, each count in Figure 2B (Figure 1B in the original manuscript) is a sentence. All sentence stimuli and results of pre-tests are available in the following repository https://osf.io/7u8jp/.

      "trajectories of individual HiTrans and LoTrans sentences are considerably distributed and intertwined (Fig. 2C, upper), suggesting that BERT structural interpretations are sensitive to the idiosyncratic contents in each sentence." It may also mean the trajectories are noisy.

      We agree with R3 that there might be unwanted noise underlying the distributed and intertwined BERT parse depth trajectories of individual sentences. Meanwhile, it is also important to note that the correlation between BERT parse depths and lexical constraints of different words at the same position across sentences is statistically supported.

      References

      Baayen RH, Piepenbrock R, van Rijn H. 1993. The CELEX lexical database [CD-ROM].

      Baroni M, Dinu G, Kruszewski G. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol. 1. 238-247 p.

      Caucheteux C, King JR. 2022. Brains and algorithms partially converge in natural language processing. Communications Biology. 5:134.

      Choi HS, Marslen-Wilson WD, Lyu B, Randall B, Tyler LK. 2021. Decoding the Real-Time Neurobiological Properties of Incremental Semantic Interpretation. Cerebral Cortex. 31:233-247.

      de Marneffe M-C, MacCartney B, Manning CD. 2006. Generating typed dependency parses from phrase structure parses. Proceedings of the 5th International Conference on Language Resources and Evaluation; 2006 May 22-28; Genoa, Italy. European Language Resources Association. 449-454 p.

      Devlin J, Chang M-W, Lee K, Toutanova K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2019 June 2-7; Minneapolis, MN, USA. Association for Computational Linguistics. 4171-4186 p.

      Frazier L. 1987. Syntactic processing: evidence from Dutch. Natural Language & Linguistic Theory. 5:519-559.

      Frazier L, Rayner K. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology. 14:178-210.

      Klimovich-Gray A, Tyler LK, Randall B, Kocagoncu E, Devereux B, Marslen-Wilson WD. 2019. Balancing Prediction and Sensory Input in Speech Comprehension: The Spatiotemporal Dynamics of Word Recognition in Context. Journal of Neuroscience. 39:519-527.

      Kocagoncu E, Clarke A, Devereux BJ, Tyler LK. 2017. Decoding the cortical dynamics of sound-meaning mapping. Journal of Neuroscience. 37:1312-1319.

      Lyu B, Choi HS, Marslen-Wilson WD, Clarke A, Randall B, Tyler LK. 2019. Neural dynamics of semantic composition. Proceedings of the National Academy of Sciences of the United States of America. 116:21318-21327.

      Mahowald K, Diachek E, Gibson E, Fedorenko E, Futrell R. 2023. Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages. Cognition. 241:105543.

      McRae K, Matsuki K. 2013. Constraint-based models of sentence processing. Sentence processing. 519:51-77.

      Schrimpf M, Blank IA, Tuckute G, Kauf C, Hosseini EA, Kanwisher N, Tenenbaum JB, Fedorenko E. 2021. The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences of the United States of America. 118:e2105646118.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The brain-machine interface used in this study differs from typical BMIs in that it's not intended to give subjects voluntary control over their environment. However, it is possible that rats may become aware of their ability to manipulate trial start times using their neural activity. Is there any evidence that the time required to initiate trials on high-coherence or low-coherence trials decreases with experience?

      This is a great question. First, we designed the experiment to avoid this possibility. Rats were experienced on the sequence of the automatic maze both pre- and post-implantation (totaling weeks of pre-training and habituation). As such, the majority of the trials ever experienced by each rat were not controlled by its neural activity. During BMI experimentation, only 10% of trials were triggered during high coherence states and 10% during low coherence states, leaving ~80% of trials not controlled by neural activity. We also implemented a pseudo-randomized trial sequence. Taken together, these design choices were specifically intended to prevent rats from actively using their neural activity to control the maze.

      Second, we had a similar question while collecting data for this manuscript, so we conducted a pilot experiment. After experiment #1 was completed, we took 3 of its rats and required them to perform “forced-runs” over the course of 3-4 days, a task in which rats navigate to a reward zone and are rewarded with a chocolate pellet. The trajectory on “forced-runs” is predetermined, and rats were always rewarded for navigating along the predetermined route. Every trial was initiated by strong mPFC-hippocampal theta coherence. We asked whether time-to-trial onset would decrease if trial onset was repeatedly paired with strong mPFC-hippocampal theta coherence. One of the 3 rats (rat 21-35) showed a significant correlation between time-to-trial onset and trial number, indicating that our threshold for strong mPFC-hippocampal theta coherence was met more quickly with experience (Figure R1A). Across sessions and rats, there was considerable variability in the magnitude and sometimes even the direction of this correlation (Figure R1B). As such, the degree to which rat 21-35 was aware of controlling the environment by reaching strong mPFC-hippocampal theta coherence is unclear, and this question requires future experimentation.

      Author response image 1.

      Strong mPFC-hippocampal theta coherence was used to control trial onset for the entirety of forced-navigation sessions. Time-to-trial onset is a measurement of how long it took for strong coherence to be met. A) Time-to-trial onset was averaged across sessions for each rat, then plotted as a function of trial number (within-session experience on the forced-runs task). Rat 21-35 showed a significant negative correlation between time-to-trial onset and trial number, indicating that time-to-coherence decreased with experience. The remaining rats did not display this effect. B) Correlation between time-to-trial onset and trial number (y-axis; see A) across sessions (x-axis). The majority of sessions showed a negative correlation between time-to-trial onset and trial number, as in (A), but the magnitude and sometimes the direction of this effect varied considerably, even within an animal.
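      The two correlations in this figure (panel A: session-averaged onset times against trial number; panel B: one correlation per session) can be sketched in Python as follows. The onset times are hypothetical, truncating sessions to the shortest one before averaging is an assumption, and the significance test used for panel A is not reproduced here.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def session_correlations(sessions):
    """Panel B style: correlate time-to-trial onset with trial number within each session.

    Each inner list holds time-to-trial onset (s) for consecutive trials.
    A negative r means coherence was reached faster later in the session.
    """
    return [pearson_r(list(range(1, len(onsets) + 1)), onsets) for onsets in sessions]


def averaged_correlation(sessions):
    """Panel A style: average onsets across sessions trial by trial, then correlate."""
    n_trials = min(len(s) for s in sessions)  # assumption: truncate to the shortest session
    mean_onsets = [statistics.fmean(s[t] for s in sessions) for t in range(n_trials)]
    return pearson_r(list(range(1, n_trials + 1)), mean_onsets)


# Hypothetical onset times (s) for one rat across three sessions.
rat_sessions = [
    [9.0, 7.5, 6.0, 5.5, 4.0],
    [8.0, 8.5, 6.5, 7.0, 6.0],
    [5.0, 6.0, 5.5, 7.0, 6.5],
]
rs = session_correlations(rat_sessions)
print([round(r, 2) for r in rs], round(averaged_correlation(rat_sessions), 2))
```

      Even in this toy example the per-session correlations differ in sign while the averaged one is negative, which is the kind of dissociation between panels A and B described in the caption.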

      Is there any evidence that rats display better performance on trials with random delays in which HPC-PFC coherence was naturally elevated?

      This question is now addressed in Extended Figure 5 and discussed in the section titled “strong prefrontal-hippocampal theta coherence leads to correct choices on a spatial working memory task”.

      The introduction frames this study as a test of the "communication through coherence" hypothesis. In its strongest form, this hypothesis states that oscillatory synchronization is a prerequisite for inter-areal communication, i.e., if two areas are not synchronized, they cannot transfer information. Recent experimental evidence shows this relationship is more likely inverted: coherence is a consequence of inter-areal interactions, rather than a cause. See Schneider et al. (DOI: 10.1016/j.neuron.2021.09.037) and Vinck et al. (10.1016/j.neuron.2023.03.015) for a more in-depth explanation of this distinction. The authors should expand their treatment of this hypothesis in light of these findings.

      The Introduction and Discussion now include sections dedicated to these studies.

      Figure 6 - It would be much more intuitive to use the labels "Rat 1", "Rat 2", and "Rat 3"; the "21-4X" identifiers are confusing.

      This was corrected in the paper.

      Figure 6C - The sub-plots within this figure are rather small and difficult to interpret. The figure would be easier to parse if the data were presented as a heatmap of the ratio of theta power during blue vs. red stim, with each pixel corresponding to one channel.

      This suggestion was implemented in the paper. See Fig 6C. Extended Fig. 8 now shows the power spectra as a function of recording shank and channel.

      Ext. Figure 2B - What happens during an acquisition failure? Instead of "Amount of LFP data," consider using "Buffer size".

      Corrected.

      Ext. Figure 2D-E - Instead of "Amount of data," consider using "Window size"

      Referred to as buffer size.

      Ext. Figure 2E - y-axis should extend down to 4 Hz. Are all of the last four values exactly at 8 Hz?

      Yes. Values plateau at 8 Hz. These data represent an average over ~50 samples.

      Ext. Figure 2F - consider moving this before D/E, since those panels are summaries of panel F

      Corrected.

      Ext. Figure 4A - ANOVA tells you that accuracy is impacted by delay duration, but not what that impact is. A post-hoc test is required to show that long delays lead to lower accuracy than short ones. Alternatively, one could compute the correlation between delay duration and proportion correct for each animal, and look for significant negative values.

      We included supplemental analyses in Extended Fig. 4.

      Reviewer #2 (Recommendations For The Authors):

      The authors should replace terms that suggest a causal relationship between PFC-HPC synchrony and behavior, such as 'leads to', 'biases', and 'enhances' with more neutral terms.

      Causal implications were toned down and wherever “leads” or “led” remains, we specifically mean in the context of coherence being detected prior to a choice being made.

      The rationale for the analysis described in the paragraph starting on line 324, and how it fits with the preceding results, was not clear to me. The authors also write at the start of this paragraph "Given that mPFC-hippocampal theta coherence fluctuated in a periodical manner (Extended Fig. 5B)", but this figure only shows example data from 2 trials.

      The reviewer is correct. We now point to three examples in the manuscript, but we focused this section on the autocorrelation analysis, which did not support our observation: correlation decayed roughly linearly over time. The apparent periodicity was therefore almost certainly a consequence of overlapping data in the epochs used to calculate coherence rather than intrinsic periodicity.

      Shortly after the start of the results section (line 112), the authors go into a very detailed description of how they validated their BMI without first describing what the BMI actually does. This made this and the subsequent paragraphs difficult to follow. I suggest the authors start with a general description of the BMI (and the general experiment) before going into the details.

      Corrected. See first paragraph of “Development of a closed-loop…”.

      In Figure 2C, as expected, around the onset of 'high' coherence trials, there is an increase in theta coherence but this appears to be very transient. However, it is unclear what the heatmap represents: is it a single trial, single session, an average across animals, or something else? In Figure 3F, however, the increase appears to be much more sustained.

      The unit of analysis (n) was rats for every panel in this figure. This is now clarified at the end of Fig. 3.

      In Figure 2D, it was not clear to me what units of measurement are used when the averages and error bars are calculated. What is the 'n' here? Animals or sessions? This should be made clear in this figure as well as in other figures.

      The unit of analysis (n) is rats. This is now clarified at the end of Fig. 2.

      Describing the study of Jones and Wilson (2005), the authors write: "While foundational, this study treated the dependent variable (choice accuracy) as independent to test the effect of choice outcome on task performance." (line 83) It was not clear to me what is meant by "dependent" and "independent" here. Explaining this more clearly might clarify how the authors' study goes beyond this and other previous studies.

      The reviewer is correct. We have removed the discussion of independent and dependent variables from the rationale for our experiment.

      Reviewer #3 (Recommendations For The Authors):

      As explained in the public review, my comments mainly concern the interpretation of the experimental paradigm and its link with previous findings. I think modifying these in order to target the specific advance allowed by the paradigm would really improve the match between the experimental and analytical data, which are very solid, and the authors' conclusions.

      Concerning the paradigm, I recommend that the authors focus more on their novel ability to clearly dissociate the functional role of theta coherence prior to the choice from that of coherence induced by the choice. Currently, they explain this by contrasting previous studies based on dependent variables with their approach, which uses an independent variable. I was a bit confused by this, particularly because the task variable is not really independent given that it's based on a brain-driven loop. Since theta coherence remains correlated with many other neurophysiological variables, the results cannot go beyond showing that leading up to the decision it correlates with good choice accuracy, without providing evidence that it is theta coherence itself that enhances this accuracy as they suggest in lines 93-94.

      The reviewer is correct. We have removed the discussion of independent and dependent variables from the rationale for our experiment.

      Regarding previous results with muscimol inactivation, I recommend that the authors expand their discussion on this point. I think that their correlative data is not sufficient to conclude as they do that despite "these structures being deemed unnecessary" (based on causal muscimol experiments), they "can still contribute rather significantly", since their findings do not show a contribution, merely a correlation. This extra discussion could include possible explanations of the apparent, and thought-provoking, discrepancies that they uncover, such as: theta coherence may be a correlate of good accuracy without an underlying causal relation; theta coherence may always correlate with good accuracy but only be causally important in some tasks related to spatial working memory; or, since muscimol experiments leave the brain time to adapt to the inactivation, redundancy between brain areas may mask their implication in the physiological context in certain tasks (see Goshen et al., 2011).

      The second paragraph of the discussion is now dedicated to this.

      Possible further analysis:

      • In Extended 4A the authors show that performance drops with delay duration. It would be very interesting to see this graph with the high coherence / low coherence / yoked trials to see if the theta coherence is most important for longer trials for example.

      This is a great suggestion. Because only 10% of trials were triggered by high-coherence states, our sample size precludes a robust analysis of this kind. Given that we found an enhancement effect on a task with minimal spatial working memory requirements (Fig. 4), coherence may be a general benefit or consequence of choice processes. Nonetheless, this remains an important question to address in a future study.

      • Figure 6: The authors explain in the text that although the effect of stimulation of VMT is variable, overall VMT activation increased PFC-HPC coherence. I think in the figure the results are only shown for one rat and session per panel. It would be interesting to add a figure including their whole data set to show the overall effect as well as the variability.

      The reviewer is correct, and this comment prompted a significant addition of detail to the manuscript. We have added an extended figure (Ext. Fig. 9) showing our VMT stimulation recording sessions. We originally did not include these because we were performing a parameter search to understand whether VMT stimulation could increase mPFC-hippocampal theta coherence. The Results section was expanded accordingly.

      Changes to writing/figures:

      • The paper by Eliav et al, 2018 is cited to illustrate the universality of coupling between hippocampal rhythms and spikes whereas the main finding of this paper is that spikes lock to non-rhythmic LFP in the bat hippocampus. It seems inappropriate to cite this paper in the sentence on line 65.

      We agree with the reviewer and this citation was removed.

      • Line 180 when explaining the protocol, it would help comprehension if the authors clearly stated that "trial initiation" means opening the door to allow the rat to make its choice. I was initially unfamiliar with the paradigm and didn't figure this out immediately.

      We added a description to the second paragraph of our first results section.

      • Lines 324 and following: the analysis shows that there is a slow decay of theta coherence over around 2 s, but not that it is periodical (as in regularly occurring in time); this would require the autocorrelation to show another bump at the timescale corresponding to the period of the signal. I recommend the authors use a different terminology.

      This comment is now addressed above in our response to Reviewer #2.

      • Lines 344: I am not sure why the stable theta coherence levels during the fixed delay phase show that the link with task performance is "through mechanisms specific to choice". Could the authors elaborate on this?

      We elaborated on this point further at the end of “Trials initiated by strong prefrontal-hippocampal theta coherence are characterized by prominent prefrontal theta rhythms and heightened pre-choice prefrontal-hippocampal synchrony”.

      • Line 85: "independent to test the effect of choice outcome on task performance." I think there is a typo here and "choice outcome" should be "theta coherence".

      The sentence was removed in the updated draft.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Mice can learn to associate sensory cues (sound and light) with a reward or activation of dopamine neurons in the ventral tegmental area (VTA), and then anticipate the reward from the sensory cue only. Using this paradigm, Harada et al. showed that after learning, the cue is able to induce dopamine release in the projection targets of the VTA, namely the nucleus accumbens and lateral hypothalamus (LH). Within the LH, dopamine release from VTA neurons (either by presentation of the cue or direct optical stimulation of VTA neurons) activates orexin neurons, measured as an increase in intracellular calcium levels.

      Strengths:

      This study utilized genetically encoded optical tools to selectively stimulate dopamine neurons and to monitor dopamine release in target brain areas and the calcium response of orexin neurons. This allowed a direct assessment of the relationship between the behavioral response of the animals, the release of a key neurotransmitter in select brain areas, and its effect on target cells, with a precision previously not possible. The results shed light on the mechanism underlying reward-related learning and expectation.

      Weaknesses:

      • The Ca increase in orexin neurons in response to optical stimulation of VTA DA neurons is convincing. However, there is an accumulated body of literature indicating that dopamine inhibits orexin neurons through D2 receptors, particularly at high concentrations, both directly and indirectly (PMID 15634779, 16611835, 26036709, 30462527; but note that synaptic effects at low concentrations are excitatory - PMID 30462527, 26036709). There should be a clear acknowledgment of these previous studies and a discussion directly addressing the discrepancy. Furthermore, there are in-vivo studies that investigated the role of dopamine in the LH involving orexin neurons in different behavioral contexts (e.g. PMID 24236888). The statement in the introduction that "whether and how dopamine release modulates orexin neuronal activity has not been investigated vigorously" (3rd para of Introduction) understates these previous reports.

      We thank the Reviewer for pointing out that we missed several important citations. We have added the references mentioned, and the discrepancy of concern is addressed in the Discussion section.

      • Along these lines, previous reports of concentration-dependent bidirectional dopaminergic modulation of orexin neurons suggest that high and low levels of DA would affect orexin neurons differently. Is there any way to estimate the local concentration of DA released by the laser stimulation protocol used in this study? Could there be a dose dependency between the intensity of laser stimulation and the orexin neuron response?

      We agree that this is an interesting point. However, one limitation of our study, and of intensity-based genetically encoded sensors in general, is that estimating absolute concentrations is technically difficult. The sensor effectively reports changes in extrasynaptic neurotransmitter levels, but obtaining absolute values would require other modalities such as fast-scan voltammetry. This limitation is now included in the Discussion section.

      • The transient dip in DA signal during omission sessions in Fig. 2C (approx. 1% decrease from baseline) is similar in amplitude to the decrease seen in non-laser trials shown in Fig. 1C right panel (although the time course of the latter is unknown as the data is truncated). The authors should clarify whether those dips are a direct effect of the cue itself or indeed reward prediction error.

      Thank you for raising this important point. There is indeed a dip in the signal during non-stimulation trials. On day 1, the cue triggered a dip; on day 10, it triggered a slight increase in the signal followed by a dip. These data are difficult to interpret, but we hypothesize that two components produce this dip. The first is the aversiveness of the cue: because a relatively loud sound (90 dB) was used as the cue, it would not be surprising if the auditory cue was slightly aversive to the experimental animals. Aversive stimuli have been shown to induce a dip in dopamine in the NAc, although this is specific to NAc subregions. The second component is reward prediction error. Although the non-laser-paired cue never triggered laser stimulation, it resembles the laser-paired cue: both consist of a loud tone and a visual cue of the same color (presented at a different location). It is therefore possible that reward-related neuronal circuits were slightly activated by the non-laser-paired cue. Consistent with this interpretation, a small increase in the signal was observed on day 10 but not on day 1. If our hypothesis is correct, the signal reflects two overlapping components, which unfortunately makes further analysis difficult.

      • There seem to be orexin-negative GCaMP6-positive cells (Fig. 4B), suggesting that not all cells were phenotypically orexin+ at the time of imaging. The proportion of GCaMP6 cells that were ORX+ or negative, and whether they responded differently to the stimuli, should be indicated.

      While we acknowledge the observation of orexin-negative-GCaMP6 positive cells in Figure 4B, it's important to note that this phenomenon is consistent with the characteristics of the hOX-GCaMP virus used in prior experiments. The virus has undergone thorough characterization, and it has been reported to exhibit over 90% specificity, as demonstrated in prior work conducted in the laboratory of one of our contributing authors (PMID: 27546579). To address the concern raised by the reviewer, we have included Supplemental Figure 4 confirming that all mice consistently exhibited qualitatively similar hOX-GCaMP transients upon dopaminergic terminal stimulation. This additional evidence supports the reliability and specificity of our experimental approach.

      • Laser stimulation of DA neurons at the level of cell bodies (in VTA) induces an increase in DA release within the LH (Fig. 3C, D), however, there is no corresponding Ca signal in orexin neurons (Fig.4C).

      We realized that the figures were unclear, which likely led the reviewer to conclude that there was no corresponding Ca signal; in fact, such a signal is present. We have now added Supplemental Figure 3 to show that the Ca signal is already present on day 1.

      In contrast, stimulating DA terminals within the LH induces a robust, long-lasting Ca signal (> 30s) in orexin neurons (Fig. 5). The initial peak is blocked by raclopride but the majority of Ca signal is insensitive to DA antagonists (please add a positive control or cite references indicating that the dose of antagonists used was sufficient; also the timing of antagonist administration should be indicated).

      This is now included in the Discussion section. In addition, the timing and dose of the antagonists are now described in the Methods section.

      Taken together, these results seem to suggest that DA does not directly increase Ca signal in orexin neurons. What could be mediating the remaining component?

      This point has been included in the discussion section.

      • Similarly, there is an elevation of Ca signal in orexin neurons that remains significantly higher after the cue/laser stimulation (Fig. 4F). It appears that it is this sustained component that is missing in omission trials. This can be analyzed further.

      It is true that there is a sustained component in stimulation trials that is missing in omission trials. This component is most likely evoked by the stimulation of dopamine neurons. We argue that it is isolated in Fig. 5, where we analyzed it as far as our data allow.

      • Mice of both sexes were used in this study; it would be interesting to know whether sex differences were observed or not.

      We agree that this is an important point. However, our sample size is not large enough to make a meaningful comparison between males and females.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-written study assessing the role of dopaminergic inputs from the VTA on orexin cell responses in an opto-pavlovian conditioning task. These data are consistent with a possible role of this system in reward expectation and are surprisingly one of the first demonstrations of a role for dopamine in this phenomenon.

      Strengths:

      The study has used an interesting opto-Pavlovian approach combined with fibre photometry.

      Weaknesses:

      It is unclear what n size was used or analysed, particularly for AUC measures e.g. Figures 1 D/E and 3 G. The number of trials reflected and the animal numbers need clarification.

      The sample size is indicated in the figure legends.

      The study focused on opto-stim omissions - this work would be significantly strengthened by a comparison to a real-world examination where animals are trained for a natural reward (food pellet).

      We agree that this would be an important experiment. This experiment has been partially performed in the laboratory of one of the contributing authors (doi.org/10.1101/2022.04.13.488195) and will be part of our follow-up studies.

      Have the authors considered the role of orexin in the opposing situation i.e. a surprise addition of reward?

      That would be an interesting experiment. To do that, a natural reward, not optical stimulation, should be used as the reinforcer. This could be part of our follow-up studies.

      Similarly, there remains some conjecture regarding the role of these systems in reward and aversion - have the authors considered aversive learning paradigms - fear, or fear extinction - to further explore the roles of this system? There are some (important) discussions about the possible role of orexin in negative reinforcement. Further studies to address this could be warranted.

      It is true that dopamine also plays a significant role in aversive learning. Therefore, this would be an interesting experiment. The discussion section now includes this point.

      I think some further discussion of the work by Linehan concerning the interesting bidirectional actions of D1/D2 receptor signalling on glutamatergic transmission onto orexin neurons is worthwhile. While this work is currently cited, its nuance and perhaps its relevance to D1 and D2 signalling could be contextualised a little more (https://doi.org/10.1152/ajpregu.00150.2018).

      Thanks for the suggestion. The discussion has been expanded.

      Reviewer #3 (Public Review):

      Summary:

      Harada and colleagues describe an interesting set of experiments characterizing the relationship between dopamine cell activity in the ventral tegmental area (VTA) and orexin neuron activity in the lateral hypothalamus (LH). All experiments are conducted in the context of an opto-Pavlovian learning task, in which a cue predicts optogenetic stimulation of VTA dopamine neurons. With training, cues that predict DA stimulation come to elicit dopamine release in LH (a similar effect is seen in accumbens). After training, omission trials (cue followed by no laser) result in a dip (inhibition) of dopamine release in LH, characteristic of reward prediction error observed in the striatum. Across cue training, the activity pattern of orexin neurons in LH mirrors that of LH DA levels. However, unlike the DA signal, orexin neurons do not exhibit a decrease in activity in omission trials. Systemic blockade of D2 but not D1 receptors blocked DA release in LH following VTA DA cell stimulation.

      Strengths: Although much work has been dedicated to examining projections from orexin cells to VTA, less has been done to characterize reciprocal projections and their function. In this way, this paper is a very important addition to the literature. The experiments are technically sound (with some limitations, below) and utilize sophisticated approaches, the manuscript is nicely written, and the conclusions are mostly reasonable based on the data collected.

      Weaknesses:

      I believe the impact of the paper could be enhanced by considering and/or addressing the following:

      Major:

      • I encourage the authors to discuss in the Introduction previous work on DA regulation of orexin neurons. In particular, the authors cite, but do not describe in any detail, the very relevant Linehan paper (2019; Am J Physiol Regul) which shows that DA differentially alters excitatory/inhibitory input onto orexin neurons and that these actions are reversed by D1 vs D2 receptor antagonists. Another paper (Bubser, 2005, EJN) showed that dopamine agonists increase the activity of orexin neurons and that these effects are blocked by D1/D2 antagonists. The current findings should be discussed in the context of these (and any other relevant) papers in the Discussion, too.

      Thanks for the valuable suggestion. This point has been integrated and the introduction and discussion sections have been revised carefully.

      • In the Discussion, the authors provide two (plausible) explanations for why they did not observe a dip in the calcium signal of orexin neurons during omission trials. Is it not possible that these cells do not encode for this type of RPE?

      We completely agree that this is possible. Our current hypothesis is that dopamine in the LH encodes RPE and that this information is transmitted to orexin neurons. Orexin neurons integrate other information and encode something else, which we call ‘multiplexed cognitive information’. What this means exactly is still an open question. This point is now mentioned in the Discussion section.

      • Related to the above - I am curious about the authors' thoughts on why there is such redundancy in the system. i.e. why is dopamine doing the same thing in NAC and LH in the context of cue-reward learning?

      Thank you for the question. This is an important point, indeed. Our current hypothesis is described in the discussion section.

      ‘Our data indicate that dopamine in both the NAc and LH encodes reward prediction error (RPE). One open question is why such a redundant mechanism exists. We hypothesize that dopamine in the LH boosts dopamine release via a positive feedback loop between the orexin and dopamine systems. It has already been established that some orexin neurons project to dopaminergic neurons in the VTA, positively modulating their firing. On the other hand, our data indicate that dopamine in the LH stimulates orexinergic neurons. Together, these findings suggest that when either the orexin or the dopamine system is activated, the other system is consequently activated as well. Although the current findings align with this idea, the hypothesis should be carefully challenged and scrutinized.’

      • The data, as they stand, are largely correlative and do not indicate that DA recruitment of orexin neurons is necessary for learning to occur. It would be compelling if blocking the orexin cell recruitment affected some behavioral outcomes of learning. Similarly - does raclopride treatment across training prevent learning?

      We appreciate the insightful comment. It is indeed a limitation of our study that we lack behavioral data. However, given the extensive previous research on the crucial role of orexin in motivated behavior, we argue that establishing dopaminergic regulation of the orexin system itself is a valuable contribution. This perspective is thoroughly discussed in the dedicated section of our paper. It's important to note that the injection of D2 antagonists, including raclopride, is known to induce significant sedation. Due to this sedative effect, combining behavioral experiments with these drugs poses considerable challenges.

      • Only single doses of SCH23390 and raclopride were used. How were these selected? It would be nice to use more of a dose range to show that 1) an effect of D1R blockade was not missed, and 2) the reduction in orexin signal with raclopride was dose-dependent.

      The rationale for the doses has been added to the Discussion section. These doses have been reported to block dopamine receptors. Although we agree that a dose-response curve would be valuable, we are reluctant to increase the doses, to avoid adverse effects on the experimental animals. The doses we used effectively induced hypolocomotion (data not shown).

      • Fig 1C, could the effect the authors observed be due to movement?

      We argue that this is unlikely. We recorded two channels, one for the control and one for the signal, and motion-related artifacts were corrected based on the control channel. An example trace around the laser stimulation is shown below. Note that a typical motion-related artifact is a fast dip in the signal, normally observed in both the 405 and 465 nm channels.

      Relatedly, what was the behavior like when the cue was on? Did mice orient/approach the cue?

      Although it has been reported that rats approach the cue in a similar task (PMID: 30038277), this was not obvious in our case, possibly because we used both visual and auditory cues. Mice showed a general increase in locomotion during the cue and the stimulation, but its direction was not clear to the experimenter.

      Also, when does the learning about the cue occur? Does it take all 10 days of learning or does this learning/cue-induced increase in dopamine signaling occur in less than 10 days?

      It is hard to say when the learning occurs. Looking at the learning curves in Figures 1, 3 and 4, the response to the cue appears to plateau by day 5, but since we do not have behavioral data, this assessment relies only on the neuronal signal.

      • Also related to the above, could the observed dopamine signal be a result of just the laser turning on? It would seem important to include mice with a control sensor.

      We recorded two channels, at 405 nm and 465 nm. The 405 nm signal did not increase, whereas the 465 nm signal did. An example trace is shown. In addition, the sensor has already been characterized by the corresponding author, so we argue that this is unlikely.

      Author response image 1.

      Fig 1E, the effect seems to be driven by one mouse which looks like it could be a statistical outlier. The inclusion of additional animals would make these data more compelling.

      We agree that adding more mice would make the data more compelling. However, given that dopamine in the accumbens has been investigated extensively and our data are in line with prior studies, we argue that we have enough data to support our conclusion.

      • For Fig 1C, 3D, 3F, and 4D, could the authors please show the traces for the entire length of laser onset? It would be helpful to see both the rise and the fall of dopamine signals.

      For Fig. 1C, one panel has been added. For Figs. 3 and 4, a supplemental figure was created to show the signal around laser stimulation.

      • Fig 2C, could the authors comment on how they compared the AUC to baseline? Was this comparison against zero? Because of natural hills and troughs during signals prior to cue (which may not equate to a zero), comparing the omission-induced dip to a zero may not be appropriate. A better baseline might be using the signals prior to the cue.

      The signal immediately before cue onset was used as the baseline and subtracted from the trace. In our analysis, zero therefore corresponds to this pre-cue baseline.

      • Could the authors comment on how they came up with the 4-5.3s window to observe the AUC in Fig 3H?

      Since the kinetics of dopamine in the NAc and LH are different, different time windows were used to observe the dip in dopamine. The analysis of the kinetics has been added.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific feedback to the authors

      • Sample size for each experiment/group could not be found.

      The sample size is now included in the legends.

      • In most figures, the timing of onset for the cue and laser stimulation is unclear. This makes the data interpretation difficult. They should be labeled as in Fig. 3C, for example.

      Panels have been updated to address this point.

      • Please provide the rationale for selecting the time range for the measurement of AUC for different experiments (e.g. Fig. 2C, 3H, 4A, 5F).

      The kinetics of dopamine in the NAc and LH are different. This is now shown in the new Supplemental Figure 2. Based on this difference, different windows were chosen.

      • Fig. 1E, 3G right, 4E right: statistical analysis should use two-way repeated measures ANOVA rather than one-way ANOVA. Fig 1D, 3G left and 4E left panels can also be analyzed by two-way repeated measures ANOVA.

      We realized that those panels were redundant. Some panels have been removed, and the analyses have been conducted as suggested.

      Minor comments:

      • Fig. 2C can also show non-omission trials as a comparison.

      The panel has been updated.

      • The term "laser cue" is confusing, as the cue itself does not involve a laser.

      ’Laser-paired cue’ is used instead.

      • Color contrast can be improved for some figures, including Fig. 2C right, Fig. 3H right, and green and blue fluorescent fonts.

      The panels have been updated.

      • Figure legends: Tukey's test, rather than Tekey's test.

      This has been fixed.

      • There are some long-winded sentences that are hard to follow.

      Edited.

      • p.2, line 11 from bottom: should read ...the VTA evokes the release of dopamine.

      Edited.

      • p.3, line 9: remove e from release.

      This has been addressed.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      • When discussing the understudied role of dopamine in brain regions other than the striatum in the Introduction, it might be helpful to cite this article: https://elifesciences.org/articles/81980 where the authors characterize dopamine in the bed nucleus of stria terminalis in associative behaviors and reward prediction error.

      The Discussion section has been updated accordingly.

      • In the Discussion, it might be better to refrain from describing the results as 'measuring dopamine release' in the LH. Since there was no direct detection of dopamine release, but rather dopamine binding to the dLight receptors, referring to the detection as dopamine signaling/binding/transients is a better alternative.

      This point has been addressed.

      • In the Discussion, without measuring tonic dopamine release, it is difficult to say that there was a tonic dopamine release in the LH prior to negative RPE. In addition, I wouldn't describe the negative RPE as silencing of dopamine neurons projecting to the LH since this was not directly measured and it is hard to say for sure if the dip in dopamine is caused by silencing of the neurons. There certainly seems to be a reduction in extra-synaptic dopamine signaling in LH, however, what occurs upstream is unknown.

      We respectfully disagree with this point. In our opinion, the dopamine transient is more important than the firing of dopamine neurons, because what matters for downstream neurons is dopamine concentration. For example, cocaine administration increases the extrasynaptic dopamine concentration via blockade of DAT, while the firing of dopamine neurons goes down via activation of D2 receptors expressed on dopamine neurons. Yet cocaine administration is not known to induce negative RPE.

      • Typo in multiple places: 'Tekey's multiple comparison test' (should be 'Tukey's').

      This has been fixed.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      "Expanding the Drosophila toolkit for dual control of gene expression" by Zirin et al. aims to develop resources for simultaneous independent manipulation of multiple genes in Drosophila. The authors use CRISPR knock-ins to establish a collection of T2A-LexA and T2A-QF2 transgenes with expression patterns in a number of commonly studied organs and tissues. In addition to the transgenic lines that are established, the authors describe a number of plasmids that can be used to generate additional transgenes, including a plasmid to generate a dual insert of LexA and QF that can be resolved into a single insert using FLP/FRT-mediated recombination, and plasmids to generate RNAi reagents for the LexA and QF systems. Finally, the authors demonstrate that a subset of the LexA and QF lines that they generated can induce RNAi phenotypes when paired with LexAop or QUAS shRNA lines. In general, the claims of the paper are well supported by the evidence and the authors do a thorough job of validating the transgenic lines and characterizing their expression patterns.

      Strengths:

      • Numerous Gal4 lines allow for highly specific genetic manipulation in a wide range of organs and tissues; however, similar tissue-specific drivers using alternative binary expression systems are not currently well developed. This study provides a large number of tissue- and organ-specific LexA and QF2 driver lines that should be broadly useful for the Drosophila community.

      • While a minority of the driver lines do not express the expected pattern (likely due to cryptic regulatory elements in the LexA or QF2 sequences), the ability to generate drivers using two different Gal4 alternatives mitigates this issue (as in nearly all cases at least one of the two systems produces a clean driver line with the expected expression pattern).

      • The use of LexA-GAD provides an additional degree of control as it is subject to Gal80 repression. This could prove to be particularly useful in cases where a researcher wishes to manipulate multiple genes using Gal4 and LexA-GAD drivers as the Gal80(ts) system could be used for simultaneous temporal control of both constructs.

      • The use of Fly Cell Atlas information to generate novel oenocyte-specific driver lines provides a useful proof-of-concept for constructing additional highly tissue-specific drivers.

      Weaknesses:

      • Since these reagents will most commonly be paired with existing Gal4 lines, adding information about corresponding Gal4 lines targeting these tissues and how faithfully the LexA and QF2 lines recapitulate these Gal4 patterns would be highly beneficial.

      It is outside the scope of this paper to analyze the expression patterns of the corresponding publicly available Gal4 lines. It is clear from the tissue specificity of the LexA-GAD and QF2 lines that they are expressed in the expected larval tissues based on the target genes. We have added a sentence in the discussion section noting “Further, we expect that there will also be differences between the expression pattern of corresponding Gal4 and the LexA-GAD/QF lines, as the latter were made by knock-in, while the former are often enhancer traps. However, based on our larval mounts and dissections, the stocks generated in this paper are highly specific to the expression pattern of the targeted genes.”

      • It is not stated in the manuscript if these transgenic lines and plasmids are currently publicly available. Information about how to obtain these reagents through Bloomington, Addgene, or TRiP should be added to the manuscript.

      We have added to the materials section that “All vectors described here that are required to produce new driver lines will be made available at Addgene.” And “All transgenic fly stocks described here will be made available at the Bloomington Drosophila Stock Center.”

      Reviewer #2 (Public Review):

      Zirin, Jusiak, and Lopes et al. presented an efficient pipeline for making LexA-GAD and QF2 drivers. The tools can be combined with a large collection of existing GAL4 drivers for dual genetic control of two cell populations. This is essential when studying inter-organ communication, since most current genetic drivers are biased toward expression in the central nervous system. In this manuscript, the authors described the methodology for efficiently generating T2A-LexA-GAD and T2A-QF2 knock-ins by CRISPR, targeting a number of genes with known tissue-specific expression patterns. The authors then validated and compared the expression of double as well as single drivers and found that the tissue-specific expression results were largely consistent with expectations. Finally, a collection of plasmids for LexA-GAD and QF2, as well as the corresponding LexAop and QUAS plasmids, was generated to facilitate the expansion of these toolkits. In general, this study will be of considerable interest to the fly community, and the resources can be readily generalized to make drivers for other genes. I believe this toolkit will have a significant, immediate impact on the fly community.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Lines 56-57: Janelia Flylight lines are not necessarily brain-specific; this collection has been, or could be, screened in other tissues.

      Correct. We have altered this sentence to read: “However, these lines were developed primarily for brain expression. Although they are often expressed in other tissues, they are not well suited for experiments targeting non-neuronal cell types.”

      • Line 197 - I don't see the referenced Figure S1 in the reviewer materials. It appears this is actually referencing panels LL and MM in Figure 2.

      Correct. We have fixed this error.

      • No information on the injection efficiency to create the CRISPR knock-in lines is presented. I am guessing the efficiency will be similar to that of other reported HDR-based CRISPR knock-ins, but if this information is available it would be useful to include it so that others know what to expect when injecting these vectors.

      We did not systematically assay the injection efficiency. However, we can say that it was in line with previous descriptions of CRISPR-based plasmid and ‘drop-in’ HDR methods. We have added a note in the methods that “Knock-in efficiencies were comparable to previous reports (Kanca et al. 2019; Kanca et al. 2022).”

      • Demonstration of successful multi-manipulation would strengthen the paper.

      We do not feel that this is necessary as there have been many papers showing combinatorial Gal4+LexA/QF experiments. An example from our lab can be seen in PMID: 37582831.

      • Also, are there approaches for efficiently constructing pairs of UAS/LexAOp or UAS/QUAS shRNA lines that would potentially streamline the genetics for multi-manipulation? Otherwise, this could be rather cumbersome to implement as one needs to combine a Gal4 line, a LexA/QF2 line (which will be constrained as to its chromosomal location by the target gene), and separate UAS-shRNA and LexAop/QUAS-shRNA constructs into the same fly.

      There are some recent innovations that are useful in this respect. We have added a sentence to the discussion that says: “There remains an unmet need for a single vector that would allow for UAS/LexAop/QUAS control of different shRNAs. However, recent innovations in multi-module vectors and multiplexed drug-based genetics allow researchers to more efficiently generate UAS/QUAS/LexAop transgenic fly strains (Matinyan et al. 2021; Wendler et al. 2022).”

      • In Figure 5 - is the difference for the hh inserts attributable to the driver line or the GFP/mCherry construct (or differential ability to detect GFP/mCherry)? One could try visualizing hhL(-Q) with the LexAop-GFP line. I guess that the correspondence between the nubbin and hh result suggests that maybe QF2 is suppressed in the wing pouch, but this could also be the difference in the reporter constructs and it would be interesting to know if this difference is truly attributable to the driver constructs from the standpoint of knowing how consistent the QF/LexA patterns are expected to be.

      The difference is not attributable to GFP versus mCherry or the specific LexAop and QUAS lines that we used in figure 5. We tested the double knock-in and derivative single knock-ins with various QUAS and lexAop reporters and always observed the same pattern.

      Reviewer #2 (Recommendations For The Authors):

      There are a few points that should be clarified. A list of these specific points is provided below, with the view that this could help in the preparation of a stronger, improved paper.

      Lines 50-51: "There have been no systematic studies comparing the two systems, with only anecdotal evidence to support one system over the other." It is unclear to me what anecdotal evidence the authors are referring to. Could the authors elaborate more on this part?

      Based on an examination of QUAS brains, Potter et al., 2010 (PMID 20434990) make the claim that “The low basal expression of QUAS and UAS reporters provides significant advantage compared to the lexA binary expression system.”

      Shearin et al., 2014 (PMID: 24451596) compared Gal4/UAS, LexA/LexAop, and QF/QUAS reporter strength with the nompC driver and found that the QF system produced the strongest expression.

      While these observations might be true in the nervous system, it isn’t clear that this extends to other tissues, nor what effect this would have on gene knockdown experiments.

      There have been some reports that have explored swapping out a Gal4 insertion for a LexA or QF at the same locus. For example, Gohl et al. 2011 (PMID 21473015) mention that “the majority of the swaps captured most features of the original GAL4 expression patterns. In some cases, however, either prominent features of the GAL4 pattern were lost or we observed new expression patterns. These changes may have resulted from differences in the strength or responsiveness of reporter lines. Alternately, the swap may have modified some combination of enhancer spacing and sequence composition flanking the promoter.”

      Line 61-62: "On average, each StanEx line expresses LexA activity in five distinct cell types, with only one line showing expression in just one tissue..." What's the evidence to support this claim?

      This observation comes from Figure S3 of Kockel et al. 2016 (PMID: 27527793), where the authors “analyzed a subset of 76 StanEx lines that are unambiguously inserted within, or adjacent to, a single known gene.” We cited this reference in the preceding sentence. To clarify, we have added the citation again at lines 61-62.

      Line 63-65: "These findings are consistent with prior studies indicating that enhancers very rarely produce expression patterns that are limited to a single cell type in a complex organism (Jenett et al. 2012)." It might be worth expanding on the use of the split system to achieve high cell-type-specificity. Especially, there are growing resources using split-intein and T2A-split-GAL4 with the prediction of genes from single-cell RNA sequencing datasets.

      We agree that the split system is currently the premier method to produce the most specific driver lines. Indeed, our group has recently published a paper on the split-intein Gal4 system (see PMID 37276389). However, the tradeoff is that split systems usually require generating two separate transgenic insertions, which becomes impractical for research involving two independent binary transcriptional systems, as the user would need to combine at least three driver components into single stocks, plus the UAS/QUAS/LexAop insertions. The ideal would be to generate complementary split insertions on the same chromosome, but we think a discussion of this is tangential to the thrust of our work here.

      The authors did not fully discuss the rationale for using LexA-GAD rather than LexA-p65 or LexA-VP16AD throughout the manuscript. I assume the main reason for choosing LexA-GAD was compatibility with GAL80 suppression. It might be worth stating this explicitly in the Results (e.g., line 123) or in the Introduction. Also, did the authors observe weak transcriptional activation using LexA-GAD? It has been shown that transcriptional activation is much weaker for GAL4AD than for p65 or VP16AD. This might be worth noting in the manuscript as well.

      We did briefly mention in the introduction that one disadvantage of the Flylight lines is that they “use a p65 transcriptional activation domain and therefore are not compatible with the Gal80 temperature sensitive Gal4 repression system.” We have expanded on this issue in the introduction, which now says: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains, to allow for temporal control by Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use the QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015).”

      We did not have any problems visualizing gene expression with fluorescent reporters. Nor did we have any difficulty obtaining knock-down phenotypes with ubiquitous drivers.

      Line 125-127. Is there a specific reason why the authors chose the SV40 terminator for the double driver construct but the Hsp70 terminator for the single driver construct?

      We found that the Hsp70 terminator gave slightly lower expression and decided to use it for the singles to avoid toxicity. For the doubles we chose the SV40 terminator, to compensate for reduced protein expression from the second gene position.

      Line 144-146: "To verify the knock-ins, we PCR-amplified the genomic regions flanking the insertion sites and confirmed that the insertions were seamless and in-frame." Did the authors recover lines with indel introduced, resulting in out-of-frame insertion?

      Yes, we did see indels, which sometimes resulted in out-of-frame insertions; these lines were discarded. This result is in line with what we have observed in other CRISPR HDR knock-in experiments.

      The underlying reason might be outside the scope of this manuscript. However, it would still be helpful for the authors to speculate on the potential reasons why the T2A-LexA-GAD and T2A-QF2 insertions targeting the same site showed very distinct expression.

      It is outside the scope of this report to test this issue experimentally. We have a section in the discussion which does speculate as to the reason: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. It may be that toxicity is an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016). Alternatively, differences in the LexA-GAD and QF2 sequences, and sequence length, could impact the function of nearby gene regulatory regions.”

      Regarding the observation that the presence of the 3XP3-RFP marker can interfere with T2A-LexA-GAD and T2A-QF2 expression on a case-by-case basis, it might be worth emphasizing in the discussion that proper removal of the 3XP3-RFP marker by Cre/LoxP recombination is important.

      We have added the following to the discussion: “Importantly, our knock-in constructs contain the 3XP3-RFP cassette for screening transformants. Perhaps due to interaction between the 3XP3 promoter and the regulatory regions of the target gene, we occasionally saw misexpression of the LexA-GAD/QF2 in the 3XP3 domain. We have therefore prioritized Cre-Lox removal of the 3XP3-RFP cassette from our knock-in stocks, and advise that users of the plasmids described here likewise remove the marker, following successful knock-in.”

      For Fig. 5B, 5F-G, the authors should elaborate more in the result section. For example, lines 215-217: "We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B)." It is unclear what the authors mean by "robust generation". Also, there is no description of the results in Fig. 5F-G.

      We have expanded this section for figure 5B, which now reads: “We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B). In the case of the hh line, 15 out of 36 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 14% recombinant offspring per parent. 20 out of 36 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent. In the case of the dpp line, 31 out of 32 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 30% recombinant offspring per parent. 17 out of 32 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent.”

      We have also added a description for Figure 5F-G, which reads: “Recombinants were also independently verified by PCR of the insertions (Figure 5F-G), where we observed the expected smaller band sizes in the derivative T2A-QF2 and T2A-LexA-GAD relative to the parental double driver.”

      Line 229, minor error: "Into these vectors, ..."

      We have edited this to read: “We cloned shRNAs targeting forked (f) and ebony (e) genes into these vectors and assayed their phenotypes when crossed to ubiquitous LexA-GAD and QF2 drivers.”

      Line 238-240: "Both Tub-LexA-GAD and Tub-QF2 drivers generated knockdown phenotypes in the thorax when crossed to f and e shRNA lines. However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J)." The stated "stronger phenotypes" are not clear to me. It might be worth elaborating more.

      We have further clarified this by changing it to: “However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J). For example, Tub-LexA-GAD produced a fully penetrant f bristle phenotype (Figure 6F), while some wild-type bristles remained on the thoraces of Tub-QF2 f knockdown flies (Figure 6G). Neither Tub-LexA-GAD nor Tub-QF2 was able to achieve the strength of phenotype generated by the T2A-LexA-GAD da knock-in line (compare the darkness of the cuticle caused by e knockdown in Figure 6H-J).”

      Line 257-250: "Our collection of T2A-LexA-GAD and T2A-QF2 and double driver vectors can be easily adapted to target any gene for CRISPR knock-in, with a high probability that the resulting line will accurately reflect the expression of the endogenous locus." The authors could refer to the recent gene-specific Trojan GAL4/split-GAL4 work to support the idea that gene-specific T2A-GAL4/split-GAL4 drivers reflect endogenous expression more faithfully than enhancer-based drivers.

      We have added the following sentence to the discussion: “The specificity achieved with this approach can also be seen in recent efforts to build collections of gene specific T2A-Split-Gal4 and T2A-Gal4 insertions (Kanca et al. 2019; Chen et al. 2023; Ewen-Campen et al. 2023).”

      Line 630: "Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression." It would be helpful to add the annotation on Fig. 3B to show the location of glial cell expression.

      We have added arrowheads on Figure 3, and the legend now reads: “Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression (white arrowheads).”

      Line 650-651: "The fat body mCherry expression is also present in the reporter stock and does not indicate LexA-GAD activity." I did not get what the authors were trying to convey. Where did the fat body mCherry expression come from? Please elaborate more.

      We have changed this section to explain that “The fat body mCherry expression (yellow arrowhead) is from leakiness of the reporter stock and does not indicate LexA-GAD activity.”

      Line 679-680: "forked shRNA produced a forked bristles phenotype." Please add the annotation on the figures to show where the phenotypes were.

      We have added arrowheads and asterisks to the figure. The legend now reads: “(E-G) forked shRNA produced a forked bristles phenotype (white arrowheads). Note that some bristles retain a more elongated wild-type morphology with the Tub-QF2 driven forked knockdown (G, yellow asterisk).”

      Fig. 1D-E and 4A-B: There is no description anywhere in the manuscript of QA/QS regulation, and little about GAL80ts regulation. This may confuse readers with little background in fly genetics. Please introduce how each of these binary expression systems can be regulated.

      We have added a section in the introduction, which states: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains, to allow for temporal control by the temperature-sensitive Gal4 repressor, Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use the QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015). Like Gal80-based modulation of LexA-GAD, QF2 activity can also be regulated temporally by expressing QS, a QF repressor. QS repression of QF can be released by feeding flies quinic acid (Riabinina and Potter 2016).”

      Fig. 2, there are several ND in the figure without any explanation in the manuscript (e.g. Mef2 and He). In addition, the expression patterns look quite different between T2A-LexA-GAD and T2A-QF2 for some genes (e.g., mex1, Myo31DF), but the authors did not mention any of them in the manuscript. Please elaborate more.

      We have altered the Figure 2 legend as follows: “(A-KK) T2A-LexA-GAD knock-in lines crossed to a LexAop-GFP reporter and T2A-QF2 knock-in lines crossed to a QUAS-GFP reporter. Panels show 3rd instar larva. GFP shows the driver line expression pattern. RFP shows the 3XP3 transformation marker, which labels the posterior gut and anal pads of the larva. Gene names and tissues are on the left. We failed to obtain LexA-GAD knock-ins for Mef2 (E) and He (DD). (LL-MM) 3rd instar imaginal disc from the insertions in the nubbin (nub) gene. Note that most of the lines are highly tissue-specific and are comparable between the LexA-GAD and QF2 knock-ins. Insertions in the daughterless gene (da) and nub are an exception, as the T2A-LexA-GAD, but not the T2A-QF2, gives the expected expression pattern. Insertions in the gut-specific genes mex1 (X-Y) and Myo31Df (Z-AA) also differed between the LexA-GAD and QF2 drivers.”

      We have also added a note on the inconsistency of mex1 and Myo31Df in the discussion: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. Additionally, we found that the driver expression in the gut-specific genes, mex1 and Myo31Df differed between the LexA-GAD and QF2 transformants. In both cases the LexA-GAD was more broadly expressed along the length of the gut than the QF2. It may be that toxicity is an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016).”

      Fig. 4B, it is unclear why the hsp70 is present downstream of the enhancer of interest (upstream of T2A). Is it the molecular mark resulting from the cloning steps? Does it serve any specific purpose?

      This is the Drosophila hsp70 gene minimal promoter, which is standard in many Drosophila expression constructs. In the methods section we described how we made versions of pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20 with and without this minimal promoter: “We used pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20 for dpp-blk and pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20-alt (which lacks the hsp70 promoter) for Ilp2, since dpp-blk does not have a basal promoter, but the Ilp2 enhancer does.”

      Fig. 5A: The single T2A-QF2 and T2A-LexA-GAD derivatives of the double driver parental lines retain the FRT3 sequence upstream of QF2 and LexA-GAD. I assume the FRT3 sequence will be translated and remain attached to QF2 and LexA-GAD. Is that correct? If so, would this cause any adverse effect?

      Correct. The FRT3 sequence is present in both the parental double and single derivatives. We can say that the additional amino acids do not prevent LexA-GAD or QF2 transcriptional activation. We do not know whether there may be other adverse effects, though we did not observe any.

      Fig. 5C-C'. It seems like the images of Fig. 5C-C' were the same as Fig. 4D-D'. If so, the authors should indicate that in the figure legend.

      We have made a note of this in the figure legend.

    1. Then Gawain bethought him, and it came into his heart that this were a jewel for the jeopardy that awaited him when he came to the Green Chapel to seek the return blow–could he so order it that he should escape unslain, ’twere a craft worth trying

      Here the green girdle serves as a symbol of life; however, we come to see that the meaning of the girdle changes throughout the text. We see Sir Gawain fail to uphold his chivalrous values and duty of faith, as the girdle challenges his beliefs and becomes a stronger object of faith than God himself. This is an interesting development considering that earlier in the text Sir Gawain was devoted to his faith, turning to it in his moments of need by praying "that Mary may be his guide". I think this is a key moment in the text because it reveals Sir Gawain's humanity and his fear of death. As the day on which he must meet the Green Knight approaches, he turns to supernatural objects out of fear for his own life.

      Works cited: Malarkey, Stoddard, and J. Barre Toelken. “Gawain and the Green Girdle.” The Journal of English and Germanic Philology, vol. 63, no. 1, 1964, pp. 14–20. JSTOR, http://www.jstor.org/stable/27714339. Accessed 6 Mar. 2024.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1.1: The distinction of PIGS from nearby OPA, which has also been implicated in navigation and ego-motion, is not as clear as it could be.

      Response 1.1: The main “functional” distinction between TOS/OPA and PIGS is that TOS/OPA responds preferentially to moving vs. stationary stimuli (even concentric rings), likely due to its overlap with the retinotopic motion-selective visual area V3A, for which this is a defining functional property (e.g. Tootell et al., 1997, J Neurosci). In comparison, PIGS does not show such motion selectivity. Instead, PIGS responds preferentially to more complex forms of motion within scenes.

      Moreover, PIGS and TOS/OPA are located differently relative to the retinotopic visual areas. Briefly, PIGS is located adjacent to areas IPS3-4, while TOS/OPA overlaps with areas V3A/B and IPS0 (V7). This point is now highlighted in the new experiment 3b and the new Figure 6. In this revision, we also tried to better highlight these points in sections 4.3, 4.4 and 4.5 (see also the response to the first comment from Reviewer #2).

      Reviewer 2:

      Comment 2.1: First, the scene-selective region identified appears to overlap with regions that have previously been identified in terms of their retinotopic properties. In particular, it is unclear whether this region overlaps with V7/IPS0 and/or IPS1. This is particularly important since prior work has shown that OPA often overlaps with V7/IPS0 (Silson et al., 2016, Journal of Vision). The findings would be much stronger if the authors could show how the location of PIGS relates to retinotopic areas (other than V6, which they do currently consider). I wonder whether the authors have retinotopic mapping data for any of the participants included in this study. If not, the authors could always show atlas-based definitions of these areas (e.g. Wang et al., 2015, Cerebral Cortex).

      Response 2.1: We thank the reviewers for reminding us to more clearly delineate this issue of possible overlap, including the information provided by Silson et al., 2016. The issue of possible overlap between area TOS/OPA and the retinotopic visual areas, in both humans and non-human primates, was also clarified by our team in 2011 (Nasr et al., 2011). As you can see in Figure 6 (newly generated), and consistent with those previous studies, TOS/OPA overlaps with visual areas V3A/B and V7, whereas PIGS is located more dorsally, close to IPS3-4. As shown here, there is no overlap between PIGS and TOS/OPA, and there is no overlap between PIGS and areas V3A/B and V7.

      To more directly address the reviewer’s concern, in this revision, we have added a new experiment (Experiment 3b) in which we have shown the relative position of PIGS and the retinotopic areas in two individual subjects (Figure 6). All the relevant points are also discussed in section 4.3.

      Comment 2.2: Second, recent studies have reported a region anterior to OPA that seems to be involved in scene memory (Steel et al., 2021, Nature Communications; Steel et al., 2023, The Journal of Neuroscience; Steel et al., 2023, bioRxiv). Is this region distinct from PIGS? Based on the figures in those papers, the scene memory-related region is inferior to V7/IPS0, so characterizing the location of PIGS relative to V7/IPS0 as suggested above would be very helpful here as well. If PIGS overlaps with either V7/IPS0 or the scene memory-related area described by Steel and colleagues, then arguably it is not a newly defined region (although the characterization provided here still provides new information).

      Response 2.2: The lateral place memory area (LPMA) is located on the lateral brain surface, anterior to the IPS (see Figure 1 from Steel et al., 2021 and Figure 3 from Steel et al., 2023). In contrast, PIGS is located on the posterior brain surface, posterior to the IPS. In other words, they are located on two different sides of a major brain sulcus. In this revision we have clarified this point in section 4.3, citing the work of Steel and colleagues.

      Comment 2.3: Another reason that it would be helpful to relate PIGS to this scene memory area is that this area has been shown to have activity related to the amount of visuospatial context (Steel et al., 2023, The Journal of Neuroscience). The conditions used to show the sensitivity of PIGS to ego-motion also differ in the visuospatial context that can be accessed from the stimuli. Even if PIGS appears distinct from the scene memory area, the degree of visuospatial context is an alternative account of what might be represented in PIGS.

      Response 2.3: The reviewer raises an interesting point. One minor source of confusion is that we may be inadvertently referring to two slightly different types of “visuospatial context”. Specifically, the stimuli used in the ego-motion experiment here (i.e. coherently vs. incoherently changing scenes) represent the same scenes, and the only difference between the two conditions is the sequence of images across the experimental blocks. In that sense, the two experimental conditions may be considered to have the same visuospatial “context”. However, it could also be argued that the coherently changing scenes provide more information about the environmental layout. In that case, considering previous reports that PPA/TPA and RSC/MPA may also be involved in layout encoding (Epstein and Kanwisher 1998; Wolbers et al. 2011), we would have expected to see more activity within those regions in response to coherently compared to incoherently changing scenes. These issues are now more explicitly discussed in the revised article (section 4.6).

      Reviewer 3:

      Comment 3.1: There are few weaknesses in this work. If pressed, I might say that the stimuli depicting ego-motion do not, strictly speaking, depict motion, but only apparent motion between photographs 2 s apart. However, this choice was made to equate frame rates and motion contrast between the 'ego-motion' and a control condition, which is a useful and valid approach to the problem. Some choices for visualization of the results might be made differently; for example, outlines of the regions might be shown in more plots for easier comparison of activation locations, but this is a minor issue.

      Response 3.1: We thank the reviewer for these constructive suggestions, and we agree with their comment that the ego-motion stimuli are not smooth, even though they were refreshed every 100 ms. However, the stimuli were nevertheless coherent enough to activate areas V6 and MT, two major areas known to respond preferentially to coherent compared to incoherent motion.

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed reading this article. I have a few suggestions for improvement:

      (1) Delineation from OPA: The OPA has been described in quite similar terms as PIGS, with its involvement in ego-motion (e.g., crawling, walking) and navigation in general (e.g., Dilks' recent work; Bonner and Epstein). The authors address the distinction in section 4.4. Unlike Kamps et al. (2016) and Jones et al. (2023), the authors found weak or no evidence for ego-motion in OPA. They explain this discrepancy with differences in refresh rates and different levels of spatial smoothing of the fMRI data. It is not clear why these fairly small methodological differences would lead to different findings of ego-motion in the OPA. Arguably, the OPA is the closest of the "established" scene areas to PIGS, both in anatomical location and in function. I would therefore appreciate a more detailed discussion of the differences between these two areas.

      Response: Jones et al. showed ego-motion-related TOS/OPA activity when compared to scrambled scenes. This is fundamentally different from what we have shown here, which compares coherently vs. incoherently changing scenes (i.e. not a small difference). Also, Kamps et al. used static scenes as a control, which, considering TOS/OPA motion selectivity, has a large impact on the TOS/OPA response.

      (2) Random effects analysis: The authors mention using a "random effects analysis" for several of their experiments. I would ask them to provide more details on what statistical models were used here. Were they purely random-effects models or actually mixed-effects models? What were the factors that entered into the analysis? Providing more detail would make the analysis techniques more transparent.

      Response: This point is now clarified in the Methods section.

      (3) Data and code availability: The authors write that data and code "are ready to be shared upon request." (section 2.5) In the spirit of transparency and openness, I strongly encourage the authors to make the data publicly available, e.g., on OSF or OpenNeuro. In particular, having probabilistic maps of PIGS available will allow other researchers to include PIGS in their analysis pipelines, making the current work more impactful.

      Response: We have made the probabilistic labels available to the public. This point is now highlighted in section 2.5.

      (4) Minor comments on the writing that caught my eye while reading the article:

      • Line 27: "in the human brain".

      Response: Done.

      • Line 30: I don't agree with the characterization of the previous model of scene perception as "simplistic." Adding one additional ROI makes it no less simplistic. Perhaps the authors can rephrase to make this slightly less antagonistic?

      Response: Done.

      • Line 71: it is not clear why NHPs are relevant here.

      Response: We decided to keep the text intact.

      • Line 138: "were randomized".

      Response: Done.

      • Line 152: "consisting".

      Response: Done.

      • Line 155: "sets" (plural).

      Response: Done.

      • Lines 253-255: Why were the 3T spatially smoothed but not the 7T data? This seems odd.

      Response: We kept the text intact.

      • Line 481: "we found strong motion selectivity" (remove "a").

      Response: Done.

      • Line 564: a word is missing, probably: "a stronger effect of ego-motion".

      Response: Done.

      • Line 591: "controlling spatial attention" (remove "the").

      Response: Done.

      • Line 591 and 594: Both sentences start with "However". I think the first of these should not because it is setting up the contrast for the second sentence.

      Response: Done.

      • Line 607: "higher-level" (hyphen).

      Response: Done.

      • Throughout the manuscript: adverbial phrases such as "(in)coherently changing" or "probabilistically localized" do not get a hyphen.

      Response: Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that "All data, codes and stimuli are ready to be shared upon request". Ideally, these materials should be deposited in appropriate repositories (e.g. OpenMRI, GitHub) and not require readers to contact the authors to obtain such materials.

      Other Comments:

      (a) The title ("A previously undescribed scene-selective site is the key to encoding ego-motion in natural environments") is potentially misleading - the work was not conducted in a natural environment. At best, you could say they are 'naturalistic stimuli'. Also, in what sense is PIGS "key" to encoding ego-motion - the study just shows sensitivity to this factor.

      Response: We changed “natural environments” in the title to “naturalistic environments”.

      (b) Figure 1 - I'm not sure what point the authors are trying to make with Figure 1. The comparison is between a highly smoothed, group fixed-effects analysis and a less-smoothed individual subject analysis. The differences between the two could reflect group vs. individual, highly-smoothed (5 mm) versus less-smoothed (2 mm), or differences in thresholding. If the thresholding were lower for the group analysis, it would probably start to look more similar to the individual subject. As it stands, this figure isn't particularly informative, it seems redundant with Figure 2, and Figure 1A is not even referenced in the main text. Further, fixed effects analyses are relatively uncommon in the recent literature, so their inclusion is unusual.

      Response: Figure 1A is a replication of the data/method used in Nasr et al., 2011, and it will help readers see the difference between the “traditional” scene-selectivity maps generated by group averaging vs. data from individual subjects. Therefore, we decided not to change the figure.

      (c) Figure 3 - why are the two sets of maps shown at different thresholds? For 3B given the larger sample size, it is expected that the extent of the significant activations will increase. Currently the higher threshold for 3B and the smaller range for 3A is making the sets of maps look more comparable.

      Response: As the reviewer noticed, the number of subjects is larger in Figure 3B compared to 3A. The main point of this figure is to show that the PIGS activity center does not vary across populations. Considering this point, we decided not to change this figure.

      (d) Figure 10 - why is the threshold lower than used for other figures? It would be helpful if there was consistent thresholding across figures.

      Response: Experiment 6 and Experiment 1 are based on different stimuli (see Methods). Also, among those subjects who participated in Experiment 1, two subjects did not participate in Experiment 6. These points are already highlighted in the text.

      (e) Figures - how about the AFNI approach of thresholding and showing sub-threshold data at the same time? (Taylor et al, 2023, Neuroimage).

      Response: We highly appreciate the methodology suggested by Taylor and colleagues. However, our main point here is to show the center of PIGS activity. In this context, showing an unthresholded activity map has no advantage over the current maps. Considering these points, we decided not to change the figures.

      (f) Coherent versus incoherent scenes - there are many differences between the coherent and incoherent scenes. Arguing that it must be ego-motion seems a little premature without further investigation. Activity anterior to OPA has been associated with the construction of an internal representation of a spatial environment (Steel et al., 2023, The Journal of Neuroscience). Could it be that this is the key effect, not really the ego-motion?

      Response: In this revision, we discuss the studies by Steel et al. (2021, 2023) in section 4.3.

      Reviewer #3 (Recommendations For The Authors):

      Overall, I think this is already an excellent contribution. The suggestions I have are minor and may help with the clarity of the results.

      (1) My main request of the authors would be to provide more points of reference in some of the figures with cortical maps. In many cases, the authors use arrows to point to the locations of activations of interest. However, the arrows in adjacent figures are often not placed in exactly the same places on maps that are meant to be compared. It would very much help the viewer to compare activations if the arrows pointing to activations or regions of interest were placed in identical locations for the same brains appearing in different sub-panels (e.g. in panels A and B of Figure 1). The underlying folds of the cortical surface provide some points of reference, but these are often occluded to different extents by data in figures that are meant to be compared.

      Response: To address the reviewer’s concern, we regenerated Figure 8 (Figure 7 in the previous submission) and placed the arrowheads in identical locations wherever possible. For PIGS in particular, this was also taken into account in Figures 2 and 3.

      (2) Outlines (such as those in Figure 5) are also very useful, and I would encourage broader use of them in other figures (e.g. Figures 7, 10, and 12). Figures 10 and 12 are on the fsaverage surface, so the same outlines could be used for them as for Figure 5.

      To be clear, it's possible to apprehend the results with the figures as they are, but I think a few small changes could help a lot.

      Response: In this revision, we added outlines to Figures 11 and 13 (Figures 10 and 12 in the previous submission). We did not add outlines to Figure 8 because they made PIGS hard to see. Instead, we used arrows (see our response to the previous comment).

      Other minor points:

      In the method for Experiment 4, the authors write: "Other details of the experiment were similar to those in Experiment 1.". Similar or the same? The authors should clarify this statement, e.g. "the number of images per block, the number of blocks, the number of runs were the same as Experiment 1" - with any differences noted.

      Response: This point is now addressed in the Methods section.

      In Figure 8, it would be better to have the panel labels (A, B, C, D) in the upper left of each panel rather than the lower left.

      Response: We tried to keep the panel arrangement consistent across the figures. That is why the letters are positioned like this.

      A final gentle suggestion: pycortex (http://github.com/gallantlab/pycortex) provides a means to visualize the flattened fsaverage surface with outlines for localized regions of interest and overlaid lines for major sulci. Though it is by no means necessary for publication, it would be lovely to see these results on that surface, which is freely available and downloadable via a pycortex command (surface here: https://figshare.com/articles/dataset/fsaverage_subject_for_pycortex/9916166)

      Response: We thank the reviewer for bringing pycortex to our attention. We will consider using it in our future studies.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The emergence of catalytic self-replication of polymers is an important question in the context of the origin of life. Tkachenko and Maslov present a model in which such a catalytic polymer sequence emerges from a random pool of replicating polymers.

      Strengths:

      The model is part of a theme from many previous papers from the same authors and their colleagues. The model is interesting, technically correct, and demonstrates qualitatively new phenomena. It is good that the paper also makes a connection with possible experimental scenarios -- specifically, concrete proposals are made for testing the core ideas of the model. It would indeed be an exciting demonstration when such an experiment does indeed materialize.

      Weaknesses:

      Unlike the rest of the paper, which is very tight in its arguments, I find that the discussion section is not so. Specifically, sentences such as "In fact, this can be seen as a special case of the classical error catastrophe" are a bit loose and not well substantiated. Although these are in the discussion section, I find this to be a weakness of an otherwise good paper. Tightening some of the arguments here will make it an excellent paper in my opinion.

      We followed the reviewer's recommendations by streamlining the discussion and removing the potentially confusing comparison to the classic error catastrophe.

      Reviewer #2 (Public Review):

      Summary:

      The replication of information-coding polymers and the emergence of catalytic ribozymes pose significant challenges, both experimentally and theoretically, in the study of the RNA world hypothesis. In this context, Tkachenko et al. put forth a novel hypothesis regarding a replication oligomer system based on a cleavage ribozyme. They initially highlighted that the breakage of oligomers could contribute to self-replication, provided that these fragments function as primers for subsequent replications. Next, they proposed a self-replicating system of oligomers founded on a hammerhead structure that catalyzes cleavage. By a simple dynamical model, they demonstrated that such a system is self-sustainable in certain parameter regimes. Furthermore, they delved into discussions regarding the potential emergence of such a system and the evolution toward further optimized ribozymes.

      Strengths:

      Although the cleavage (hammerhead) ribozyme has been discussed in the context of the origins of life, the authors are, as far as I know, the first to discuss how it could be selected using a mathematical model. The idea is simple: ribozyme activity creates fragments by breakage of an oligomer, which work as primers for the ribozyme itself, resulting in a positive feedback system (i.e., autocatalytic sets in a broader sense). This potentially enables us to resolve at the same time problems of (i) the supply of new primers (but note that there is a major concern about this, as described under 'Weaknesses'), and (ii) the sustaining of the cleavage ribozyme.

      Weaknesses:

      The major weakness of their theory is that the ends of the new primers, formed through the breakage/cleavage of polymers, must be chemically active (as the authors have already emphasized in the last paragraph of their discussion) to enable further elongation. Reactivating the ends of preexisting oligomers without enzymes, to the best of our current knowledge, could be a challenging task. Although their model heavily relies on this aspect, the authors do not elaborate on it.

      We have added a discussion of the need for chemical activation: "It is important to note that in the context of RNA, such bidirectional elongation requires chemical activation of the phosphate group at the 5' end of the primer to provide free energy for the newly formed covalent bond. Like the polymerization process itself, achieving this without enzymes is biochemically challenging. One might speculate that prebiotic evolution relied on inorganic catalysis, such as on mineral surfaces, or involved polymers other than today's RNA."

      We also included in the discussion a comment on a possible combination of our mechanism and the Virtual Circle Genome model that would avoid the need for bidirectional growth: "It may be possible to incorporate the selection mechanism proposed in this paper into the Virtual Circle Genome model. Such a hybrid approach would avoid the need for the biochemically problematic bidirectional growth while explaining the emergence of early catalytic activity unaffected by sequence scrambling."

      Another weakness is in the setup of their discussion on evolutionary dynamics. While they claim that their model is robust against replication errors, their approach to evolutionary dynamics appears unconventional, and it remains unclear under what conditions their assumptions are founded. They treat a whole set of oligos as a subject of evolution, rather than each individual oligo. This may necessitate more complex assumptions, such as the encapsulation of sets of oligos inside a protocell, to be adequately rationalized. Thus, it remains uncertain whether the system is indeed robust against replication errors in a more natural context. For example, if a mutant oligo, denoted as b', arises due to an error in the replication of oligo b, and if b' has lower catalytic activity but replicates more rapidly than b, it may ultimately come to dominate the system.

      We agree with the reviewer that the evolutionary dynamics in multi-species ecosystems are somewhat complicated and potentially confusing. To this end, we have added the following text and citations to our discussion: "Note that this fitness is defined at the level of the ecosystem, comprising all sequences in the chemostat, and is not necessarily attributable to individual members of that population. Over time, similar to microbial ecosystems, this population changes according to the laws of competitive exclusion [34, 35]". However, we would like to point out that we assume that our model operates in a chemostat-like environment, which can be realized, for example, in a prebiotic pool supplied with a constant flux of monomers. Thus, the evolutionary dynamics described by our equations do not require encapsulation of sets of oligos in a protocell followed by selection of these protocells.

      Reviewer #3 (Public Review):

      Summary:

      Non-enzymatic replication of RNA or a similar polymer is likely to be important for the origin of life. The authors present a model of how a functional catalytic sequence could emerge from a mixture of sequences undergoing non-enzymatic replication.

      Strengths:

      Interesting model describing details of the proposed replication mechanism.

      Weaknesses:

      A discussion of the virtual circular genome idea proposed in [33] is included in the discussion section together with the problem of sequence scrambling faced by this mechanism that was raised in [34]. However, the authors state that sequence scrambling is a special case of the classical error catastrophe. This should be reworded, because these phenomena are completely different. The error catastrophe occurs due to single-point mutational errors in a model that assumes that a complete template is being copied in one cycle. Sequence scrambling arises in models that assume cycles of melting and reannealing, in which case only part of a template is copied in one cycle. Scrambling is due to the many alternative ways in which pairs of sequences can reanneal. Many of these alternatives are incorrect and this leads to the disappearance of the original sequence. This problem exists even in the limit where there is zero mutational error rate. Therefore, it cannot be called a special case of the error catastrophe problem.

      We followed the reviewer's recommendations and removed the potentially confusing comparison to the classic error catastrophe.

      The authors seem to believe that their model avoids the scrambling problem. If this is the case, a clear explanation should be added about why this problem is avoided. Two possible points are mentioned.

      (i) Replication is bidirectional in this model. This seems like a small detail to me. I don't think it makes any difference to whether scrambling occurs.

      (ii) The functional activity is located in a short sequence region. I can imagine that if the length of a strand that is synthesized in a single cycle is long enough to cover the complete functional region, then sometimes the complete functional sequence can be copied in one cycle. Is this what is being argued? If so, it depends a lot on rates of primer extension and lengths of melting cycles etc, and some comment on this should be made.

      As we now explain in the text, while the scrambling problem itself is not completely avoided in our model, it does not affect the replication of the functionally relevant regions of the oligomers. Our key observation is that, due to the simplicity of the cleaving enzymes, the length of the functionally relevant region is much smaller than the scrambling-free length. This can be seen from a back-of-the-envelope estimate of the scrambling-free length added to the text: "...assuming the minimal hybridization length $l_0 = 6$ and random statistics of the master sequence, one gets the scrambling-free length $\sqrt{2 \times 4^{l_0}} + l_0 \approx 100$. This is an order of magnitude larger than both $l_0$ and the length of the core region of the hammerhead ribozyme."
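      The arithmetic behind the quoted estimate can be written out term by term. This is only a sketch that evaluates the quoted formula with $l_0 = 6$ and a four-letter nucleotide alphabet; the symbol $L_{\mathrm{sf}}$ for the scrambling-free length is introduced here purely for illustration and is not notation from the manuscript.

```latex
L_{\mathrm{sf}} \approx \sqrt{2 \times 4^{l_0}} + l_0
  = \sqrt{2 \times 4^{6}} + 6
  = \sqrt{8192} + 6
  \approx 90.5 + 6
  \approx 97
```

      The result, about 97 nucleotides, is consistent with the order-of-magnitude value of ~100 stated above.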

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have evaluated that the authors have proposed a novel mechanism potentially relevant to the origins of life, and they have explained it with a sufficiently simple model. However, I recommend that they address the following issues, including those I raised in the public review:

      • Title: I believe that the title "Emergence of catalytic activity in ..." is rather broad. Could it be more specific to accurately represent the system described in the paper? For instance, "Selective advantage (or selection) of the hammerhead cleavage ribozyme in..." may better encapsulate the paper's focus.

      We thank the reviewer for this suggestion. However, our mechanism is not unique to hammerhead ribozymes. Therefore, we decided to keep the original title.

      • One theoretically non-trivial aspect is the stability of the cooperative structure. Could the authors provide a more detailed explanation of what drives the instability of the system and what mechanisms restore its stability? For example, in a similar self-reproducing oligomer system with ribozymes and their fragments (Kamimura et al. PLoS Comp. 2019), the symmetry of fragments breaks because they effectively suppress each other's replication. Also, it would be beneficial to clarify the necessary assumptions for stability. (For instance, the authors assumed that a_L can serve as a primer for only a, while a_R can serve for both a and b.).

      We thank the reviewer for bringing this interesting paper to our attention. The cooperative fixed point in our model is intrinsically dynamically stable. Why the replicase in Kamimura et al. can be dynamically unstable, while the ligase in our model is always stable, is an interesting question; however, it goes beyond the scope of our study. We added the following discussion to the manuscript: "Note that the stability of our cooperative fixed point is a non-trivial result. For example, in a related model by Kamimura et al. [34], the fixed point corresponding to a viable composite replicase is dynamically unstable and requires additional stabilization, e.g., by cell-like compartments."

      • As mentioned in the public review, a critical aspect of the practical applicability of the theory is whether cleaved oligos can be reactivated and further elongated, especially through non-enzymatic pathways. Alternatively, is it possible with the presence of enzymes? While I appreciate the conceptual beauty of their model, I recommend that they at least address the difficulty or feasibility of achieving this.

      We addressed this point in our response to the public review.

      • As also mentioned, in the section on evolutionary dynamics, it's essential to clarify the unit of evolution and the assumptions made. For a system-level evolution (i.e., all the sets of oligos, a and b can be the unit of evolution), more detailed assumptions are required, such as the presence of compartments whose growth is coupled with the replication of oligos inside, and the competition between these compartments. I recommend the authors clarify these points.

      We addressed this point in our response to the public review.

      Reviewer #3 (Recommendations For The Authors):

      Assuming that the above points can be addressed, this reviewer would support publication with minor modifications.

      We addressed all points in our response to the public review.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their kind words and helpful insights and suggestions. Below, we have pasted the reviews (in italics), with our responses:


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study provides insights into how polyploidization via endomitosis may arise in human hepatocytes by studying fetal liver cell line-derived organoids. Using live cell imaging and LSM microscopy, binucleation was consistently observed in two independent cell line systems, at frequencies seen in human liver and sensitive to pharmacological inhibition (GSK3i) and genetic manipulation (E2F7 & E2F8 editing). The findings presented are in line with earlier data, largely gathered studying rodents. The data is convincing and robust, indicating that these systems can be used to study the causes and consequences of polyploidy in human hepatocytes.

      1. While the authors do suggest that they provide a mechanism for how polyploidy is initiated in human hepatocytes undergoing endomitosis, i.e. loss of membrane association of membrane-anchoring proteins at the midbody (e.g. Anillin, RacGAP1), I do feel that the data provided is rather descriptive and does not address a particular mechanism that may account for loss of membrane anchoring. As such, the title makes too strong a claim: in my view, endomitosis is associated with loss of membrane anchorage, but this loss may not drive endomitosis. Whether this is a "passive" process in response to changes in physical forces and tension, or is regulated via signalling intermediates to initiate regression of the cleavage furrow, is not addressed experimentally (mislocalizing these proteins on a larger scale). Discussion seems warranted.

      We agree with the reviewer that our mechanistic insights into the molecular mechanisms of endomitosis are limited, and we cannot currently prove that the loss of membrane anchoring drives endomitosis. We have therefore toned down this conclusion and changed the title to “Binucleated human hepatocytes arise through late cytokinetic regression during endomitosis M phase”. Furthermore, we have expanded the Discussion to reflect on the gaps in knowledge and speculate about possible molecular mechanisms of endomitosis; see pages 12-16 (in particular, lines 404-423, 433-443, and 445-472).

      I do not see the need for additional experiments, as I believe the data is robust and introduces an interesting new model where the role of ploidy can be studied in human hepatocytes ex vivo. However, if the authors wish to extend their studies and document further similarities with pathways engaged in rodents, some E2F7/8 targets relevant for ploidy control such as Anillin or PIDDosome components, or, maybe MDM2 processing for p53 activation, could be tested in wt and E2F mutant cell lines.

      Unfortunately, we have not been able to look at E2F7/8 targets and their expression in E2F mutant Hep-Orgs. We performed qPCRs for some cytokinesis regulators such as Ect2, RacGap1 and Mklp1 in Hep-Orgs; however, these genes are so lowly expressed that we can hardly detect them. This is likely because these transcripts are only expressed during a short period of the cell cycle (S/G2 phase), whereas the vast majority of cells in Hep-Orgs are in G1. Therefore, differences in gene expression are very difficult (if not impossible) to detect by qPCR. We also tried to perform single-molecule FISH on Hep-Orgs, which would allow us to quantify lowly expressed transcripts in single cells; however, although smFISH staining works well on cholangiocyte organoids and intestinal organoids, we could not get good signals in Hep-Orgs. Taken together, we are unable at this point to look into downstream targets of E2F7/8.

      A minor suggestion is to clarify the term M-CDK activity in the introduction, as it may not be fully intuitive to all readers; similarly, ploidy reversal is still controversial in the field, but it is stated as a given fact.

      Thank you for these suggestions, we have clarified the term M-CDK on page 3, lines 60-61, and have rephrased the sentence on ploidy reversal on page 3, lines 81-82.

      Reviewer #1 (Significance (Required)):

      Polyploidy at the cellular and nuclear level is a key feature of hepatocytes, although the physiological significance of the process is not entirely clear. Increased ploidy has been linked to cancer resistance in the liver, but may pose a threat to hepatocyte survival under conditions of repeated compensatory proliferation cycles. Curiously, during normal regeneration after a single surgical intervention, liver regeneration is not compromised, even though it may recover faster when starting from higher ploidy levels. Mechanistically, most data have been generated by studying rodents, where it is documented that proliferation behaviour changes around the time of weaning in mice, when hepatocytes start to fail cytokinesis and undergo endomitosis, leading to cellular and nuclear polyploidy. In rodents, insulin signalling / AKT appears to be involved, as are the E2F network and p53, activated by the caspase-2-PIDDosome.

      The model system introduced here will allow mechanistic studies in human organoids and help to increase our understanding of this process in steady state and under conditions of stress.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Polyploid cells arise within various human tissues by multiple different mechanisms. Here, Darmasaputra et al present a study of one such mechanism, endomitosis, in liver cells using fetal-derived human hepatocyte organoids. In this model, they demonstrate that binucleated cells arise through the late regression of the cytokinetic furrow prior to abscission. They identify a rare event in cytokinetic cells - loss of midbody association with the plasma membrane - that could explain the cytokinesis failure observed in a proportion of these cells. Finally, they show that loss of Wnt signalling increases the number of binucleation events in a manner that depends on E2F7 and E2F8, similar to what has been observed in murine hepatocytes.

      Major comments:

      This is a compelling and well-presented study. The data presented are high quality, the experiments are well described and controlled and the conclusions are convincing. I am particularly impressed by the technical effort that the authors must have put into obtaining high quality live and IF images of dividing cells within organoids and their careful documentation of what are very rare mitotic events. In addition, the manuscript is extremely well written and I found it a pleasure to read.

      1. I do not think that there are additional experiments that are essential to justify the conclusions of the paper. However, I do have suggestions that I think would strengthen this work and increase its significance. As is, the authors present findings in two different areas: the documentation of cytokinesis failure in hepatocyte organoids and the role of Wnt and E2F7/8 in binucleation. It would be really nice if the two parts could be linked. For example, the authors could examine cell divisions in the organoids without Wnt, either live or fixed, and show that they have a higher proportion of cells undergoing cytokinetic regression or with membrane-midbody attachment defects. Alternatively, they could look at whether the expression levels of key cytokinetic genes are changed in the Wnt and E2F7/8 organoids. As I said, these experiments are not required for the publication of this work and I will leave it up to the authors to decide if they have the time or capacity to add additional data.

      We thank the reviewer for this suggestion. Unfortunately, despite substantial effort, we have been unable to perform successful live imaging of Hep-Orgs under CHIR99021 removal conditions: these organoids become very sensitive to live imaging and they also proliferate very slowly. We have tried to look at the expression of cytokinetic genes by qPCR; however, these experiments were inconclusive (see also our response to reviewer #1, point 2). Thus, we cannot exclude that the increase in binucleation that we see upon CHIR99021 removal arises independently of endomitosis, for example through increased survival of binucleated cells upon WNT removal. We have now discussed this issue and explained the limitations of our study in the discussion, pages 14-15, lines 451-460.

      Finally, before publication, the authors should further discuss the mechanisms by which loss of membrane attachment during cytokinesis could occur. There is quite a lot of literature in this area on the role of RacGAP1 and Ect2 in membrane attachment that is not discussed, particularly from the lab of Mark Petronczki (e.g. Kotynkova 2016 PMID: 27926870, Lekomtsev PMID: 23235882). It's surprising that the authors haven't mentioned (or looked at) Ect2 at all, especially since Ect2 levels have been shown to control polyploidy in cardiomyocytes (Liu 2019 PMID: 31597755). This at least warrants some discussion.

      We thank the reviewer for pointing us to these articles. We have elaborated the discussion to include the work on rodent and human cardiomyocytes, and to explain why we think that there is no defect in ECT2 and RhoA signaling in human hepatocytes undergoing endomitosis; see pages 13-14, lines 404-423 and 433-443.

      Minor comments:

      Table 1 would be more striking as a graphical representation. I appreciate that the low n numbers for the regressed cells mean that statistical comparison is not possible, but some kind of colour coding or graph would make this part clearer.

      We agree that Table 1 was difficult to read – we now show the data schematically in a new figure, Fig. 4.

      It's not clear what the difference between Hep-Org 1 and Hep-Org 2 are. Are these from different donors?

      Indeed, Hep-Org1 and Hep-Org2 are from different donors. We have clarified this in the text, see page 5, lines 131-133.

      Reviewer #2 (Significance (Required)):

      This study is an important technical development in that it reports a new system to study in depth the cell biology of liver endomitosis in non-transformed and, crucially, human 3D hepatocyte organoids. The findings reported using this system are potentially interesting, although they could be further developed if they were mechanistically linked together (see major comments). This work is likely to be highly interesting to scientists studying cell division, cytokinesis and hepatocyte biology. It also has wider implications for liver biology and particularly liver regeneration. Additionally, given the role of polyploidisation in many different tissues, it will likely be of interest to scientists studying polyploidy and endomitosis more generally.

      My area of expertise is in cytokinesis and cell division in general, although not specifically in hepatocytes. I am not an expert in organoids.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Darmasaputra and colleagues took advantage of human hepatocyte organoids (Hep-Org) to investigate the formation of binucleated cells that naturally occurs in the liver. So far, the mechanism of hepatocyte binucleation has been studied in rodents, where binucleated hepatocytes arise upon weaning through an insulin/akt pathway that inhibits furrow contraction in a fraction of cells (Ref. 21, 22). In addition, it is known that E2F7 and E2F8, downstream of Wnt signaling, repress the expression of several key cytokinetic proteins (AuroraB, Mklp1, Ect2, Racgap1) in mouse hepatocytes and thereby promote binucleation (Ref. 23).

      Advances:

      As seen in vivo, the authors first show that a fraction (5-15%) of cells are binucleated in two independently derived human Hep-Orgs. Live cell imaging reveals that binucleation is not due to furrow ingression defects after anaphase but rather arises from post-furrowing intercellular bridge regression. Fixed data suggest that the cytokinetic midbody formed normally but lost its anchorage to the bridge membrane. Activation of the Wnt signaling resulted in a modest but significant increase in the proportion of binucleated cells (4.5 to

      Major comments

      1. An outstanding question is whether human Hep-Orgs represent a bona-fide model to study the process of human liver binucleation. The absence of cholangiocytes, vascularization, other cell types and physiological hormones etc. might impact on the mechanism of binucleation, which is the main focus of this study. Since the mechanism of binucleation in human Hep-Orgs appears radically different from what has been reported in vivo in rodents, the authors should reproduce the lack of furrow ingression in mouse Hep-Orgs (that they were able to generate in Ref. 44). This could be done in fixed cells as in Fig. 3. Alternatively, they could use live cell imaging and chemical dyes such as SiR-Tubulin and Cell Mask to label microtubules and the plasma membrane, respectively, without the need of creating genome-edited reporter lines.

      The mechanism of endomitosis that we observe in human hepatocyte organoids is indeed different from what has been observed in mouse hepatocytes. Unfortunately, mouse Hep-Orgs are more difficult to generate as they require a two-step perfusion protocol from live mice (described in Hu et al., 2018). Additionally, mouse Hep-Orgs do not survive freezing, so to be able to perform the suggested experiments, we would need to generate new mouse Hep-Org lines. As our collaborators are currently not performing any experiments with mouse livers, we would need to request an ethical permit to generate these organoids, which would take several months. We have seriously considered this option, however due to the substantial investment in time and resources, we feel these experiments would be more suited for a follow-up study.

      Nonetheless, to better clarify the differences between what has been observed in rodents and what we see in the Hep-Orgs, we have added a paragraph to the discussion, see pages 14-15, lines 433-460.

      The videos acquired in Fig. 2 contain much more information than presented. The authors should measure the rate of furrow ingression, the extent of spindle elongation, the time of MT severing and the time of furrow/bridge regression after cytokinesis onset. All these parameters are important since spindle elongation and furrow ingression are altered in rodents. Is this also the case in human Hep-Orgs? Furthermore, the spindle seems very different (bent bridges) in endomitotic compared to canonical cytokinesis (Fig. 2A). Finally, the authors should provide more time points during furrow regression to better show how this phenomenon occurs. Based on the fixed images, it seems that the midbody stays attached to the plasma membrane in an asymmetric manner (i.e. does not fully detach, contrary to what is stated in the text). 3D reconstructions in fixed cells and a further characterization of the movies would clarify this point.

      We thank the reviewer for this suggestion. Although there are some technical limitations that pose some restrictions (explained below), we have extended our analyses where possible. In our live imaging, we use 5-minute time intervals with 4 µm z-slices, which strikes a delicate balance between having enough frames in M phase and imaging for at least 48 hours, which is required to catch enough divisions. We are unable to image with shorter time intervals or thinner z-slices, as this leads to phototoxicity. Nonetheless, using these settings, we can get an indication of the rate of furrow ingression, the time of severing and the time of furrow regression:

      • We find that the time of furrowing onset and the rate of furrow ingression are very similar between canonical M phases and endomitosis M phases: we have now added these data to the results section, page 7, lines 192-199 and Fig. 2D.
      • The time of cytokinetic regression is more variable between endomitosis events, and can range between 30 minutes and 2.5 hours. We have also added this information to page 7, lines 199-202 and Fig. 2E.
      • The time of MT severing is similar between endomitosis M phases, as we show in Fig. 2C.
      • Unfortunately, we cannot accurately measure the extent of spindle elongation, as the divisions occur in 3D and our Z resolution is not good enough. Regarding the observation that the spindle looks different in the endomitosis example in Fig. 2A: we have quantified how often we observe bent midzones in endomitosis versus canonical M phases, and this occurs in 60% of canonical (n=12/20) and 83% of endomitosis M phases (n=15/18). We have now added this information to the results section, page 6, lines 182-185.
      • We have quantified how often we see the midbody remaining attached to one side of the plasma membrane versus fully detaching: we find that in 6 out of 9 late-stage endomitotic regressions, the membrane is detached from the midbody on both sides, and in 3 out of 9, it remains attached on one side. We have added this information to the results section, page 8, lines 249-251.

      DAPI staining is not sensitive enough to detect thin chromatin bridges. To rule out that post-furrowing regression is merely due to the presence of DNA bridges, the authors should confirm their results with LAP2b staining (see PMID 19203582).

      To exclude the presence of ultrafine DNA bridges during anaphase, we have stained for RIF1, a factor that localizes to ultrafine DNA bridges in anaphase and is required for their resolution (Hengeveld et al, 2015, PMID: 26256213). In early anaphase, we find many RIF1-positive thread-like structures, as has been described before in other non-transformed and non-stressed cells. However, in late anaphase and telophase, we never observe these fibers (n=57/57), suggesting that they are fully resolved and are not the cause of cytokinetic regression. We have added these data to the results section, see page 8, lines 226-234, and Fig. S1.

      The authors show that binucleation results from defective anchorage of the bridge membrane to the midbody, but the molecular mechanism remains elusive and should be further probed. In Fig. 3, there are no obvious changes in the investigated markers. Are the intensities of RACGAP1, Anillin, CIT-K reduced in regressing cells? Are ECT2, activated (phospho) Myosin II, CEP55/ESCRT-III, (activated) AuroraB and MKLP1 normally localized/concentrated? ECT2, AuroraB and MKLP1 are regulated by E2F7/8 (Ref. 23) and AuroraB inactivation after bridge formation leads to late regression (PMID 19203582).

      We agree with the reviewer that the molecular mechanism by which midbodies lose their attachment to the membrane is currently unclear. We do not see any clear differences in the intensities of RACGAP1, Anillin, or CIT-K in cells undergoing endomitotic regression. We also do not expect large differences in the localization or abundance of ECT2, AuroraB or MKLP1, because if this were the case, one would expect differences in early cytokinesis in endomitosis, such as a delay or a slower rate of furrow ingression. We did perform additional IF experiments to investigate the localization of SEPT9, a septin that is expressed in human hepatocytes and that has essential functions in membrane anchorage during cytokinesis. Although SEPT9 localization is more variable than that of RACGAP1, Anillin, and CIT-K, we find that in the majority of endomitotic regressions it is also absent from the regressed membrane (n=5/7 cells). We have added these data to the results section on page 9, and to Fig. 3C and Fig. 4C.

      The results of Fig. 4F indicate that the increased proportion of binucleated cells upon CHIR99021 removal depends on E2F7/8. Without live cell imaging (or FISH experiments), the authors cannot conclude that the increase in endomitosis is dependent on E2F7/8. A decrease in binucleation does not necessarily imply a reduced occurrence of endomitosis. For instance, it is possible that E2F7/8 KO induces the formation of mononucleated 4n cells due to early mitotic failure. This issue should be clarified.

      The reviewer raises an important point. Unfortunately, we were unable to generate E2F7/8 KO lines containing fluorescent nuclear and membrane markers, which would allow us to perform live imaging and confirm that these organoids undergo less endomitosis. As an alternative, we tried to use SiR-Tubulin dyes for live imaging, but even at very low concentrations these dyes are toxic to the organoids. To exclude the possibility that E2F7/8 KO induces the formation of mononucleated 4n cells, we have measured the DNA content in wild-type, E2F7 KO and E2F8 KO lines, and found that the distribution of ploidies is very similar between these lines, both in normal growth conditions and upon removal of CHIR99021 (see the new supplemental figure, Fig. S3). We thus think it is unlikely that E2F7/8 KO induces the formation of mononucleated 4n cells; however, it remains possible that the differences in the percentage of binucleated cells arise independently of endomitosis. We have now toned down our conclusions on the function of WNT signaling and E2F7/8 in endomitosis, and discussed alternative explanations for our findings in the discussion, see page 14, lines 451-460.

      Binucleation increases with age both in humans and rodents. Could this feature be mimicked in the human Hep-Org by leaving the organoids longer in culture? (optional but would reinforce the value of the model).

      We do not see an increase in binucleation percentages in organoids that are kept longer in culture. We have now also tested the effect of growing the organoids in a “differentiation medium”, which was previously described to give rise to more mature hepatocyte gene expression (Hu et al. 2018); however, we see no significant differences in the percentage of binucleated cells per organoid. We have now included these data, as well as our analyses of the effect of insulin in the growth medium (see our response to point 12 below), in the results section on page 11, lines 341-353, and we further discuss this point in the Discussion, pages 12-13, lines 389-397.

      Minor comments

      The results of Table 1 are based on very few fixed cells (3 to 6). The authors should consider increasing the number of regressing cells.

      We are aware that the number of endomitotic regressions is very low. Unfortunately, it is extremely challenging to catch these events by IF: cells in Hep-Orgs cycle very slowly (they divide approximately once every 50 hours), and thus very few cells are in M phase at any given moment (only 1 or 2 cells per organoid). The chance that such a cell is also in telophase is even lower, and only approximately 5% of telophases actually undergo endomitosis. Due to technical limitations of the organoid IF staining protocol, it is not trivial to scale up these experiments, making it very difficult to find more than 3-5 endomitotic regressions per condition. Despite the low numbers of endomitotic regressions that we have identified, we find that RacGAP1, Anillin and CIT-K localize in a very similar manner in cells undergoing endomitosis. For SEPT9, we see slightly more variation in localization, but here too the majority of cells undergoing endomitosis have lost SEPT9 membrane association on the regressed membrane (see Fig. 4C).

      Is WNT signaling modified by E2F7/8 mutations? To conclude that "WNT signaling inhibits binucleation in an E2F7/8-dependent manner", the authors should check that E2F7/8 KO does not impair the increase of WNT signaling upon CHIR99021 removal.

      We had not thought of this option, but it is indeed possible that E2F7/8 influences the ability of cells to respond to CHIR99021 removal. WNT regulators are not known to be targets of E2F7 or E2F8 in mice (see PMIDs: 22180533, 18194653, and 23064264), however as we have not analyzed the gene expression changes in E2F7 or E2F8 mutant organoids, we cannot exclude the possibility that CHIR99021 has different effects in E2F7/E2F8 knock-out cells. We now discuss this possibility in the discussion, page 15, lines 459-460.

      Please provide movies of the cells presented in Fig. 2A.

      We have included movies of these cells, see Supplemental Movie 1 and Supplemental Movie 2.

      1. Removal of CHIR99021 induces major shape changes and lumen formation (rather than "exhibited some morphological changes" as stated). Could the authors speculate on this?

      WNT signaling is likely important for many aspects of hepatocyte growth and differentiation, and it is possible that upon CHIR99021 removal, Hep-Orgs are starting to differentiate and become more secretory, which would explain why they start forming larger lumens. We now discuss this in more detail in the final part of the results section, see page 11 lines 341-353, and in the discussion, page 15 lines 462-472.

      1. Fig. 4: Why do the authors use cell line 1, which has the lowest level of binucleation, in this experiment? Would the results be the same in cell line 2? (optional)

      We perform most experiments in Hep-Org line 1 because this line is easier to maintain in culture, and we have been unable to generate CRISPR knock-outs in Hep-Org line 2.

      1. Would insulin increase the proportion of binucleated cells, as in rodents? (optional)

      We have tested this, but do not see a difference in the percentage of binucleated cells when we either increase or decrease the concentration of insulin in the growth medium. We have now added this data in the results section, see page 11, lines 347-350 and Fig. 5J.

      Reviewer #3 (Significance (Required)):

      Strengths and limitations:

      The manuscript is well written, easy to follow, and the quality of the data is overall high. A clear strength of this study is the use of state-of-the-art human hepatocyte organoids and genome editing (to generate reporter lines and to KO E2F7/8). This allows the authors to address the mechanism of binucleation in a human context. Interestingly, it revealed both similarities (e.g. the E2F7/8 dependence of binucleation) and striking mechanistic differences (e.g. post-furrowing regression) between rodent and human systems. The study is rather descriptive (which is fine), but deeper mechanistic insights would strengthen the conclusions of the manuscript. For instance, "our results identify how human hepatocytes inhibit cell division in endomitosis" appears to be an overstatement, since the molecular basis of the midbody anchorage defects remains elusive.

      We thank the reviewer for their kind words. Unfortunately, we have been unable to gain deeper mechanistic insights into the molecular mechanism of membrane regression in endomitosis. We have therefore toned down our conclusions, see the new concluding sentence in the abstract, page 2, lines 35-36.

    1. In a recent AMS webinar on CRT, Dr. Valaida Wise makes the important distinction that as a theoretical framework, CRT helps us think about and critically analyze systems; as such, it can help teachers think about the right questions to ask regarding potential or actual inequities that may be present in our classrooms. Our concern at the classroom level, therefore, is not the theoretical work of CRT, but that we have a culturally responsive pedagogy (CRP) in place. We will come back to this.

      In a webinar, Dr. Valaida Wise points out that CRT helps teachers analyze systems and spot potential issues. The main focus should be on using culturally responsive teaching methods in classrooms.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the discovery and subsequent design of the AF03-NL chimeric antibody yielded a tool for studying filoviruses and provides a possible blueprint for future therapeutics. However, the data are incomplete and not presented clearly, which obscures flaws in the analyses and leaves unexplained phenomena. The work will be of interest to virologists studying antibodies.

      Author response: Thank you for your very valuable comments. The ms has been revised substantially, and some new data have been added to further support the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      Zhang et al. conducted a study in which they isolated and characterized a Marburg virus (MARV) glycoprotein-specific antibody, AF-03. The antibody was obtained from a phage-display library. The study shows that AF-03 competes with the previously characterized MARV-neutralizing antibody MR78, which binds to the virus's receptor binding site. The authors also performed GP mutagenesis experiments to confirm that AF-03 binds near the receptor binding site. In addition, the study confirmed that AF-03, like MR78, can neutralize Ebola viruses with cleaved glycoproteins. Finally, the authors demonstrated that NPC2-fused AF-03 was effective in neutralizing several filovirus species.

      Weaknesses:

      (1) The main premise of this study is unclear. Flyak et al. in 2015 described the isolation and characterization of a large panel of neutralizing antibodies from a Marburg survivor (Flyak et al., Cell, 2015). Based on biochemical and structural characterization, Flyak proposed that the Marburg neutralizing antibodies bind to the NPC1 receptor binding site. In the same study, it was shown that several MARV-neutralizing antibodies can bind to cleaved Ebola glycoproteins that were enzymatically treated to remove the mucin-like domain and glycan cap. In a subsequent study, it was shown that a bispecific-antibody strategy can be used to deliver Marburg-specific antibodies into the endosome, where they can neutralize Ebola viruses (Wec et al., Science 2016). Finally, the use of the lysosome-resident protein NPC2 to deliver antibody cargos to late endosomes has been previously described (Wirchnianski et al., Front. Immunol, 2021). The above-mentioned studies are not referenced in the introduction. The authors state that "there is no licensed treatment or vaccine for Marburg [virus] infection." While this is true, there are human antibodies that recognize neutralizing epitopes, and that information can't be excluded when providing the rationale for the study. Furthermore, the authors use the word "novel" to describe the AF-03 antibody. How novel is AF-03 if multiple Marburg-neutralizing antibodies were previously characterized in multiple studies? Since AF-03 competes with previously characterized MR78, it binds to the same antigenic region as MR78. AF-03 also has comparable neutralization potency to MR78.

      Author response: Thank you for your valuable advice. Regarding the novelty of AF-03, the inhibition assay indicates that Q128/N129/C226 are key amino acids responsible for AF-03 neutralization, given that the neutralizing capacity of AF-03 against pseudotyped virus harboring these mutations is impaired (see revised Fig. 2A, left panel). Furthermore, ELISA assays show that the Q128S-N129S or C226Y mutation significantly disrupts the binding of GP to AF-03, whereas the neutralizing and binding capacity of MR78 is almost unaffected by the C226Y mutation, although it is affected by Q128S-N129S (see revised Fig. 2A, right panel, and Fig. 2B). Considering that AF-03 and MR78 compete with each other for binding to MARV GP (Fig. 2D), we conclude that the epitopes of these two mAbs partially overlap. Therefore, AF-03 is not a clone of MR78 and is a novel neutralizing mAb against MARV.

      The work from Wirchnianski and colleagues is in fact already cited in the ms (see Ref. 38). Although our strategy for the design of a broad-spectrum neutralizing antibody builds on their work, we further expand the range of species evaluated to include RAVN and mutated EBOV strains. The results show that NPC2-fused AF-03 exhibits neutralizing activity against 10 filovirus species and 17 EBOV mutants (Fig. 6A and B). The work by Flyak et al. (2015), which described the isolation and characterization of a large panel of neutralizing antibodies from a Marburg survivor, is now cited in the Introduction accordingly.

      (2) Without the AF-03-MARV GP crystal structure, it's unclear how van der Waals interactions, H-bonds, and polar and electrostatic interactions can be evaluated. While authors use computer-guided homology modeling, this technique can't be used to determine critical interactions. Furthermore, Flyak et al. reported that binding to the NPC1 receptor binding site is the main mechanism of Marburg virus neutralization by human monoclonal antibodies. Since both AF-03 (this study) and MR78 (Flyak study) competed with each other, that information alone was sufficient for GP mutagenesis experiments that identified the NPC1 receptor binding site as the main region for mutagenesis.

      Author response: Computer-guided homology modeling has been used successfully in our lab to determine key residues responsible for the interaction between antigens and mAbs (Immunol Res. 2015, 62:377; Scand J Immunol. 2019, 90:e12777; Sci Rep. 2022, 12:8469; Front Immunol. 2022, 13:831536). We used the previously reported crystal structures of MARV GP and of the MR78-GP complex (Cell 2015, 160:904) as references to model the complex of MARV GP and AF-03. Although AF-03 and MR78 compete with each other, we show that the epitopes of these two mAbs only partially overlap (Fig. 2A-D).

      (3) The AF-03-GP affinity measurements were performed using bivalent IgG molecules and trimeric GP molecules. This format does not allow accurate measurements of affinity due to the avidity effect. The reported KD value is abnormally low due to avidity effects. The authors need to repeat the affinity experiments by immobilizing trimeric GPs and then adding monovalent AF-03 Fab.

      Author response: As shown in Fig. 1A, the GP protein used in this work is not a trimer but largely a monomer composed of MLD-deleted GP1 and GP2, which may to a certain extent weaken the engagement between GP and AF-03. Notably, we repeated the SPR assays for the binding of AF-03 to GP and obtained a KD value of 4.71 × 10⁻¹¹ M (see revised Fig. 1C). This GP protein is thus suitable for evaluating mAb affinity. In addition, it is reasonable to use bivalent IgG to measure the affinity of the mAb for monomeric GP, since the affinity would likely decrease significantly if monovalent Fab were used.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe the discovery of a filovirus neutralizing antibody, AF03, by phage display, and its subsequent improvement to include NPC2, which resulted in a greater breadth of neutralization. Overall, the manuscript would benefit from considerable grammatical review, which would improve the communication of each point to the reader. The authors do not convincingly map the AF03 epitope, nor do they provide any strong support for their assumption that AF03 targets the NPC1 binding site. However, the authors do show that AF03 competes with MR78 for binding to its epitope, and they provide good support for the internalization of AF03-NL as the mechanism for improved breadth over the original AF03 antibody.

      Strengths:

      This study shows convincing binding to Marburgvirus GP and neutralization of Marburg viruses by AF03, as well as convincing neutralization of Ebolaviruses by AF03-NL. While there are no distinct populations of PE-stained cells shown by FACS in Figure 5A, the cell staining data in Figure 5C are compelling to a non-expert in endosomal staining like me. The control experiments in Figure 7 are compelling showing neutralization by AF03-NL but not AF03 or NPC2 alone or in combination. Altogether these data support the internalisation and stabilisation mechanism that is proposed for the gain in neutralization breadth observed for Ebolaviruses by AF03-NL over AF03 alone.

      Weaknesses:

      Overall, this reviewer is of the opinion that this paper is constructed haphazardly. For instance, the neutralization of mutant pseudoviruses is shown in Figure 2 before the concept of pseudovirus neutralization by AF03 is introduced in Figure 3. Similarly, the control experiments for AF03+NPC2 are described in Figure 7 after the data for breadth of neutralization are shown in Figure 6. GP quality controls are shown in Figure 2 after GP ELISAs / BLI experiments are done in Figure 1. This is disorienting for the reader.

      Author response: AF-03 production and its binding capacity to GP are determined in Fig. 1. The epitopes of AF-03 are identified in Fig. 2. The neutralizing activity of AF-03 against pseudotyped MARV in vitro and in vivo is determined in Fig. 3. The neutralizing activity of AF-03 against pseudotyped ebolavirus harboring cleaved GP is determined in Fig. 4. The endosome-delivering ability of AF03-NL is examined in Fig. 5. The neutralization of filovirus species and EBOV mutants by AF03-NL is determined in Fig. 6. The requirement of CI-MPR for the neutralization activity of AF03-NL is determined in Fig. 7. We think that this arrangement is suitable.

      Figure 1: The visualisation of AF03 modelling and docking endeavours is extremely difficult to interpret. Firstly, there is no effort to orient the non-specialist reader with respect to the Marburgvirus GP model. Secondly, from the figures presented it is impossible to tell if the Fv docks perfectly onto the GP surface, or if there are violent clashes between the deeply penetrating AF03 CDRs and GP. This information would be better presented on a white background, perhaps showing GP in surface view from multiple angles and slices. The authors attempt to label potential interactions, but these are impossible to read, and labels should be added separately to appropriately oriented zoomed-in views.

      Author response: To make the rationale of the computer-guided modeling easier to understand, we have refined the descriptions in the Methods and Results sections accordingly. In addition, the theoretical structure is now presented on a white background (see revised Fig. 1D-F).

      Figure 2: The neutralization of mutant pseudoviruses cannot be properly assessed using bar graphs. These data should be plotted as neutralization curves, as was done for the wild-type neutralization data in Figure 3. The authors conclude that Q128 & N129 are contact residues, but the neutralization data for this mutant appear odd, as the lowest two concentrations of AF03 show higher neutralization than the second highest AF03 concentration. Neutralization of T204/Q205/T206 (green), Y218 (orange), K222 (blue), or C226 (purple) appears to be better than neutralization of the wild-type MARV. The authors do not discuss this oddity. What are the IC50s? The omission of antibody concentrations on the x-axis and the missing IC50 values give a sense of obscuring the data, and the manuscript would benefit from greater transparency and be much easier to interpret if these were included. I am intrigued that the Q128S/N129S mutant is reported as having little effect on the neutralization of MR78. The bar graph appears to show some effect (difficult to interpret without neutralization curves and IC50 data), and indeed PDB:5UQY seems to suggest that these amino acids form a central component of the MR78 epitope (Q128 forms potential hydrogen bonds with CDRH1 Y35 and CDRL3 Y91, while N129 packs against the MR78 CDRH3 and potentially makes additional polar contact with the backbone). Lastly, since neutralization was tested in both HEK293T cells and Huh7 cells in Figure 3, the authors should clarify which cells were used for neutralization in Figure 2.

      Author response: Thank you for your advice. In the revised ms, the neutralization curves of AF-03 and MR78 are presented in revised Fig. 2A. Neutralization by AF-03 of pseudotyped MARV harboring Q128S/N129S or C226Y is significantly impaired compared with WT MARV and pseudoviruses bearing the other indicated mutations, whereas the Q128S/N129S mutation, but not C226Y, affects the neutralizing capacity of MR78 to a certain extent. This is consistent with the binding of AF-03 or MR78 to MARV GP protein assayed by ELISA (see revised Fig. 2B). Overall, these results show that Q128, N129, and C226 are key amino acids responsible for AF-03 neutralization.

      Figure 3: The first two images in Figure 3C showing bioluminescent intensity from pseudovirus-injected mice pretreated with either 10mg/kg or 3mg/kg AF03 are identical images. This is apparent from the location, shape, and intensity of the bioluminescence, as well as the identical foot placement of each mouse in these two panels. Currently, this figure is incomplete and should be corrected to show the different mice treated with either 10mg/kg or 3mg/kg of AF03.

      Author response: Thank you for your careful reading. Indeed, this was our mistake. In the revised ms, the error has been corrected and the correct images have been added (see revised Fig. 3C).

      Figure 4 would benefit from a control experiment without antibodies comparing infection with GP-cleaved and GP-uncleaved pseudoviruses. The paragraph describing these data was also difficult to read and would benefit from additional grammatical review.

      Author response: Accordingly, we performed a control experiment comparing infection by GP-cleaved and GP-uncleaved pseudoviruses. The results show that infection of host cells by pseudotyped ebolavirus harboring cleaved GP is comparable to or stronger than that by pseudoviruses containing intact GP (see revised Fig. S1). Therefore, the inhibition by AF-03 of cell entry of ebolavirus species harboring cleaved GP (Fig. 4) cannot be attributed to an impaired cell entry capacity of GPcl-containing ebolavirus. In addition, the sentences in this paragraph have been revised for readability.

      Figure 5: The authors should clarify in the methods section that the "mock" experiment included the PE anti-human IgG Fc antibody. Without this clarification, the lack of a distinct negative population in the FACS data could be interpreted as non-specific staining with PE. If the PE antibody was added at an equivalent concentration to all panels, what does the directionality of the arrowheads in Figure 5A (labelled PE) and 5B (labelled pHrodo Red) indicate?

      Author response: Thank you for your advice. In the figure legend of the revised version, we now state that the mock is a human IgG isotype control. The arrowheads denote the fluorescence intensity of PE or pHrodo on the horizontal axis of the plots, and the percentage of PE- or pHrodo-positive cells is shown.

      Figure 6B: These data would benefit from the inclusion of IC50, transparency of antibody concentrations used, and consistency in the direction of antibody concentrations (increasing to the right or left of the x-axis) when compared to Figure 2.

      Author response: The antibody concentrations used in the titration are given in the figure legends, and the direction of the antibody concentration axis is now consistent throughout the paper. Although IC50 values are not included, these data clearly show that AF03-NL, rather than AF-03, prominently inhibits the cell entry of EBOV mutants.
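      IC50 values of the kind the reviewer requests can be estimated directly from titration points. The Python sketch below is a minimal illustration, assuming neutralization is expressed as percent inhibition at each antibody concentration and that IC50 is the concentration at which the curve first rises through 50%, estimated by linear interpolation on a log10 concentration axis. It is not the authors' analysis (a four-parameter logistic fit is the more common choice), and the function name `ic50_log_interp` and the example values are hypothetical.

```python
import math


def ic50_log_interp(concs, inhibition):
    """Estimate IC50 by linear interpolation on a log10 concentration axis.

    concs: antibody concentrations (any consistent unit, all > 0).
    inhibition: percent neutralization measured at each concentration.
    Returns None if the curve does not rise through 50% between two tested
    concentrations (e.g. every point is above or below 50%).
    """
    points = sorted(zip(concs, inhibition))  # order by concentration
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        # First pair of neighbouring concentrations that brackets 50%.
        if y_lo < 50 <= y_hi:
            x_lo, x_hi = math.log10(c_lo), math.log10(c_hi)
            frac = (50 - y_lo) / (y_hi - y_lo)
            return 10 ** (x_lo + frac * (x_hi - x_lo))
    return None


# Hypothetical titration (not data from the manuscript): 50% lies halfway
# between 0.1 and 1 on the log axis, so the estimate is 10**-0.5 (about 0.316).
print(ic50_log_interp([0.01, 0.1, 1, 10], [5, 30, 70, 95]))
```

      Interpolating on the log axis matches how titration series are usually spaced; a fitted sigmoid would additionally use all points and give a confidence interval.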

      Reviewer #1 (Recommendations For The Authors):

      Line 143: anti-human should be anti-human.

      Line 223: From the SDS-PAGE results, it's not clear that the AF-03 was expressed in the eukaryotic cell line. Please, rephrase the sentence.

      Line 263: ELISA experiments can't be used to determine affinity.

      Line 394: Flyak et al. generated human antibodies from PBMC samples of Marburg survivors, not plasma samples.

      Author response: Following the reviewer's advice, these sentences have been modified or corrected to describe the results more accurately. In addition, the grammatical errors in the ms have been carefully corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines psychophysics, fMRI, and TMS to reveal a causal role of FEF in generating an attention-induced ocular dominance shift, with potential relevance for clinical applications. The evidence supporting the claims of the authors is solid, but the theoretical and mechanistic interpretation of results and experimental approaches need to be strengthened. The work will be of broad interest to perceptual and cognitive neuroscience.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Based on a "dichoptic-backward-movie" paradigm that modulates ocular dominance, the present study combines fMRI and TMS to examine the role of the frontoparietal attentional network in ocular dominance shifts. The authors claimed a causal role of FEF in generating the attention-induced ocular dominance shift.

      Strengths:

      A combination of fMRI, TMS, and the "dichoptic-backward-movie" paradigm is used to reveal the causal role of the frontoparietal attentional network in ocular dominance shifts. The conclusions of this paper are mostly well supported by data.

      Weaknesses:

      (1) The relationship between eye dominance, eye-based attention shift, and cortical functions remains unclear and merits further delineation. The rationale of the experimental design related to the hemispheric asymmetry in the FEF and other regions should be clarified.

      Thanks for the reviewer’s comments! We have further clarified the relationship between eye dominance shift, eye-based attention, and cortical functions in the Introduction and Discussion. In the Introduction, we describe the modulating effects of eye-based attention on eye dominance. On the one hand, eye-based attention can enhance the dominance of the attended eye in real time (see page 3 first paragraph or below):

      “For instance, presenting top-down attentional cues to one eye can intensify the competition strength of input signals in the attended eye during binocular rivalry (Choe & Kim, 2022; Zhang et al., 2012) and shift the eye balance towards the attended eye (Wong et al., 2021).”

      On the other hand, prolonged eye-based attention can induce a shift of eye dominance to the unattended eye (see page 3 second paragraph or below):

      “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants were presented with regular movie images in one eye (i.e., attended eye) while the other eye (i.e., unattended eye) received the backward movie images of the same episode. They were also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, goal-directed eye-based attention was predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).”

      Moreover, we have discussed how FEF regulates the attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or below, which also responds to this reviewer’s Weakness #2):

      “Then how does FEF regulate the attention-induced ocular dominance shift? Our previous work has found that the aftereffect (for simplicity, hereafter we use aftereffect to denote the attention-induced ocular dominance shift) can be produced only when the adapting stimuli involve adequate interocular competition, and is measurable only when the testing stimuli are not binocularly fused (Song et al., 2023). Given the indispensability of interocular competition, we explained those findings in the framework of the ocular-opponency-neuron model of binocular rivalry (Said & Heeger, 2013). The model suggests that there are some opponency neurons which receive excitatory inputs from monocular neurons for one eye and inhibitory inputs from monocular neurons for the other eye (e.g. AE-UAE opponency neurons receive excitatory inputs from the attended eye (AE) and inhibitory inputs from the unattended eye (UAE)). Then a difference signal is computed so that the opponency neurons fire if the excitatory inputs surpass the inhibitory inputs. Upon activation, the opponency neurons will in turn suppress the monocular neurons which send inhibitory signals to them.

      Based on this model, we proposed an ocular-opponency-neuron adaptation account to explain the aftereffect, and pointed out that the attentional system likely modulated the AE-UAE ocular opponency neurons (Song et al., 2023). So why would FEF modulate the AE-UAE opponency neurons? The reason may be twofold. Firstly, following the logic of the movie during dichoptic-backward-movie viewing may require filtering out distracting information (from the unattended eye) and sustaining attention (to the attended eye), both of which are functions attributed to FEF (Esterman et al., 2015; Lega et al., 2019).

      Secondly, owing to the special characteristics of the binocular visual system, filtering out the distracting input from the unattended eye may have to rely on the interocular suppression mechanism. According to the ocular-opponency-neuron model, this is achieved by the firing of the AE-UAE opponency neurons, which send inhibitory signals to the UAE monocular neurons.

      As mentioned previously, the firing of the AE-UAE opponency neurons requires stronger activity in the AE monocular neurons than in the UAE monocular neurons. This is supported by the results shown in Figure 8 of Song et al. (2023): the monocular response for the attended eye during the entire adaptation phase was slightly stronger than that for the unattended eye. Accordingly, during adaptation the AE-UAE opponency neurons were active for a longer period and thus adapted to a larger extent than the UAE-AE opponency neurons. This would cause the monocular neurons for the unattended eye to receive less inhibition from the AE-UAE opponency neurons in the post-test than in the pre-test, leading to a shift of ocular dominance towards the unattended eye. In this vein, the magnitude of this aftereffect should be proportional to the extent of adaptation of the AE-UAE relative to the UAE-AE opponency neurons. Attentional enhancement of the AE-UAE opponency neurons is expected to strengthen this aftereffect, as attention has been found to enhance adaptation (Dong et al., 2016; Rezec et al., 2004). Inhibition of FEF likely rendered such attentional modulation much less effective. Consequently, the AE-UAE opponency neurons might not have had the chance to adapt to a sufficiently larger extent than the UAE-AE opponency neurons, leading to a statistically non-detectable aftereffect in Experiment 2. Therefore, the results of Experiments 2-4 in the present study suggest that, within the ocular-opponency-neuron adaptation account, FEF might be the core area that implements the attentional modulations of the AE-UAE opponency neurons.”
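      The adaptation account quoted above can be made concrete with a toy simulation. The Python sketch below is a deliberately simplified illustration of the verbal argument, not the Said & Heeger (2013) model equations and not the authors' implementation. Each opponency channel responds with the rectified difference between its excitatory and inhibitory monocular drives, scaled down by its current adaptation state; adaptation grows with the response; and a hypothetical `attn_gain` multiplies the AE-UAE channel to stand in for FEF-dependent attentional enhancement. The drive sequence, `rate`, and gain values are illustrative assumptions, not fitted parameters.

```python
def opponency_adaptation(drives, rate=0.01, attn_gain=1.0):
    """Return the final adaptation states (AE-UAE, UAE-AE), each in [0, 1).

    drives: sequence of (ae, uae) monocular drive pairs over time.
    attn_gain: hypothetical attentional gain on the AE-UAE channel.
    """
    a_ae_uae = 0.0  # adaptation of AE-UAE neurons (excited by AE, inhibited by UAE)
    a_uae_ae = 0.0  # adaptation of UAE-AE neurons (excited by UAE, inhibited by AE)
    for ae, uae in drives:
        # Opponency neurons fire only when excitation exceeds inhibition;
        # their response is reduced by their own adaptation state.
        r_ae_uae = attn_gain * max(0.0, ae - uae) * (1 - a_ae_uae)
        r_uae_ae = max(0.0, uae - ae) * (1 - a_uae_ae)
        # Adaptation accumulates in proportion to firing.
        a_ae_uae += rate * r_ae_uae
        a_uae_ae += rate * r_uae_ae
    return a_ae_uae, a_uae_ae


def dominance_shift(drives, **kwargs):
    """Positive values mean the unattended eye gains dominance after adaptation.

    In the post-test the UAE monocular neurons are inhibited in proportion to
    (1 - a_ae_uae) and the AE ones in proportion to (1 - a_uae_ae), so the
    UAE advantage is a_ae_uae - a_uae_ae.
    """
    a_ae_uae, a_uae_ae = opponency_adaptation(drives, **kwargs)
    return a_ae_uae - a_uae_ae


# Illustrative drives: the attended eye is slightly stronger on 60% of time steps.
drives = [(1.2, 1.0) if t % 5 < 3 else (1.0, 1.2) for t in range(1000)]
with_attention = dominance_shift(drives, attn_gain=1.5)
weakened_attention = dominance_shift(drives, attn_gain=1.0)  # stand-in for inhibited FEF
print(with_attention > weakened_attention > 0)
```

      In this toy, lowering `attn_gain` shrinks the shift towards the unattended eye, mirroring the argument that with FEF inhibited the AE-UAE channel no longer adapts sufficiently more than the UAE-AE channel. A multiplicative gain is only one simple way to express attentional enhancement of adaptation.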

      We restricted cTBS to the right hemisphere for two reasons. First, many studies have shown that the dorsal attentional network is functionally right-hemisphere dominant (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010), which was also suggested by the results of Experiment 1 (Figure 3). Second, a recent study applying TMS to FEF and IPS stimulated only the right hemisphere (Gallotto et al., 2022). Therefore, we selected the right FEF and right IPS as the target regions for cTBS. In the Methods section of Experiment 2, we have explained the reasons for the selection of cTBS target regions (see page 35, first paragraph or below):

      “Given that the dorsal attentional network primarily consists of the FEF and the IPS (Corbetta & Shulman, 2002; Mayrhofer et al., 2019), with a functional right-hemisphere dominance (Duecker et al., 2013; Mayrhofer et al., 2019; Sack, 2010), we selected the right FEF and right IPS from the four clusters identified in Experiment 1 as the target regions for cTBS (Gallotto et al., 2022).”

      (2) Theoretically, how the eye-related functions in this area could be achieved, and how it interacts with the ocular representation in V1 warrant further clarification.

      Thanks for the reviewer’s comment! In the revised manuscript, we have discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph or the quoted paragraphs under this reviewer’s first Public comment).

      Reviewer #2 (Public Review):

      Summary

      Song et al. investigate the role of the frontal eye field (FEF) and the intraparietal sulcus (IPS) in mediating the shift in ocular dominance (OD) observed after a period of dichoptic stimulation during which attention is selectively directed to one eye. This manipulation has been previously found to transiently shift OD in favor of the unattended eye, similar to the effect of short-term monocular deprivation. To this aim, the authors combine psychophysics, fMRI, and transcranial magnetic stimulation (TMS). In the first experiment, the authors determine the regions of interest (ROIs) based on the responses recorded by fMRI during either dichoptic or binocular stimulation, showing selective recruitment of the right FEF and IPS during the dichoptic condition, in line with the involvement of eye-based attention. In a second experiment, the authors investigate the causal role of these two ROIs in mediating the OD shift observed after a period of dichoptic stimulation by selectively inhibiting them with TMS (using continuous theta burst stimulation, cTBS) before the adaptation period (50 min exposure to dichoptic stimulation). They show that, when cTBS is delivered on the FEF, but not the IPS or the vertex, the shift in OD induced by dichoptic stimulation is reduced, indicating a causal involvement of the FEF in mediating this form of short-term plasticity. A third control experiment rules out the possibility that TMS interferes with the OD task (binocular rivalry), rather than with the plasticity mechanisms. From this evidence, the authors conclude that the FEF is one of the areas mediating the OD shift induced by eye-selective attention.

      Strengths

      (1) The experimental paradigm is sound and the authors have thoroughly investigated the neural correlates of an interesting form of short-term visual plasticity combining different techniques in an intelligent way.

      (2) The results are solid and the appropriate controls have been performed to exclude potential confounds.

      (3) The results are very interesting, providing new evidence both about the neural correlates of eye-based attention and the involvement of extra-striate areas in mediating short-term OD plasticity in humans, with potential relevance for clinical applications (especially in the field of amblyopia).

      Weaknesses

      (1) Ethics: more details about the ethics need to be included in the manuscript. It is only mentioned for experiment 1 that participants "provided informed consent in accordance with the Declaration of Helsinki. This study was approved by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences". (Which version of the Declaration of Helsinki? The latest version requires the pre-registration of the study. The code of the approved protocol together with the code and date of the approval should be provided.) There is no mention of informed consent procedures or ethics approval for the TMS experiments. This is a huge concern, especially for brain stimulation experiments!

      Response: Thanks for the reviewer’s comment! In the revised manuscript, we have provided the code of the approved protocol and date of the approval (see page 25 second paragraph or below):

      “This study was approved (H21058, 11/01/2021) by the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences.”

      Indeed, ethics approval and informed consent were obtained for each experiment. To avoid duplication, we presented the ethics statement only in the Methods section of Experiment 1. We have now clarified in that section that all the experiments in this study were approved by the IRB of our Institute.

      (2) Statistics: the methods section should include a sub-section describing in detail all the statistical analyses performed for the study. Moreover, in the results section, statistical details should be added to support the fMRI results. In the current version of the manuscript, the claims are not supported by statistical evidence.

      Response: Thanks for the reviewer’s suggestion! In the Methods section of revised manuscript, we have added a section to describe the detailed statistical analyses for each experiment (see page 37 last paragraph for Experiment 2 and page 38 last paragraph for Experiment 3 or below):

      “Statistical analyses were performed using MATLAB. A 3 (stimulation site: Vertex, FEF, IPS) × 2 (test phase: pre-test and post-test) repeated measures ANOVA was used to investigate the effect of cTBS delivery on ocular dominance shift. Moreover, for the blob detection test, the target detection rate of each experimental condition was calculated by dividing the summed number of detected blob targets by the total number of blob targets. Then, a 2 (eye: attended eye, unattended eye) × 3 (stimulation site: Vertex, FEF, IPS) repeated measures ANOVA on the detection performance was performed. Post-hoc tests were conducted using paired t-tests (2-tailed significance level at α = 0.05), and the resulting p-values were corrected for multiple comparisons using the false discovery rate (FDR) method (Benjamini & Hochberg, 1995).”
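      For readers less familiar with the Benjamini-Hochberg procedure cited above, the Python sketch below computes BH-adjusted p-values: sort the m p-values, scale each by m/rank, then take a running minimum from the largest p-value downwards so that the adjusted values stay monotone and never exceed 1. The authors report using MATLAB; this is an independent illustration, not their code, and the input p-values are hypothetical. The same correction is available in statsmodels as `multipletests(pvals, method='fdr_bh')` in `statsmodels.stats.multitest`.

```python
def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical post-hoc p-values from three paired t-tests.
q = fdr_bh([0.01, 0.04, 0.03])
print([round(v, 3) for v in q])  # [0.03, 0.04, 0.04]
```

      An adjusted value below α = 0.05 marks a comparison as significant after FDR correction; here all three would remain significant.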

      “In addition to the data analysis in Experiment 2, we complemented the standard inferential approach with the Bayes factor (BF; van den Bergh et al., 2023; van Doorn et al., 2021; Wagenmakers et al., 2018), which quantifies the relative evidence that the data provide for the alternative (H1) or null hypothesis (H0). We conducted Bayesian repeated measures ANOVAs in JASP with default priors and computed inclusion Bayes factors (BFincl), which quantify the evidence for including a particular effect, calculated across matched models. A BF greater than 1 provides support for the alternative hypothesis. Specifically, a BF between 1 and 3 indicates weak evidence, a BF between 3 and 10 indicates moderate evidence, and a BF greater than 10 indicates strong evidence (van Doorn et al., 2021). In contrast, a BF below 1 provides evidence in favor of the null hypothesis.”
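      The evidence categories quoted above can be expressed as a small lookup. The Python sketch below is a minimal illustration with two assumptions beyond the quoted text: a BF that falls exactly on a boundary (3 or 10) is assigned to the lower category, and BFs below 1 are graded by applying the same cut-offs to 1/BF, a common convention that the quoted passage does not state. The function name `describe_bf` is hypothetical.

```python
def describe_bf(bf):
    """Label a Bayes factor (e.g. BFincl) with the evidence categories quoted above."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf == 1:
        return "no evidence either way"
    # Express the evidence as a ratio >= 1 for whichever hypothesis it favours.
    # Assumption: the H1 cut-offs are reused for 1/BF when BF < 1.
    favoured, ratio = ("H1", bf) if bf > 1 else ("H0", 1 / bf)
    if ratio > 10:
        strength = "strong"
    elif ratio > 3:
        strength = "moderate"
    else:
        strength = "weak"  # includes boundary values, e.g. exactly 3
    return f"{strength} evidence for {favoured}"


for value in (15, 5, 2, 0.2):
    print(value, describe_bf(value))
```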

      Furthermore, in the Results section of revised manuscript, we have added the statistical details to support the fMRI results (see page 9 last paragraph or below):

      “To identify these brain regions, we used the AFNI program “3dttest++” to assess the difference in the ‘dichoptic-binocular’ contrast between the experimental and control runs. The AFNI program “ClustSim” was then applied for multiple comparison correction, yielding a minimum significant cluster size of 21 voxels (voxel-wise p = .001; cluster threshold α = 0.05). We found 4 clusters showing stronger responses to the dichoptic movies than to the binocular movies, especially in the experimental runs.”

      (3) Interpretation of the results: the TMS results are very interesting and convincing regarding the involvement of the FEF in the build-up of the OD shift induced by dichoptic stimulation, however, I am not sure that the authors can claim that this effect is related to eye-based attention, as cTBS has no effect on the blob detection task during dichoptic stimulation. If the FEF were causally involved in eye-based attention, one would expect a change in performance in this task during dichoptic stimulation, perhaps a similar performance for the unattended and attended eye. The authors speculate that the sound could have an additional role in driving eye-based attention, which might explain the lack of effect for the blob discrimination task, however, this hypothesis has not been tested.

      Response: Thanks for the reviewer’s comment! Following this reviewer’s insightful suggestion, we have conducted a new experiment to examine the effect of sound on the blob detection task (see Experiment 4 in the revised manuscript). The procedure was similar to that of Experiment 2, except that no sound was presented during the dichoptic-backward-movie adaptation. The results showed that, even without sound, the interocular difference in blob detection rate remained unaffected by cTBS, which contradicts the explanation given in the previous version of the manuscript. Based on the new data, we now question the validity of using the blob detection rate to precisely quantify eye-based attention, and in the Discussion of the revised manuscript we explain why the blob detection results do not contradict our account of the functional role of FEF in modulating the aftereffect (see page 23 second paragraph to page 24 first paragraph or below):

      “An unresolved issue is why inhibiting the cortical function of FEF did not impair performance on the blob detection task. One potential explanation is that the synchronized audio in Experiment 2 might have increased the length of time that the regular movie dominated awareness. However, the results of Experiment 4 did not support this explanation: blob detection performance was unaffected by inhibition of FEF even when silent movies were presented. Although this issue remains to be explored in future work, it does not contradict our notion of FEF modulating AE-UAE opponency neurons. It should be noted that our notion merely states that FEF is the core area for attentional modulations of the activity of AE-UAE opponency neurons. No other role of FEF during adaptation is assumed here (e.g. boosting monocular responses or increasing the conscious level of stimuli in the attended eye). In contrast, by its original definition, blob detection performance serves as an estimate of the visibility (or consciousness level) of the stimuli input from each eye, even though our initial goal in adopting this task was to precisely quantify eye-based attention (which might be impractical). Thus, according to our notion, inhibition of FEF need not impair blob detection performance. Furthermore, our findings consistently indicated that the visibility of stimuli in the attended eye was markedly superior to that of stimuli in the unattended eye, yet the difference in SSVEP monocular responses between the two eyes was minimal, although it reached statistical significance (Song et al., 2023). Therefore, blob detection performance in our work may faithfully reflect only the conscious level in each monocular pathway; it is probably not an appropriate index tightly associated with the attentional modulations of monocular responses in early visual areas.

      Indeed, previous work has argued that attention but not awareness modulates neural activity in V1 during interocular competition (Watanabe et al., 2011; but see Yuval-Greenberg & Heeger, 2013). We noticed and discussed the counterintuitive results of blob detection performance in our previous work (Song et al., 2023). Here, given the new counterintuitive finding that inhibition of FEF did not impair blob detection performance, we suspect that blob detection performance in the “dichoptic-backward-movie” adaptation paradigm may not be an ideal index for accurately quantifying eye-based attention.”

      (4) Writing: in general, the manuscript is well written, but clarity should be improved in certain sections.

      (a) fMRI results: the first sentence is difficult to understand at first read, but it is crucial to understand the results, please reformulate and clarify.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have reformulated this sentence (see page 9 last paragraph or below):

      “It was only in the dichoptic condition of experimental runs that participants had to selectively pay more attention to one eye (i.e., eye-based attention). Therefore, we speculate that if certain brain regions exhibit greater activities in the dichoptic condition as compared to the binocular condition in the experimental runs but not in the control runs, the activation of these brain regions could be attributable to eye-based attention.”

      (b) Experiment 3: the rationale for experiment one should be straightforward, without a long premise explaining why it would not be necessary.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have removed the lengthy premise to make the rationale of Experiment 3 more straightforward (see page 15 last two paragraphs or below):

      “The results of Experiment 2 support the notion that eye-based attention was the cause of attention-induced ocular dominance plasticity. However, an alternative account is that the significant two-way interaction between test phase and stimulation site did not stem from any persistent malfunction of FEF in modulating ocular dominance, but rather from some abnormality of the binocular rivalry measures in the post-test that occurred after stimulation at the FEF only (and not at the other two brain sites). For instance, stimulation at the FEF might simply reduce the ODI measured in the binocular rivalry post-test.

      Therefore, we conducted Experiment 3 to examine how suppression of the three target sites would affect binocular rivalry performance, to rule out the possibility that unknown confounding factors, unrelated to adaptation but related to the binocular rivalry measures, contributed to the results.”

      (c) Discussion: the language is a bit familiar here and there, a more straightforward style should be preferred (one example: p.19 second paragraph).

      Response: Thanks for the reviewer’s suggestion! We have carefully revised the language in the discussion. The discussion following the example paragraph has been largely rewritten.

      (5) Minor: the authors might consider using the term "participant" or "observer" instead of "subject" when referring to the volunteers who participated in the study.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have replaced the term “subject” with “participant”.

      Reviewer #3 (Public Review):

      Summary:

      This study investigated the neural mechanisms underlying the shift of ocular dominance induced by "dichoptic-backward-movie" adaptation. The study is self-consistent.

      Strengths:

      The experimental design is solid and progressive (relationship among three studies), and all of the raised research questions were well answered.

      The logic behind the neural mechanisms is solid.

      The findings regarding cTBS (especially the stimulation site) could be useful for future medical applications.

      Weaknesses:

      Why does the "dichoptic-backward-movie" adaptation matter? This part is severely missing. This kind of adaptation is neither intuitive like the classical (Gibson) visual adaptation, nor practical as adaptation as a research paradigm as well as the fundamental neural mechanism. If this part is not clearly stated and discussed, this study is just self-consistent in terms of its own research question. There are tons of "cool" phenomena in which the neural mechanisms are apparent as "FEF controls vision-attention" but never tested using TMS & fMRI, but we all know that this kind of research is just of incremental implications.

      Response: Thanks for the reviewer’s comment! We designed the “dichoptic-backward-movie” adaptation to study the perceptual consequences and mechanisms of sustained attention to one monocular pathway. Since the overall visual input to the two eyes during adaptation was identical, any effect after adaptation (i.e. the change of ocular dominance in our study) can be ascribed to unbalanced eye-based attention between the two eyes rather than to unbalanced input energy across the eyes. In typical short-term monocular deprivation, the input signal from one eye is blocked, so attention is necessarily directed to the non-deprived eye. Because the deprived eye in a short-term monocular deprivation paradigm is also the unattended eye, researchers cannot ascertain whether unbalanced eye-based attention contributes to the shift of ocular dominance in the same way as unbalanced visual input across the two eyes. That is why the “dichoptic-backward-movie” adaptation was adopted in the present study: this new paradigm balances the input energy across the eyes but leaves attention unbalanced across the eyes. In the revised manuscript, we have added a description of the “dichoptic-backward-movie” adaptation (see page 3 last paragraph and page 4 first paragraph or below). We hope this additional information improves clarity.

      “In Song et al. (2023)’s “dichoptic-backward-movie” adaptation paradigm (see Figure 1B), participants were presented with regular movie images in one eye (i.e., attended eye) while the other eye (i.e., unattended eye) received the backward movie images of the same episode. They were also instructed to try their best to follow the logic of the regular movie and ignore the superimposed backward movie. Therefore, goal-directed eye-based attention was predominantly focused on the attended eye. Song et al. (2023) found that the predominance of the unattended eye in binocular rivalry increased after one hour of adaptation to the “dichoptic-backward-movie”, indicating a shift of perceptual ocular dominance towards the unattended eye. Since the overall energy of visual input from the two eyes was balanced throughout the adaptation period, the change of ocular dominance after adaptation is thought to result from unbalanced eye-based attention rather than unbalanced input energy as in typical short-term monocular deprivation (Bai et al., 2017; Lunghi et al., 2011; Zhou et al., 2014).

      In short-term monocular deprivation, the input signal from one eye is blocked. Accordingly, attention is biased towards the non-deprived eye. However, it is difficult to tease apart the potential contribution of unbalanced eye-based attention from the consequence of unbalanced input energy, as the deprived eye is also the unattended eye. Therefore, the advantage of the “dichoptic-backward-movie” adaptation paradigm is that it balances the input energy across the eyes but leaves attention unbalanced across the eyes.”

      Our previous work (Song et al., 2023) has shown that eye-based attention plays a role in the formation of the ocular dominance shift following adaptation to the dichoptic backward movie. However, because the “dichoptic-backward-movie” adaptation paradigm is new, to our knowledge no previous study has identified the brain areas responsible for eye-based attention. Our fMRI experiment is the first to address this issue, which, we believe, is one of the novelties of the present study. Attention is a broad term for our ability to select limited information for preferential or privileged processing, and it encompasses numerous aspects (e.g. spatial attention for spatial locations, feature-based attention for visual features, object-based attention for objects, social attention for social cues, and eye-based attention for monocular pathways). Can we be certain that the same brain network underlies every aspect of attention, including eye-based attention? Without a test, there is no answer. Perhaps the answer is yes, but we are not aware of any evidence for it in the literature. Attention may be like an elephant, with researchers like blind people touching it from different angles: even if all previous researchers have touched the side of the elephant and concluded that an elephant is no different from a wall, a single researcher grabbing the elephant’s tail would falsify the “wall” knowledge. From the perspective of falsifiability, which is central to science, we are confident that our fMRI experiment on eye-based attention is novel, because to our knowledge it is the first to explore this issue. Building on the fMRI experiment (without which we would not have known the precise brain sites to which cTBS should be applied), we were able to complete the subsequent TMS experiments.

      Of course, if the reviewer could kindly point out any previous neuroimaging work we have missed that has already revealed the neural mechanisms underlying eye-based attention in humans, we would be very grateful. Even so, we would like to emphasize that the purpose of the current study was not to use TMS and fMRI to confirm that “FEF controls visual attention”. As stated in the Abstract and elaborated in the last two paragraphs of the Introduction, the goal of the TMS experiments was to examine the causal role of eye-based attention in producing the aftereffect of “dichoptic-backward-movie” adaptation. This research question is also new, so we do not consider the TMS experiments incremental either. Our findings provide direct causal evidence that FEF modulates ocular dominance through eye-based attention. Please see the last two sentences in the first paragraph on page 20 of the revised manuscript, reproduced below,

      “Interestingly, in our Experiment 2 this aftereffect was significantly attenuated after we temporarily inhibited the cortical function of FEF via cTBS. This finding indicates the crucial role of FEF in the formation of attention-induced ocular dominance shift.”

      as well as the last sentence of the Abstract,

      “…and in this network, FEF plays a crucial causal role in generating the attention-induced ocular dominance shift.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The hemispheric asymmetry in the eye-based attention-related cortex should be further examined and discussed. For example, IPS in both hemispheres was identified in the fMRI experiment. It is not clear why only the right IPS was stimulated in the TMS experiment.

      Response: Thanks for the comment. We have explained the rationale for the hemispheric asymmetry in our experimental design for FEF and IPS. Please see our response to the Weakness #1 raised by Reviewer #1 in the Public Review section.

      (2) It is known that the frontoparietal cortex plays a role in the contralateral shift of attentional allocation. Meanwhile, the latest stage of ocular-specific representation is V1. The authors should discuss how the eye-related function can be achieved in FEF.

      Response: Thanks for the comment. We have discussed how FEF regulates attention-induced ocular dominance shift (see page 21 second paragraph to page 23 first paragraph in the revised manuscript, and our response to the Weakness #2 raised by Reviewer #1 in the Public Review section).

      (3) To further validate the role of FEF in eye-related attention shifts, the authors may consider using the traditional monocular deprivation paradigm with fMRI and TMS. It would be valuable to compare the neural mechanisms related to the classical monocular deprivation paradigm with the current findings.

      Response: Thanks for the reviewer’s suggestion! That is indeed an interesting research topic that we are currently exploring. The current study investigated the attention-induced ocular dominance shift with the “dichoptic-backward-movie-adaptation” paradigm. This paradigm is substantially different from traditional short-term monocular deprivation. In our Neuroscience Bulletin paper (Song et al., 2023), we discussed the reasons as follows.

      “An alternative account of our results is the homeostatic plasticity mechanism. The function of this mechanism is to stabilize neuronal activity and prevent the neuronal system from becoming hyperactive or hypoactive. For this goal, the mechanism moves the neuronal system back toward its baseline after a perturbation [51, 52]. In our case, the aftereffect can be explained such that the visual system boosts the signals from the unattended eye to maintain the balance of the network’s excitability. However, this account cannot easily explain why the change of neural ocular dominance led by prolonged eye-based attention was observed here using the binocular rivalry testing stimuli, but absent in the previous research using the binocularly fused stimuli [11]. In contrast, a recent SSVEP study also using the binocularly fused stimuli has successfully revealed a shift of neural ocular dominance after two hours of monocular deprivation [31], which is in line with the homeostatic plasticity account. Therefore, the mechanisms underlying the “dichoptic-backward-movie” adaptation and monocular deprivation are probably not fully overlapped with each other; and the binocular rivalry mechanism described in the ocular-opponency-neuron model seems to be more preferable than the homeostatic plasticity mechanism in accounting for the present findings.”

      Therefore, before asking whether FEF plays a role in the attention-induced ocular dominance shift in a traditional monocular deprivation paradigm, one should probably first examine (1) whether attention also plays a role in traditional monocular deprivation, and (2) whether the ocular-opponency-neuron adaptation account can also explain the traditional monocular deprivation effect. Our newly accepted paper “Negligible contribution of adaptation of ocular opponency neurons to the effect of short-term monocular deprivation” (https://www.frontiersin.org/articles/10.3389/fpsyg.2023.1282113/full) gives a generally negative answer to the second question. As to the first question, we have one manuscript under review and another study in progress. In other words, a satisfactory answer to this comment requires clear answers to both questions first, which we think is far beyond the scope of a single manuscript.

      (4) The authors only presented regular movies to the dominant eye to maximize the ocular dominance shift. This critical information of design should be clarified, not only in the method section.

      Response: Thanks for the reviewer’s suggestion! In the Results section of Experiment 2, we have added a description of this critical information of design (see page 11 last paragraph to page 12 first paragraph or below):

      “Then, participants adapted to the “dichoptic-backward-movie” in which regular movie images were presented to the dominant eye to maximize the effect of eye dominance shift (Song et al., 2023). Meanwhile they were asked to detect some infrequent blob targets presented on the movie images in one eye at the same time.”

      (5) The frame rate of the movie is 30 fps, which is much lower than a typical 60 fps visual presentation, does this have an effect on the adaptation outcome?

      Response: To the best of our knowledge, there is no evidence that the frame rate of the movie influences the aftereffect of attention-induced ocular dominance shift. In our previous research, the movie frame rate during adaptation was 25 fps, which still produced a stable adaptation aftereffect (Song et al., 2023). The frame rate was 30 fps in our monocular deprivation work (Lyu et al., 2020), which showed a monocular deprivation effect similar to the one we previously observed in an altered-reality study (Bai et al., 2017), where the altered-reality video was presented at 60 fps. Taken together, these observations suggest that frame rates within this range (25 to 60 fps) do not substantially affect the adaptation outcome.

      (6) Figure 5: The ODSE derived from ODI in Experiment 3 should also be illustrated, for a better comparison with results from Experiment 2.

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have added the results of ODSE in Experiment 3 to Figure 5 (see page 15 or below):

      Author response image 1.

      Figure 5. The results of (A) the ocular dominance index (ODI), (B) the ocular dominance shift effects (ODSE) in Experiment 2, (C) the ODI and (D) the ODSE in Experiment 3. The bars show the grand average data for each condition. The individual data are plotted with gray lines or dots. The dashed gray line represents the absolute balance point for the two eyes (ODI = 0.5). Error bars indicate standard errors of means. * p < .05; ** p < .01; n.s. p > .05.

      (7) Spelling issues: "i.e." → "i.e.,"

      Response: Thanks for the reviewer’s suggestion! In the revised manuscript, we have changed “i.e.” to “i.e.,”.

      Reviewer #2 (Recommendations For The Authors):

      Linked to weakness 3: Ideally, a control experiment with cTBS and dichoptic stimulation without sound but with the blob discrimination task should be performed to be able to make important claims about the neural mechanisms involved in eye-based attention.

      Response: Thanks for the comment. We have performed a new experiment as the reviewer suggested. Please see our response to the Weakness #3 raised by Reviewer #2 in the Public Review section.

      Reviewer #3 (Recommendations For The Authors):

      (1) The neural mechanisms are so apparent. We all know the FEF\IPS\SC matter in vision and attention and gaze. This is not groundbreaking.

      Response: As we addressed in our response to Reviewer #3’s public comment, the current study aimed at investigating the causal mechanism for eye-based attentional modulation of ocular dominance plasticity rather than simply the role of FEF\IPS\SC in visual attention. Moreover, eye-based attention is a less investigated aspect of visual attention. The neural mechanism underlying eye-based attention is still largely unknown, and identifying the brain areas that control eye-based attention was a necessary preparatory step for applying cTBS. We have explained in detail in our response to Reviewer #3’s public comment why we think both the fMRI and TMS experiments are novel to the field, which we will not reiterate here to avoid redundancy.

      (2) Why does the "dichoptic-backward-movie" adaptation matter? Is playing a backward movie to one eye realistic? Does that follow the efficient coding? Is that a mere consequence of information theory?

      Response: Thanks for the comments. We have added the description of the “dichoptic-backward-movie” adaptation paradigm in the revised manuscript (see page 3 last paragraph and page 4 first paragraph or our response to this reviewer’s Public comment).

      Is it realistic to play a backward movie to one eye? This question is somewhat ambiguous to us. If the reviewer means the technical feasibility of such stimulus presentation, we can confirm it, since we have used this paradigm in both the current and previously published studies. Specifically, we made the video stimuli in advance. The left half of the video was the regular movie and the right half was the backward version of the same movie (or vice versa). When viewing such video stimuli through stereoscopes, participants could see only the left half of the video with the left eye and the right half with the right eye. In other words, the regular movie and backward movie were viewed dichoptically. Alternatively, if the reviewer means that such dichoptic presentation rarely happens in the real world and is thus unrealistic, we agree with the reviewer on the one hand. On the other hand, we have explained on page 3 (last paragraph) and page 4 (first paragraph) why it is a particularly useful paradigm for the main purpose of the present study. Consider an analogous example. Binocular rivalry rarely happens in everyday life, so one might say that binocular rivalry is not realistic. However, our visual system does have the ability to deal with such conflicting inputs across the eyes, even though binocular rivalry is unrealistic! Investigating such seemingly unrealistic functions of the brain can be fruitful, since they may also reveal the mysteries of our neural system. Indeed, although binocular rivalry is uncommon in daily life, it is frequently used to investigate awareness. In our work, we use binocular rivalry to measure perceptual ocular dominance.
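      For readers who wish to build comparable stimuli, the side-by-side construction described above can be sketched in Python with NumPy. This is a minimal illustration, not the stimulus-generation code used in the study: it assumes the movie has already been decoded into a frame array of shape (T, H, W, C), and the function name `make_dichoptic_frames` is hypothetical. Video decoding, audio and display timing are omitted.

```python
import numpy as np


def make_dichoptic_frames(frames: np.ndarray, regular_on_left: bool = True) -> np.ndarray:
    """Build side-by-side dichoptic frames from one movie clip (illustrative sketch).

    frames: array of shape (T, H, W, C) holding the clip in normal playback order.
    Returns an array of shape (T, H, 2 * W, C): one half shows the clip forward,
    the other half shows the same clip reversed in time.
    """
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    forward = frames
    backward = frames[::-1]  # same episode, played backward
    halves = (forward, backward) if regular_on_left else (backward, forward)
    # Concatenate along the width axis; through a stereoscope each half reaches one eye.
    return np.concatenate(halves, axis=2)


# Usage: a toy 3-frame clip of 2x2 single-channel images.
clip = np.arange(3 * 2 * 2 * 1).reshape(3, 2, 2, 1)
dichoptic = make_dichoptic_frames(clip)
print(dichoptic.shape)
```

      Frame t of the output pairs forward frame t with backward frame T-1-t, so both eyes see images of equal overall energy from the same episode while only one half follows the story logic.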

      Finally, the reviewer asked whether the "dichoptic-backward-movie" adaptation paradigm follows efficient coding and information theory. Information theory and efficient coding assume that messages with low expectedness, or of rare occurrence, attract more attention and induce larger neural responses than those with high expectedness. In the "dichoptic-backward-movie" adaptation paradigm, the backward movie should be less expected, since the actions of the characters in the backward movie appear illogical. Thus, according to information theory and efficient coding, more attention would be expected to be paid to the backward movie, which might then dominate awareness for a longer period during adaptation (Zhang et al., 2012). However, we instructed participants to follow the regular movie during adaptation, and the results of the blob detection task showed better performance when the targets appeared in the eye presented with the regular movie, which contradicts the prediction of information theory and efficient coding. Thus, it seems unlikely that the "dichoptic-backward-movie" adaptation follows efficient coding and information theory.

      References

      Bai, J., Dong, X., He, S., & Bao, M. (2017). Monocular deprivation of Fourier phase information boosts the deprived eye’s dominance during interocular competition but not interocular phase combination. Neuroscience, 352, 122-130. https://doi.org/10.1016/j.neuroscience.2017.03.053

      Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

      Choe, E., & Kim, M.-S. (2022). Eye-specific attentional bias driven by selection history. Psychonomic Bulletin & Review, 29(6), 2155-2166. https://doi.org/10.3758/s13423-022-02121-0

      Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3(3), 201-215. https://doi.org/10.1038/nrn755

      Dong, X., Gao, Y., Lv, L., & Bao, M. (2016). Habituation of visual adaptation. Scientific Reports, 6, 19152. https://doi.org/10.1038/srep19152

      Duecker, F., Formisano, E., & Sack, A. T. (2013). Hemispheric differences in the voluntary control of spatial attention: direct evidence for a right-hemispheric dominance within frontal cortex. Journal of Cognitive Neuroscience, 25(8), 1332-1342. https://doi.org/10.1162/jocn_a_00402

      Esterman, M., Liu, G., Okabe, H., Reagan, A., Thai, M., & DeGutis, J. (2015). Frontal eye field involvement in sustaining visual attention: evidence from transcranial magnetic stimulation. NeuroImage, 111, 542-548. https://doi.org/10.1016/j.neuroimage.2015.01.044

      Gallotto, S., Schuhmann, T., Duecker, F., Middag-van Spanje, M., de Graaf, T. A., & Sack, A. T. (2022). Concurrent frontal and parietal network TMS for modulating attention. iScience, 25(3), 103962. https://doi.org/10.1016/j.isci.2022.103962

      Lega, C., Ferrante, O., Marini, F., Santandrea, E., Cattaneo, L., & Chelazzi, L. (2019). Probing the neural mechanisms for distractor filtering and their history-contingent modulation by means of TMS. Journal of Neuroscience, 39(38), 7591-7603. https://doi.org/10.1523/JNEUROSCI.2740-18.2019

      Lunghi, C., Burr, D. C., & Morrone, C. (2011). Brief periods of monocular deprivation disrupt ocular balance in human adult visual cortex. Current Biology, 21(14), R538-R539. https://doi.org/10.1016/j.cub.2011.06.004

      Lyu, L., He, S., Jiang, Y., Engel, S. A., & Bao, M. (2020). Natural-scene-based steady-state visual evoked potentials reveal effects of short-term monocular deprivation. Neuroscience, 435, 10-21. https://doi.org/10.1016/j.neuroscience.2020.03.039

      Mayrhofer, H. C., Duecker, F., van de Ven, V., Jacobs, H. I., & Sack, A. T. (2019). Hemifield-specific correlations between cue-related blood oxygen level dependent activity in bilateral nodes of the dorsal attention network and attentional benefits in a spatial orienting paradigm. Journal of Cognitive Neuroscience, 31(5), 625-638. https://doi.org/10.1162/jocn_a_01338

      Rezec, A., Krekelberg, B., & Dobkins, K. R. (2004). Attention enhances adaptability: evidence from motion adaptation experiments. Vision Research, 44(26), 3035-3044. https://doi.org/10.1016/j.visres.2004.07.020

      Sack, A. T. (2010). Using non-invasive brain interference as a tool for mimicking spatial neglect in healthy volunteers. Restorative Neurology and Neuroscience, 28(4), 485-497. https://doi.org/10.3233/RNN-2010-0568

      Said, C. P., & Heeger, D. J. (2013). A model of binocular rivalry and cross-orientation suppression. PLoS Computational Biology, 9(3), e1002991. https://doi.org/10.1371/journal.pcbi.1002991

      Song, F., Lyu, L., Zhao, J., & Bao, M. (2023). The role of eye-specific attention in ocular dominance plasticity. Cerebral Cortex, 33(4), 983-996. https://doi.org/10.1093/cercor/bhac116

      van den Bergh, D., Wagenmakers, E.-J., & Aust, F. (2023). Bayesian Repeated-Measures Analysis of Variance: An Updated Methodology Implemented in JASP. Advances in Methods and Practices in Psychological Science, 6(2), 25152459231168024. https://doi.org/10.1177/25152459231168024

      van Doorn, J., van den Bergh, D., Böhm, U., Dablander, F., Derks, K., Draws, T., Etz, A., Evans, N. J., Gronau, Q. F., Haaf, J. M., Hinne, M., Kucharský, Š., Ly, A., Marsman, M., Matzke, D., Gupta, A., Sarafoglou, A., Stefan, A., Voelkel, J. G., & Wagenmakers, E. J. (2021). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review, 28(3), 813–826. https://doi.org/10.3758/s13423-020-01798-5

      Wagenmakers, E. J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Selker, R., Gronau, Q. F., Dropmann, D., Boutin, B., Meerhoff, F., Knight, P., Raj, A., van Kesteren, E. J., van Doorn, J., Šmíra, M., Epskamp, S., Etz, A., Matzke, D., de Jong, T., van den Bergh, D., Sarafoglou, A., Steingroever, H., Derks, K., Rouder, J. N., & Morey, R. D. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin & Review, 25(1), 58–76. https://doi.org/10.3758/s13423-017-1323-7

      Watanabe, M., Cheng, K., Murayama, Y., Ueno, K., Asamizuya, T., Tanaka, K., & Logothetis, N. (2011). Attention but not awareness modulates the BOLD signal in the human V1 during binocular suppression. Science, 334(6057), 829-831. https://doi.org/10.1126/science.1203161

      Wong, S. P., Baldwin, A. S., Hess, R. F., & Mullen, K. T. (2021). Shifting eye balance using monocularly directed attention in normal vision. Journal of Vision, 21(5), 4. https://doi.org/10.1167/jov.21.5.4

      Yuval-Greenberg, S., & Heeger, D. J. (2013). Continuous flash suppression modulates cortical activity in early visual cortex. Journal of Neuroscience, 33(23), 9635-9643. https://doi.org/10.1523/jneurosci.4612-12.2013

      Zhang, P., Jiang, Y., & He, S. (2012). Voluntary attention modulates processing of eye-specific visual information. Psychological Science, 23(3), 254-260. https://doi.org/10.1177/0956797611424289

      Zhou, J., Reynaud, A., & Hess, R. F. (2014). Real-time modulation of perceptual eye dominance in humans. Proceedings of the Royal Society B: Biological Sciences, 281(1795). https://doi.org/10.1098/rspb.2014.1717

    1. Author Response

      The following is the authors’ response to the current reviews.

      Joint Public Review

      This study is concerned with the general question as to how pools of synaptic vesicles are organized in presynaptic terminals to support different types of transmitter release, such as fast synchronous and asynchronous release. To address this issue, the authors employed the classical method of loading synaptic vesicle membranes with FM-styryl dyes and assessing dye destaining during repetitive synapse stimulation by live imaging as a readout of the mobilization of vesicles for fusion. Among other findings, the authors provide evidence indicating that there are multiple reserve vesicle pools, that quickly and slowly mobilized reserves do not mix, and that vesicle fusion does not follow a mono-exponential time course, leading to the notion that two separate reserve pools of vesicles - slowly vs. rapidly mobilizing - feed two distinct releasable pools - reluctantly vs. rapidly releasing. These findings are valuable to the field of synapse biology, where the organization of synaptic vesicle pools that support synaptic transmission in different temporal and stimulation regimes has been a focus of intense experimentation and discussion for more than two decades.

      On the other hand, the present study has limitations, so that the authors’ key conclusions remain incompletely supported by the data, and alternative interpretations of the data remain possible. The approach of using bulk FM-styryl dye destaining as a readout of precise vesicle arrangements and pools in a population of functionally very diverse synapses bears problems. In essence, the approach is ’blind’ to many additional processes and confounding factors that operate in the background, from other forms of release to inter-synaptic vesicle exchange. Further, averaging signals over many - functionally very diverse - synapses makes it difficult to distinguish the dynamics of separate vesicle pools within single synapses from a scenario where different kinetics of release originate from different types of synapses with different release probabilities.

      We thank the editors and reviewers for their time and patience, and are happy that they found our results valuable.

      We do not have a clear understanding of what the alternative interpretations might be - beyond those already addressed - but would like to. At present, we believe that the evidence for parallel processing of slowly and quickly mobilized reserve vesicles is solid and hope that people who are open to the possibility will evaluate the reasoning described within our report. The hypothesis that reserves are kept separate because they feed distinct subdivisions of the readily releasable pool remains to be tested.

      Beyond that, we have used FM-dye de-staining as a bulk measurement of sub-synaptic events in the sense that we have made no attempt to measure mobilization of isolated individual vesicles. We do not see how this necessarily leaves viable alternative interpretations, but this is difficult to evaluate without knowing what the alternatives might be. On the other hand, the FM-dye technique has had good resolution at the level of distinguishing between individual synapses since at least Murthy et al. (2001). For our part, we are confident that our analysis in Figure 3 combined with the results in Figures 4-11 shows that the multiple reserve pools co-occur in many individual presynaptic terminals. We did not use electron microscopy to confirm that all of the puncta analyzed in Figure 3 were indeed single synapses, but the reviewers did not recommend this, and we believe there is already enough published about the spatial distribution of synapses in cell culture to be confident that many of the puncta smaller than 1.5 µm were individual synapses.

      Overall, we have attempted to address all of the individual concerns raised by reviewers, and our understanding is that these concerns and our responses will be available on the eLife website. The reviewers were not convinced on every point, but these are cases where the nature of the concern was not clear to us. We hope that people who share these concerns will check out our responses and contact us with any further questions or alternative interpretations.

      (1) The authors sincerely addressed many of the previous concerns, mainly by clarification. The data are consistent with the authors’ hypothesis. The pool concept is somewhat similar to that of Richards et al. (2000) and Rey et al. (2015). The authors further propose that two reserve pools feed vesicles to two readily-releasable pools independently.

      To clarify further: The possibility that distinct reserve pools feed distinct readily releasable pools is predicted by our working model, and is something that we would like to test in the future, but is not a conclusion of the present study. Instead, in the present study, we tested the prediction that quickly and slowly mobilized reserve vesicles are processed in parallel without making assumptions about the underlying mechanism.

      Unfortunately, the heterogeneity among individual synapses remains a concern as shown in (some of) the raw data (Fig. 3 and supplements).

      We emphasize that we have not attempted to minimize the extensive heterogeneity among synapses, but actually highlight it. In fact, we chose the image in Figure 3 as an example in part because of the lower left region, replicated in Figure 3 supplement 2, which demonstrates extensive heterogeneity along what appears to be a single axon. We are not the first to notice the heterogeneity (see Waters and Smith, 2002), but we do provide a new possible explanation which, if correct, might be important for understanding biological computation (see our Discussion). At the same time, we believe that our evidence for multiple reserve pools within individual synapses with heterogeneous properties is compelling. We see no contradiction; indeed, our conclusion that the ratio of slowly to quickly mobilized vesicles varies extensively between synapses can only be correct if individual synapses contain multiple types. We hope that people who are interested in our conclusions will evaluate the evidence and reasoning presented in our report.

      Bulk imaging of FM de-staining does not really measure the fraction of non-stained vesicles, which changes dynamically during stimulation, so that the situation calls for an independent readout of stained and non-stained vesicles. Moreover, direct correspondence between two specific stimulation frequencies (with long stimulation) and vesicle pools is not straightforward. These issues make the experimentally measured pools not well-defined.

      We think that the reviewer is suggesting an alternative scenario where decreases in the fractional rate of FM-dye de-staining seen during 1 Hz stimulation might be caused by a large (4-fold) increase in the total size of the reserve pool that dilutes the stained vesicles by mixing. This scenario is consistent with the results in Figures 2 and 4-7, and initially seems plausible because previous studies have shown that many vesicles are not mobilized, and therefore are not stained, during our standard loading protocol of 100 s at 20 Hz (Harata et al., 2001). However, liberation of this "deep reserve" as an explanation for the decrease in fractional destaining is not compatible with the results in Figures 10-11 that rule out mixing. For example, liberation of the deep reserve would cause fractional destaining to appear equally depressed during subsequent 20 Hz stimulation, and Figure 10 shows that this is not the case. The scenario cannot be rescued by postulating that the subsequent 20 Hz stimulation caused the deep reserve to quickly recapture the liberated vesicles, because Figure 11D-E shows that fractional de-staining continues to be depressed at the very beginning of a second 1 Hz train that follows the 20 Hz stimulation.
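      The dilution logic can be made concrete with an idealized single well-mixed pool. The Python sketch below is an illustration under stated assumptions, not the model analyzed in the study: it assumes every vesicle in the pool is equally likely to be mobilized and that the number released per stimulus is unchanged when unstained vesicles join the pool. The function names are hypothetical. Under these assumptions fractional destaining depends only on released/pool size, so a 4-fold expansion depresses it 4-fold whatever the release per stimulus, which is why mixing would predict equal depression during subsequent 20 Hz stimulation.

```python
def fractional_destaining(released_per_stimulus: float, pool_size: float) -> float:
    """Fraction of the remaining dye lost per stimulus in one well-mixed pool.

    Assumption: every vesicle in the pool is equally likely to be mobilized,
    so stained vesicles leave in proportion to their share of the pool. The
    fraction of remaining dye lost per stimulus is then released / pool_size,
    independent of how many stained vesicles are left.
    """
    if pool_size <= 0 or not 0 <= released_per_stimulus <= pool_size:
        raise ValueError("need pool_size > 0 and 0 <= released_per_stimulus <= pool_size")
    return released_per_stimulus / pool_size


def depression_ratio(released_per_stimulus: float, pool_size: float, expansion: float) -> float:
    """How many times smaller fractional destaining becomes if the mixed pool
    grows by `expansion` (e.g., unstained 'deep reserve' vesicles joining it)."""
    before = fractional_destaining(released_per_stimulus, pool_size)
    after = fractional_destaining(released_per_stimulus, pool_size * expansion)
    return before / after


# Illustrative numbers only: two different amounts released per stimulus
# give the same ~4-fold depression after a 4-fold expansion of the pool.
for released in (1.0, 5.0):
    print(released, depression_ratio(released, pool_size=100.0, expansion=4.0))
```

      Because the ratio does not depend on the amount released per stimulus, a pure dilution account cannot produce depression that is specific to one stimulation frequency.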

      (2) The authors’ latest round of responses did not alleviate most of my major previous concerns. The additional data now shown in Fig 3 rely on conceptually the same type of bulk measurements and thus suffer from the same limitations as outlined in the earlier review.

      We believe that the new evidence in Figure 3 for multiple reserve pools at individual synapses is strong when evaluated in combination with the results in Figures 4-11. We do not, at present, see how the fact that FM-dye destaining is used as a bulk measurement at the sub-synaptic level could undercut our logic.

      Moreover, the image of neuronal cultures shown in Fig. 3 might be problematic. It shows very bright staining with large round lumps, which may be indicative of unhealthy cultures.

      Unhealthy cultures are not a concern because we used strict quantitative criteria to assess health that are better than those we have seen elsewhere (details below). We think the reviewer might be reacting to the way we rendered the image, i.e., as “overexposed”. We did this to highlight the dimmest puncta, which is a key element of the analysis. The same image rendered with less contrast is now displayed in Author response image 1 (3rd panel from left).

      Author response image 1.

      The image on the left is a reproduction of the example image in Figure 3, which was the average of 120 time-lapse raw data images; scale bar is 20 µm. The second image is a replicate except that all 69 puncta included in the study are occluded by 1.5 µm × 1.5 µm yellow squares. The third image is another replicate with a different brightness setting. The rightmost image is one of the raw data images with brightness matched to the third image.

      More details (relevance to in vivo is in point 4):

      (1) Identifying unhealthy cultures is straightforward with our technique because synapses in unhealthy cultures destain spontaneously. Our criterion for accepting experiments for further analysis was less than 1.5 % spontaneous rundown per minute. This is a better way to judge health than we have seen elsewhere because it eliminates subjective decisions, and would be equally applicable for microscopes and imaging software of any quality. For our part, we used a 25X objective with a low numerical aperture and low-intensity illumination that allowed us to completely avoid photobleaching. The images will look worse to some than images acquired with a higher quality microscope, but the absence of photobleaching is an important benefit because it allowed us to avoid complicated corrections.

      (2) Stained areas larger than 1.5 µm across, such as the ones noted by the reviewer, were expressly excluded from our study because they could have been clusters of multiple synapses. The size criteria are detailed in the Legend of Figure 3. Puncta and larger areas that were excluded are the ones that are not occluded by yellow squares in the 2nd image from the left, above; at least two of the largest were likely clusters of synapses that were out of focus. Nevertheless, although they were excluded, it is unlikely that the stained areas larger than 1.5 µm in the image in Figure 3 were characteristic of unhealthy cultures, because these areas did not de-stain spontaneously, but instead de-stained in response to 1 and 20 Hz electrical stimulation much like the small puncta that were included in the analysis.

      (3) Electron microscopy results have shown that individual synapses vary >10-fold in size, so a large range of brightness is expected (Murthy et al., 2001). The large range would either make the brighter puncta and clusters appear overexposed in a printed image, or render the dimmer puncta invisible. We have opted to present an image with overall brightness adjusted so that the dimmest puncta are visible. This is appropriate because one of the concerns was that analyzing the dimmest puncta would reveal underlying populations where the rate of fractional destaining was constant. In the end, no evidence for underlying populations emerged, which supports the conclusion that the decreases in fractional destaining occur at individual synapses. Note that adjusting brightness for example images was unavoidable; we used the camera in a range that was far below saturation and, because of this, images presented without adjusting brightness would appear completely black.

      (4) Primary cell cultures are non-physiological by definition, so the concept of health is intrinsically arbitrary, and relevance to synapses in brains is questioned routinely. However, the new findings in the present report are that: (1) individual hippocampal synapses contain multiple reserve pools; (2) the reserves remain separate but are not distinguishable by the timing of mobilization when the frequency of stimulation is high; and (3) the reserves are nevertheless processed in parallel even when the frequency of stimulation is high. Of these, finding (1) has been reported previously for other synapse types, but findings (2) and (3) were both unexpected, and finding (3) was not compatible with current concepts. Nevertheless, all three findings were predicted by a model that was developed to explain orthogonal results from studies of intact synapses in ex vivo slices that did not fit with current concepts either, as referenced in the Introduction. Because of this, we think that the parallel processing of quickly and slowly mobilized reserve vesicles likely occurs in individual Schaffer collateral synapses in vivo, and is not a cell culture artifact; the alternative would be too much of an unlikely coincidence.
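      The acceptance criterion in point (1) above can be expressed as a short check. The Python sketch below is hypothetical: it assumes that rundown is estimated from a linear fit to an unstimulated baseline trace and expressed relative to the fitted starting fluorescence. The function names, the fitting method and the normalization are illustrative assumptions rather than the exact procedure used in the study.

```python
import numpy as np


def rundown_percent_per_minute(times_s, fluorescence) -> float:
    """Spontaneous rundown as % of starting fluorescence lost per minute.

    Assumption: a straight-line fit to an unstimulated baseline trace; the
    (negative) slope is expressed relative to the fitted value at the first
    time point.
    """
    t = np.asarray(times_s, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    slope, intercept = np.polyfit(t, f, 1)  # slope in fluorescence units per second
    f0 = slope * t[0] + intercept
    if f0 <= 0:
        raise ValueError("fitted starting fluorescence must be positive")
    return -slope * 60.0 / f0 * 100.0


def accept_experiment(times_s, fluorescence, threshold_pct_per_min: float = 1.5) -> bool:
    """Keep an experiment only if spontaneous rundown is below the threshold."""
    return rundown_percent_per_minute(times_s, fluorescence) < threshold_pct_per_min


# Usage with synthetic baselines sampled every 10 s for 5 min.
t = np.arange(0, 300, 10)
print(accept_experiment(t, 1000.0 - t / 6.0))  # about 1 %/min of rundown
print(accept_experiment(t, 1000.0 - t / 3.0))  # about 2 %/min of rundown
```

      A fixed numerical threshold of this kind is what makes the criterion independent of subjective judgments about how an image looks.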

      References

      Harata N, Pyle JL, Aravanis AM, Mozhayeva M, Kavalali ET & Tsien RW (2001). Limited numbers of recycling vesicles in small CNS nerve terminals: implications for neural signaling and vesicular cycling. Trends in Neurosciences 24, 637–43.

      Murthy VN, Schikorski T, Stevens CF & Zhu Y (2001). Inactivity produces increases in neurotransmitter release and synapse size. Neuron 32, 673–82.

      Waters J & Smith SJ (2002). Vesicle pool partitioning influences presynaptic diversity and weighting in rat hippocampal synapses. Journal of Physiology 541, 811–23.


      The following is the authors’ response to the original reviews.

      Reviewer 1

      Mahfooz et al. investigated the time course of synaptic vesicle fusion of cultured mouse hippocampal synapses using FM-styryl dyes. The major finding is that the FM destaining time course deviates from a mono-exponential function during 1 Hz, but not 20 Hz stimulation. The deviation from a mono-exponential function was also seen during a second stimulus train applied after recovery periods of several minutes, or after depletion of the readily-releasable vesicle pool. Furthermore, this "decreased fractional destaining" was unlikely due to long-term synaptic depression, or incomplete dye clearance. Fractional destaining was enhanced when the dye was loaded with 1 Hz compared with 20 Hz stimulation, suggesting that vesicles recycled during 1 Hz stimulation are predominantly sorted into a rapidly mobilized pool. Finally, they show that 20 Hz stimulation does not affect the decrease in fractional destaining induced and recorded during 1 Hz stimulation. Based on these observations, they put forward a model in which slowly and quickly resupplied synaptic vesicles are mobilized in parallel.

      The demonstration that FM destaining time courses deviate from single exponentials during 1 Hz stimulation (Figs 2-3) is a starting point used to rule out simple models where vesicles intermix freely and to introduce a mathematical technique for quantifying the extent of the deviations that is essential for the analysis of later experiments, where curve fitting could not be used. We then:

      1) Show that the deviation from simple models is not caused by depletion of the readily releasable pool, as noted by the reviewer;

      2) rule out a number of explanations for the deviation that do not involve reserve pools at all, again as noted;

      3) provide affirmative evidence for the presence of multiple reserve pools by labeling them with distinct colors;

      4) show that the vesicles within the distinct reserve pools do not intermix even when activity is intense enough to drive destaining with single exponential kinetics.

      We believe that the 4th point - documented in Figs 10-11 - is a key element.

      Beyond that, we note that our working model arose from previous studies, as referenced in the Introduction, not from the present results. The model did predict the parallel processing of quickly and slowly mobilized reserves, and the present study was designed to test this prediction. In that sense, the evidence in the current study supports our working model, not the other way around.

      In any case, most readers in the near term will be more interested in the serial versus parallel question, and less in precisely what the present results mean for evaluating our working model. Because of this, we emphasize that evidence for parallel processing of separate reserve pools depends solely on experimental results within the study, and not on modeling. As a consequence, the evidence will continue to be equally strong even if problems with our working model arise later on (lines 382-386).

      We do have additional unpublished evidence for the working model that does not bear directly on the parallel versus serial question. Some of this was removed from an earlier version of the manuscript and some has been newly gathered since the original submission. We will publish the additional evidence at a later point. We decided not to include it in the present manuscript expressly to avoid confusion about the relationship between modeling and the evidence for parallel processing in general.

      The paper addresses an interesting question - the relationship between the resupply and release of synaptic vesicles. The study is based on a lot of data of high quality. Most data are solid. However, some of the major conclusions are not well supported by the data. Moreover, it remains unclear how specific the findings are to the experimental design.

      The following points should be addressed:

      1) Most traces display a decrease in fluorescence intensity before stimulation. Data with a decrease in baseline fluorescence intensity of up to 1.5 % were considered for the analysis (Fig 2-supplement 2). I may have missed it, but were the data corrected for the observed decrease in baseline fluorescence intensity? (In the model shown in Appendix 1 Figure 1, they correct for "rundown"). For instance, are the residuals shown in Fig 2D, E based on corrected data? In case the data would not be corrected for a decrease in baseline fluorescence, would the decay kinetics also deviate from a single exponential after correction?

      We did not correct for rundown - as now noted on lines 96-97 - except in the figure in the Appendix, noted by the reviewer, where the uncorrected and corrected time courses are plotted side by side for easy comparison. However, our study includes an analysis showing that correcting for rundown during 1 Hz stimulation would increase - not decrease - the deviation from a single exponential (2 bars in rightmost panel in Fig 2C, and lines 113-116 of Results), so the absence of a correction does not weaken our conclusions.

      2) The analysis of "fractional destaining" is not clear to me. How many intervals of which length were chosen and why? For instance, the intervals often differ in length, number and do not cover the complete decay (e.g., Fig 2B).

      We calculated fractional destaining from longer intervals at later times because less stain remained, so signal/noise was lower and scatter was greater; estimating the slope of destaining over longer intervals counteracts this increased scatter. An additional benefit is that elongating the later intervals allowed us to plot only 6 bars for 25 min of 1 Hz destaining, which works better visually than 17.

      Increasing the interval length for later times is mathematically sound because the key factor causing distortions related to deviations from linearity is not the length of the interval per se but, instead, the fractional destaining over the interval. The fractional destaining is greater at the start of 1 Hz stimulation, thus requiring shorter intervals.
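      The interval argument above can be illustrated with synthetic time courses. The Python sketch below assumes that fractional destaining over an interval is the stain lost during the interval divided by the stain present at its start and by its duration (the exact definition in the Methods may differ), and it uses hypothetical weights and time constants rather than fitted values. It shows that a single well-mixed pool gives the same value for every interval of equal length, that two pools which never exchange vesicles give values that decline over the train, and that the bias of a single-pool estimate depends only on the ratio of interval length to time constant, that is, on how much stain is lost within one interval.

```python
import math


def fractional_destaining(f, t_start, t_end):
    """Stain lost over [t_start, t_end], divided by the stain present at t_start
    and by the interval length (fraction per minute). Assumed definition."""
    f0, f1 = f(t_start), f(t_end)
    return (f0 - f1) / (f0 * (t_end - t_start))


def single_pool(t, tau=8.0):
    # One well-mixed pool: mono-exponential loss of stain (hypothetical tau, min).
    return math.exp(-t / tau)


def two_parallel_pools(t, w_fast=0.6, tau_fast=3.0, tau_slow=40.0):
    # Two pools that never exchange vesicles: a sum of two exponentials
    # (hypothetical weight and time constants, min).
    return w_fast * math.exp(-t / tau_fast) + (1 - w_fast) * math.exp(-t / tau_slow)


def estimate_vs_true_rate(dt, tau):
    # For a single exponential the interval estimate is (1 - exp(-dt/tau)) / dt;
    # relative to the true rate 1/tau it depends only on dt / tau.
    return tau * (1 - math.exp(-dt / tau)) / dt


# 17 consecutive 1.5-min intervals (about 25 min of stimulation).
edges = [1.5 * i for i in range(18)]
intervals = list(zip(edges, edges[1:]))
single = [fractional_destaining(single_pool, a, b) for a, b in intervals]
parallel = [fractional_destaining(two_parallel_pools, a, b) for a, b in intervals]

for (a, b), s, p in zip(intervals, single, parallel):
    print(f"{a:4.1f}-{b:4.1f} min  single pool: {s:.4f}  parallel pools: {p:.4f}")
```

      The flat column for the single pool is the signature expected when vesicles intermix freely; any decline, as in the parallel-pool column, indicates a deviation from that class of models.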

      It would be possible to choose inappropriately long intervals that would distort estimates of the change in fractional destaining. However, we now include Fig 2-supplement 6, which shows all 17 intervals of 1.5 min, to confirm that any distortions after the first interval were minimal. The Appendix predicts a biologically important distortion for the first interval which we are following up, but this would underestimate the true deviation from quickly mixing pools, so would not be problematic for the present conclusions.

      Sometimes, only the interval right after stimulation onset was considered (e.g., Fig 7, 8).

      Figs 7, 8 in the previous version are now Figs 8, 9.

      This is appropriate because the goal was to estimate the fractional destaining at the very start, before the quickly mobilized fraction has destained.

      How quickly fractional destaining is expected to revert to the lowest value seen after 15 min of 1 Hz stimulation in Fig 2 (and elsewhere) depends very much on assumptions - such as the number of reserve pools, etc. We sought to avoid this kind of additional analysis because we are keen to avoid the impression that our main conclusions depend on the specifics of modeling.

      How sensitive are the changes in fractional destaining to the choice of the intervals?

      Minimally. This can be seen by eye because the magenta lines in Fig 2B fit the data well, but see Fig 2-supplement 6 for a quantitative comparison.

      For instance, would fractional destaining be increased if later intervals would have been chosen for the second 20 Hz stimulus in the experiment shown in Fig 9B?

      Previous Fig 9B is now Fig 10B.

      We cannot be certain, but think it probably would not be different. Neither an increase nor a decrease would be problematic for our conclusions.

      More detail: There is not enough data to evaluate this specifically for Fig 10B because the total amount of stain remaining at later intervals is little, meaning signal/noise is low, which causes extensive experimental scatter. However, synapses were even more extensively destained prior to time course c of Fig 2-supplement 2C, which nevertheless matches time courses a, b, and d.

      I propose fitting all baseline-corrected data with a single and a double-exponential function (as well as single exponential plus line?) and reporting the corresponding time constants (slopes) and amplitudes.

      As noted above, we purposefully do not baseline correct data in a way that would make this possible. However, we do include exponential fits when appropriate, in Fig 2D-E, Fig 2-supplement 1, Fig 2-supplement 7, Fig 2-supplement 8, and Fig 12B.

      Indeed, the absence of any change in the weighting parameter despite substantial changes for both time constants seen after raising the temperature to 35C (Fig 2-supplement 8 vs Fig 12B) is notable because it suggests that the contents of the reserve pools are not altered by changing temperature, even though vesicle trafficking is accelerated. Fig 2-supplement 8 is a supplementary figure because the result is outside the scope of the main point, not because the quality is lower than for other figures.

      Beyond that, exponential fits would not be adequate for most of the study because many experiments - including the core experiments in Figs 10-11 - require discontinuous stimulation, such as when we stop stimulating at 1 Hz, rest for minutes, and then start up again at 1 or 20 Hz. And, although widely used, exponentials are non-linear equations after all. Even when they can be used to quantify time courses, the fractional destaining measurement is almost always more informative, in the technical sense, because it avoids complications when estimating the importance of deviations occurring at the two extremes versus deviations in the middle of the time course.

      3) Along the same lines, is the average slow time constant indeed around 40 min? (Are the data shown in Fig 2 S7 based on an average?) If this would be the case, I suggest conducting a control experiment with a recording time > 40 min. Would fitting an exponential or a line to baseline data (without stimulation) also give a similar slow component?

      Fig 2-supplement 7 in the previous version is now Fig 2-supplement 8.

      First, yes, the time course shown in Fig 2-supplement 8 is the mean across preparations. The time courses of the individual preparations were quantified as the median value of the individual ROIs before averaging.

      Second, no, fitting baseline data would give an approximately 3-fold greater time constant (i.e., 120 min) because fractional destaining decreases by about 3-fold when we stop stimulating after 25 min of 1 Hz stimulation (i.e., Fig 2C, 3B, and many others).

      The key point is that fractional destaining decreases greatly over long trains of 1 Hz stimulation.

      For Fig 2, we saw a 2.7+/-0.1-fold decrease before accounting for baseline destaining (lines 106-110), which increased to a 4.4-fold decrease when we did account for baseline destaining (lines 113-116). Overall, the 2.7-fold value is simultaneously a safe minimum boundary, and much greater than the value of 1.0 expected from models where vesicles mix freely.
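      Why accounting for baseline destaining can only enlarge the fold decrease, never shrink it, follows from a little arithmetic. The Python sketch below assumes the correction amounts to subtracting a constant background rate from every fractional-destaining value; the rates are hypothetical placeholders, not the measured values reported above.

```python
def fold_decrease(early_rate, late_rate, baseline_rate=0.0):
    """Ratio of early to late fractional destaining after subtracting an
    assumed constant background (baseline) rate from both values."""
    if not 0.0 <= baseline_rate < late_rate <= early_rate:
        raise ValueError("expected 0 <= baseline < late <= early")
    return (early_rate - baseline_rate) / (late_rate - baseline_rate)


# Hypothetical rates, in fraction of remaining stain lost per minute.
early, late, background = 0.09, 0.03, 0.01
uncorrected = fold_decrease(early, late)
corrected = fold_decrease(early, late, background)
print(f"uncorrected: {uncorrected:.2f}-fold, corrected: {corrected:.2f}-fold")
```

      Algebraically, (e - b)/(l - b) - e/l = b(e - l)/[l(l - b)], which is positive whenever 0 < b < l < e. When the early and late rates are equal, as expected if vesicles mix freely, the ratio stays at exactly 1.0 with or without the correction.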

      Note that future studies will show that even the 4.4-fold value is probably an underestimate because 1 Hz stimulation misses a fast component at the very beginning of the time courses, as predicted in the Appendix.

      4) How specific are the findings to 1 Hz (and 20 Hz) stimulation? From which frequency onward can a decrease in fractional destaining no longer be observed?

      Our logic depends only on the premise that we are able to find some frequency where fractional destaining no longer decreases. We knew that 20 Hz was a good place to start because of previous electrophysiological experiments - frequency jumps (Fig 1 of Wesseling and Lo, 2002 and Fig 2C of Garcia-Perez and Wesseling, 2008), and trains of action potentials followed by osmotic shocks (Fig 2A of Garcia-Perez et al., 2008) - showing that 20 Hz stimulation is enough to nearly completely exhaust the readily releasable pool. This is noted in lines 202-203, and Box 2.

      Would previous stimulation with frequencies <20 Hz interfere with fractional destaining? These control experiments would help assess how general/specific the findings are.

      Yes (Figs 4 and 11A at 1 Hz). Also, we have done experiments at 0.1 Hz, which will be published later; some of these were actually removed from an earlier version of the manuscript because the results are primarily relevant to deciding between particular parallel models, and are not relevant to the conclusion of the present study that quickly and slowly mobilized reserves are processed in parallel.

      Similarly, a major conclusion of the paper - the parallel mobilization of two vesicle pools - is largely based on these two stimulation frequencies. Can they exclude that mixing between the two pools occurs at other frequencies?

      We cannot exclude the possibility of breakdown at a higher frequency, but this would not undercut our conclusions. We do not have plans to try this experiment because: (1) a positive result would be open to concerns about non-physiologically heavy stimulation; and (2) a negative result would be difficult to interpret because of the possibility that the axons cannot follow at higher frequencies.

      6) Some information in the methods section is lacking. For instance, which species is the cell culture based on?

      Mice from both sexes were used. This is now specified in the Methods.

      Reviewer 2

      Using optical monitoring of synaptic vesicles with FM1-43 at hippocampal synapses, the authors try to provide evidence for two parallel reserve pools of synaptic vesicles, which feed vesicles to the readily releasable pool. The major strength of the study is the use of a quantitative model that is readily testable by experiments: in the course of the study, the authors propose the best vesicle pool model, which nicely fits the experimental data "averaged over synapses". On the other hand, the weak point of the study comes from the optical method and the data: bulk imaging of vesicle dynamics monitored at each synapse is noisy and the signals vary considerably among synapses. Therefore, the average signals over many synapses may not reflect the vesicle dynamics of two reserve pools within a synapse, but something else, such as the different kinetics of release from multiple synapses with different release probability. Nevertheless, a new framework of two reserve pools offers a testable hypothesis of vesicle dynamics, and the use of single vesicle tracking and EM may allow a definitive answer in future studies. Therefore, the study may be of interest to the community of synaptic neurobiology.

      1) The current version includes a new figure (Fig 3) showing that the deviations from single pool models seen in populations are caused by deviations occurring at the level of single synapses. The heterogeneity between synapses actually causes population statistics to underestimate - not overestimate - the mean and median size of the deviations at individuals.

      We think the new evidence in Fig 3 and supplements is conclusive without follow-on EM of the same punctae given the substantial body of already published EM on similar cultures. Essentially, the only way to explain the results without invoking multiple reserve pools in individual synapses would be to say that individual synapses ALWAYS come in clumps containing multiple types and are NEVER separated from neighbors by more than 1.5 microns - even when the clumps are separated from each other by 5 microns. There is already clear evidence against this.

      2) No new model is proposed here, see the first response to the first reviewer.

      3) We are not aware of alternative hypotheses that could account for our results, so cannot evaluate if single vesicle tracking and EM could add meaningful additional support.

      1) The existence of non-stained vesicles complicates the interpretation of the data. Because release during 20 Hz and 1 Hz stimulation does not entirely reflect release from the fast and slow vesicle pools, estimating non-stained vesicles using synaptopHluorin (+bafilomycin) and EPSCs would be helpful to examine the fraction of non-stained/stained vesicles over time (with stimulation, the ratio may change dynamically, which may bring complications).

      Non-stained vesicles are not a complication, but instead a key element of our logic which is included in the diagrams in Boxes 1 and 2 and Figure 9. That is, quickly and slowly mobilized reserves can be distinguished at 1 Hz precisely because 1 Hz is not intense enough to exhaust the readily releasable pool (Box 2). The corollary is that stained vesicles must be replaced by non-stained vesicles, because otherwise 1 Hz stimulation would exhaust the readily releasable pool. And this is why FM-dyes (plus a beta-cyclodextrin during washing) are ideal for the current questions whereas other techniques, such as electrophysiology or synaptopHluorin imaging are obviously indispensable for other questions, but could not replace the FM-dyes in the current study. This is now noted on lines 86-89.

      We are aware that synaptopHluorin + bafilomycin could, in principle, accomplish some of the same goals. However, bafilomycin ended up being toxic when applied for tens of minutes, as it would have to be in our experiments. And, we do not see what critical question is not already answered with strong evidence using FM dyes.

      2) Individual synapses show marked differences in the time course of de-staining, suggesting differences in release probability. The averaging of the whole data may reflect "average" behavior of synapses, but for example, bi-exponential time course may reflect high Pr and low Pr synapses, rather than vesicle recruitment.

      The authors may comment on this issue.

      See newly added Fig 3, and responses above.

      3) Some differences are very small (Fig 10, the same amplitude as bleaching time course), and I am not certain if the observed differences are meaningful, given low signal to noise ratio in each synapse.

      Fig 10 in the previous version is Fig 11 in the current version.

      Even if correct, this would not be problematic because 20 Hz stimulation clearly did not cause fractional destaining to return to the initial value when stimulation was resumed at 1 Hz (compare d and f in Fig 11E). In any case, Figs 2C, 3B, 5B, 7B, and Fig 10-supplement 2A all show that the minimum fractional destaining value during 1 Hz stimulation is about 3-fold greater than during subsequent rest intervals, which is not a small difference. Also, note that Fig 2-supplement 3 shows that photobleaching likely did not play a role.

      Reviewer 3

      Reviewer #3 (Recommendations For The Authors):

      This study attempts to conceptualize the long-standing question of vesicle pool organization in presynaptic terminals. Authors used classical FM dye release experiments to support a hypothesis that rapidly and slowly releasing vesicles are mobilized in parallel without intermixing. This modular model is also supported indirectly by the authors’ recent findings of molecular links that connect a subset of vesicles in linear chains (published elsewhere).

      Our study should be seen as a test of the hypothesis that quickly and slowly mobilized reserves are processed in parallel. The evidence is independent of any modeling, and would continue to be equally strong if our working model turns out to be incorrect (lines 382-386).

      The scope of the original model was limited by a number of caveats. The main concerns included a limited data set measured in bulk from a highly heterogeneous synapse population, and a complex interrelationship between vesicle mobilization and the bulk FM dye de-staining kinetics. The second major limitation was measurements being performed at room temperature, which inhibits or alters a number of critical synaptic processes that are being modeled. This includes the efficiency of exo/endocytosis coupling, vesicle mobility and release site refractory period, which are stimulus- and temperature-dependent, but were not accounted for in the original model.

      The present study contains experiments at body temperature (Fig 12 and Fig 12-supplement 1 in the current version) and analyses of individual synapses (especially Fig 3 in the current version). To our knowledge all results are consistent with everything that is known about the efficiency of exo/endocytosis coupling, vesicle mobility and release site refractory periods.

      The authors made strong efforts to address previous concerns. However, the main conceptual point, i.e. linking the bulk FM dye de-staining kinetics with precise arrangement of vesicle pools, is not well supported and is generally highly problematic because it ignores many additional processes and confounding factors.

      For example, vesicle exchange between neighboring synapses constitutes from 15% to over 50% of total recycling vesicle population, and therefore is a major contributing factor to FM dye loss/redistribution, but is not considered in this study. Additionally, this vesicle exchange process undergoes calcium/activity-dependent changes, contributing to difficulty in interpreting the current experiments comparing FM de-staining at different stimulation frequencies.

      We do not see how exchange of vesicles between synapses could be a problem for our logic, so cannot evaluate this without a more detailed description of the concern. Instead, our results rule out random inter-synaptic exchange between quickly and slowly mobilized reserve pools because this would show up in our assays as mixing, which does not occur. We think there are three remaining possibilities:

      1) vesicles are exchanged primarily between quickly mobilized reserve pools

      2) vesicles are exchanged primarily between slowly mobilized reserve pools

      3) vesicles in quickly mobilized reserve pools are targeted to quickly mobilized reserve pools in other synapses and vesicles in slowly mobilized reserve pools are targeted to slowly mobilized reserve pools in other synapses.

      It would be interesting to know which of these is correct, but this is outside the scope of the current study.

      Moreover, other forms of release, such as asynchronous release, contribute a large fraction of released vesicles, but are not factored in. Asynchronous release varies widely in synapse population from 0.1 to >0.4 of synchronous release, but is entirely ignored. Spontaneous release may also contribute to FM dye loss over extended 25min recordings used.

      Spontaneous release and asynchronous release are not caveats.

      First, spontaneous: We suspect that spontaneous release contributes to the background destaining rate, but this is 3-fold slower than the minimum during 1 Hz stimulation on average (Figs 2C, 3C, 5B etc), so we know that the slowly mobilized reserve is mobilized by low frequency trains of action potentials (lines 410-412). Note that a different outcome - where the rate of destaining decreased to a very low level during long trains of 1 Hz stimulation - would not have been consistent with the idea that slowly mobilized vesicles are only released spontaneously because the remaining fluorescence can always be destained rapidly by increasing the stimulation intensity to 20 Hz (e.g., see examples in Fig 3).

      Second, asynchronous: We know that slowly mobilized reserves must be released synchronously at 35C because the asynchronous component is eliminated at this temperature (Huson et al., 2019), without altering the quantity of slowly mobilized reserves that are mobilized by 1 Hz stimulation (lines 350-360 of Results, and 445-452 of Discussion; we can confirm from our own unpublished experiments that the disappearance of asynchronous release at 35C is a robust phenomenon in these cell cultures). Asynchronous release of slowly mobilized vesicles might occur at room temperature, but this would not argue against the conclusion that slowly mobilized vesicles are processed in parallel with quickly mobilized.

      Specific comments:

      Points 1-4 are already addressed above.

      5) The notion of the chained vesicles is somewhat confusing: how does the "first" vesicle located at the plasma membrane/release site get released if it is attached to the chain? Wouldn’t this "first" vesicle be non-immediately releasable since it must first be liberated? Since all vesicles shown in the Figure 1 have chains attached to them, what vesicle population then give rise to sub-millisecond release?

      This is not a concern relevant to the present study because none of the conclusions rely on the model in any way (see Introduction, and lines 382-386 of the Discussion). Beyond that: We previously published clear evidence that docked vesicles are tethered to non-docked vesicles (Figure 8 of Wesseling et al., 2019). We see no reason to suspect that a tether to an internal vesicle would prevent the docked vesicle from priming for release.

      7) Model: For fitting de-staining during 20 Hz stimulation, the authors state that it was necessary to allow >5-fold facilitation. This seems non-physiological, since previous studies found only very mild facilitation at room temperature (typically below a factor of 1.5-2.0) and the authors themselves state that, at most, a 1.3-fold facilitation was found.

      If the 1.3-fold facilitation estimate comes from us, it must have been in a different context.

      Most published estimates of facilitation are heavily convolved with simultaneous depression, and there is additionally a saturation mechanism for readily releasable vesicles with high release probability that is not widely known (Garcia-Perez and Wesseling, 2008). The standard method for eliminating the depression is to lower the probability of release by lowering extracellular [Ca2+], which additionally relieves occlusion by the saturation mechanism. And, lowering [Ca2+] uncovers an enormous amount of facilitation at synapses in hippocampal cell culture. For example, see Figure 2B of Stevens and Wesseling (1999), which shows a 7-fold enhancement during 9 Hz stimulation, and Figure 3 of the same study, which shows a linear relationship with frequency. Taken together, these two results suggest a 15-fold enhancement during 20 Hz stimulation, which far exceeds the 5-fold value needed at inefficient release sites to make our working model fit the FM-dye destaining results.
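      The extrapolation from 9 Hz to 20 Hz is simple arithmetic, but the result depends slightly on what is assumed to scale linearly with frequency. The Python sketch below tries two readings, both our own assumptions for illustration: that the total enhancement is proportional to frequency, or that only the increment above no enhancement (fold minus 1) is. The first gives about 15.6-fold and the second about 14.3-fold, both roughly 3 times the 5-fold value the working model requires.

```python
def proportional_total(fold_at_ref, f_ref, f_target):
    # Reading 1 (assumption): total enhancement scales in proportion to frequency.
    return fold_at_ref * f_target / f_ref


def proportional_increment(fold_at_ref, f_ref, f_target):
    # Reading 2 (assumption): only the increment above 1 (no enhancement) scales.
    return 1 + (fold_at_ref - 1) * f_target / f_ref


REQUIRED = 5.0  # facilitation needed at inefficient release sites in the working model

for rule in (proportional_total, proportional_increment):
    fold_20hz = rule(7.0, 9.0, 20.0)  # 7-fold at 9 Hz, extrapolated to 20 Hz
    print(f"{rule.__name__}: {fold_20hz:.1f}-fold at 20 Hz "
          f"({fold_20hz / REQUIRED:.1f}x the required value)")
```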

      References

      Garcia-Perez E, Lo DC & Wesseling JF (2008). Kinetic isolation of a slowly recovering component of short-term depression during exhaustive use at excitatory hippocampal synapses. Journal of Neurophysiology 100, 781–95.

      Garcia-Perez E & Wesseling JF (2008). Augmentation controls the fast rebound from depression at excitatory hippocampal synapses. Journal of Neurophysiology 99, 1770–86.

      Huson V, van Boven MA, Stuefer A, Verhage M & Cornelisse LN (2019). Synaptotagmin-1 enables frequency coding by suppressing asynchronous release in a temperature dependent manner. Scientific Reports 9, 11341.

      Stevens CF & Wesseling JF (1999). Augmentation is a potentiation of the exocytotic process. Neuron 22, 139–46.

      Wesseling JF & Lo DC (2002). Limit on the role of activity in controlling the release-ready supply of synaptic vesicles. Journal of Neuroscience 22, 9708–20.

      Wesseling JF, Phan S, Bushong EA, Siksou L, Marty S, Pérez-Otaño I & Ellisman M (2019). Sparse force-bearing bridges between neighboring synaptic vesicles. Brain Structure and Function 224, 3263–3276.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports comprehensive multi-omic data on the changes induced in young and aged male mouse tail fibroblasts after treatment with chemical reprogramming factors. The authors claim that chemical reprogramming factors induce changes consistent with a reduction of cellular 'biological' age (e.g., correlations with established aging markers in whole tissues). However, the study relies on previously identified aging markers (instead of aging in the tail fibroblast system itself), and thus, at this stage, the evidence in support of the observed molecular changes truly reflecting changes in biological age in the study system is still incomplete.

      Essential revisions

      After discussion with reviewers, we believe that the conclusions of the manuscript would be significantly strengthened with the following revisions:

      (1) Rather than basing the analysis of age-related markers on public tissue data, it is recommended that authors use their own data on pre-reprogramming fibroblasts to define molecular aging-related markers/signatures specifically for male tail fibroblasts at 4 vs 20 months. This should also always be included in figures as reference points.

      We appreciate these helpful comments. Please refer to our responses to Reviewers #1 and #2 concerning these suggestions and the corresponding changes we have made in the revised manuscript.

      (2) In general, the methods as written lack the details necessary to fully understand the study/reproduce it independently, notably in terms of data analysis choices (e.g. use of FWER/FDR type correction for multiple testing, use of raw vs normalized RNA counts for PCA, etc).

      Thank you for this feedback. We have modified our text to address this issue. Please refer to our responses to Reviewer #1 for the specific changes we have made.

      (3) More generally, the authors should better outline the limitations/caveats of their experimental design in the discussion and/or abstract, including the specific cell type and the choice of using only male data (since aging itself is very sex-dimorphic, and the impact of partial reprogramming on aging phenotypes may also be sex-dimorphic).

      Thank you for this important feedback. We have now added a section to our Discussion in which we directly address potential limitations of our study concerning sex-specific differences and the cell type used.

      Public Reviews:

      Reviewer #1:

      Summary:

      The investigators employed a multi-omics approach to show the functional impact of partial chemical reprogramming in fibroblasts from young and aged mice.

      Strengths:

      Multi-omics data was collected, including epigenome, transcriptome, proteome, phosphoproteome, and metabolome. Different analyses were conducted accordingly, including differential expression analysis, gene set enrichment analysis, transcriptomic and epigenetic clock-based analyses. The impact of partial chemical reprogramming on aging was supported by these multi-source results.

      We appreciate the reviewer noting the strength and comprehensiveness of our approach.

      Weaknesses:

      More experimental data may be needed to further validate current findings.

      We thank the reviewer for this suggestion. To further validate our findings, we have proceeded as follows: (1) First, we have investigated the role of Prkaca activation during partial chemical reprogramming with 7c (see updated Fig. 5C, Fig. 5 – figure supplement 1B). By confocal microscopy, we show that partial chemical reprogramming with 7c does not cause Prkaca to localize to mitochondria; rather, its cellular distribution is altered to favor nuclear localization. We also use RNAi to knock down Prkaca and find that Prkaca is not necessary for mediating the increase in mitochondrial membrane potential upon partial chemical reprogramming with 7c.

      (2) We have determined the effect of partial chemical reprogramming with 7c on apoptosis using Annexin V assay (see updated Fig. 5 – figure supplement 1C). We show that during the course of partial chemical reprogramming, the proportion of apoptotic cells steadily increases to about 20 percent.

      (3) We have re-analyzed our multi-omics data to determine the molecular differences (e.g. at the epigenome, transcriptome, proteome, and metabolome levels) between fibroblasts isolated from young and old mice (see updated Fig. 2 – figure supplement 1, Fig. 6 – figure supplement 1, and Fig. 7 – figure supplement 2). Additionally, we have updated Fig. 7A to include statistical comparisons of transcriptomic age of 4-month-old and 20-month-old fibroblasts. Finally, we have updated Fig. 3D to include functional enrichment of gene and protein expression levels of aged fibroblasts.

      (4) We have more thoroughly characterized the effects of partial chemical reprogramming on the epigenome (see Fig. 7 – figure supplement 3).

      (5) Julie Y. Chen was added as a co-author for producing the analyses shown in Fig. 7 – figure supplement 2 and Fig. 7 – figure supplement 3.

      Reviewer #2:

      The short-term administration of reprogramming factors to partially reprogram cells has gained traction in recent years as a potential strategy to reverse aging in cells and organisms. Early studies used Yamanaka factors in transgenic mice to reverse aging phenotypes, but chemical cocktails could present a more feasible approach for in vivo delivery. In this study, Mitchell et al. sought to determine the effects that short-term administration of chemical reprogramming cocktails have on biological age and function. To address this question, they treated young and old mouse fibroblasts with chemical reprogramming cocktails and performed transcriptome, proteome, metabolome, and DNA methylation profiling pre- and post-treatment. For each of these datasets, they identified changes associated with treatment, showing downregulation of some previously identified molecular signatures of aging in both young and old cells. From these data, the authors conclude that partial chemical reprogramming can rejuvenate both young and old fibroblasts.

      The main strength of this study is the comprehensive profiling of cells pre- and post-treatment with the reprogramming cocktails, which will be a valuable resource for better understanding the molecular changes induced by chemical reprogramming. The authors highlighted consistent changes across the different datasets that are thought to be associated with aging phenotypes, showing reduction of age-associated signatures previously identified in various tissues. However, from the findings, it remains unclear which changes are functionally relevant in the specific fibroblast system being used. Specifically:

      (1) The 4 month and 20 month mouse fibroblasts are designated "young" vs "old" in this study. An important analysis that was not shown for each of the profiled modalities was a comparison of untreated young vs old fibroblasts to determine age-associated molecular changes in this specific model of aging. Then, rather than using aging signatures defined in other tissues, it would be more appropriate to determine whether the chemical cocktails reverted old fibroblasts to a younger state based on the age-associated changes identified in this comparison.

      In our study, we used 4 biological samples per group for young and old untreated fibroblasts, and these samples were used to calculate the effect of the 7c and 2c cocktails on gene expression in each age group. Therefore, the correlation between logFC induced by 7c/2c treatment and logFC between young and old fibroblasts would be biased, since the same untreated samples would be used in both calculations: estimates B-A and C-B share the term B and will therefore be, on average, negatively correlated even if A, B and C are independent random variables. For this reason, to investigate the effect of the cocktails on biological age, we utilized gene expression signatures of aging estimated from more than 2,600 samples of different ages from 25 data sources (PMID: 37269831). Notably, our multi-tissue signatures of aging were identified based on data from 17 tissues, including skin. Therefore, these biomarkers seem to represent more reliable and universal molecular mechanisms of aging. Since they were identified using independent data, the signatures also do not introduce the statistical bias described above. For these reasons, we think that they are more applicable to the current analysis. To demonstrate that the utilized aging signatures are broadly consistent with the changes observed in the studied fibroblasts, we performed a GSEA-based analysis, testing the association between logFC in aged fibroblasts and various signatures of aging and reprogramming (similar to our analysis in Fig. 2E). We found that the changes in aged fibroblasts from the current study were positively associated with the majority of aging signatures (kidney, liver and multi-tissue signatures in mouse and rat) (Fig. 2 – figure supplement 1A) and negatively associated with signatures of reprogramming.

      In addition, we characterized functional changes perturbed in untreated aged fibroblasts at the level of gene expression and protein concentrations and observed multiple changes consistent with the aging signatures, such as upregulation of genes and proteins involved in inflammatory response and interferon signaling (Fig. 3D, Fig. 2 – figure supplement 1C). Therefore, changes observed in untreated aged fibroblasts seem to agree with age-related molecular changes identified across mammalian tissues in our previous studies.
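      The shared-baseline bias described above can be checked with a short simulation. This is an illustrative sketch, not part of the authors' analysis: it assumes independent, unit-variance Gaussian values for A (young untreated), B (old untreated, reused in both contrasts) and C (old treated). Because Cov(B-A, C-B) = -Var(B), the expected correlation under these assumptions is -1/2, even though no real relationship exists.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shared_baseline_correlation(n=100_000, seed=0):
    """Correlate two contrasts that reuse the same baseline draw B.

    A, B and C are drawn independently, so any correlation between
    B - A and C - B comes only from the shared term B.
    """
    rng = random.Random(seed)
    aging, treatment = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # hypothetical young untreated value
        b = rng.gauss(0.0, 1.0)  # hypothetical old untreated value (shared)
        c = rng.gauss(0.0, 1.0)  # hypothetical old treated value
        aging.append(b - a)      # "aging" contrast: B - A
        treatment.append(c - b)  # "treatment" contrast: C - B
    return pearson(aging, treatment)


if __name__ == "__main__":
    print(shared_baseline_correlation())  # close to -0.5
```

      Using independently derived signatures, as the response describes, removes the shared term B and with it this artefactual negative correlation.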

      We would also like to mention that the epigenetic clocks used in this study consistently show that fibroblasts from 20-month-old mice are significantly older than fibroblasts from 4-month-old mice (Fig. 7B). Moreover, we have revised the manuscript to show that these epigenetic differences between young and old untreated fibroblasts are not due to overall changes in mean DNA methylation (Fig. 7 – figure supplement 2). In contrast, in the revised manuscript, we observe that 7c treatment reduces the epigenetic age of cells by decreasing mean DNA methylation levels (Fig. 7 – figure supplement 3).

      (2) Across all datasets, it appears that the global profiles of young vs old mouse fibroblasts are fairly similar compared to treated fibroblasts, suggesting that the chemical cocktails are not reverting the fibroblasts to a younger state but instead driving them to a different cell state. Similarly, in most cases where specific age-related processes/genes are being compared across untreated and treated samples, no significant differences are observed between young and old fibroblasts.

      We agree that our data show that partial chemical reprogramming seems to induce similar effects in young and old fibroblasts. In Fig. 2 – figure supplement 1B, the Spearman correlation coefficients for the effects on gene expression in young and old fibroblasts are 0.80 and 0.85 for 2c and 7c, respectively. It is important to note that the effect of partial chemical reprogramming is an order of magnitude larger (e.g. in terms of the number of differentially expressed genes) than the effect of aging in the untreated fibroblasts. We believe that partial chemical reprogramming with 7c pushes the cells to a younger state as a byproduct of producing a different cellular metabolic state with a strong increase in OXPHOS capacity.

      (3) Functional validation experiments to confirm that specific changes observed after partial reprogramming are indeed reducing biological age are limited.

      Functional validation of rejuvenating interventions is limited in vitro, as cells do not completely maintain their “aged” phenotype once isolated and cultured, and pursuing partial chemical reprogramming in vivo in naturally aged mice was beyond the scope of the study. Among the best reporters of biological age that are preserved in primary cells in vitro are epigenetic and transcriptomic clocks, both of which were used in this manuscript to show that 7c treatment, but not 2c, reduces biological age. We show that splicing-related damage is marginally elevated in old fibroblasts compared to young, and that 7c reduces splicing damage by reducing intron retention. Moreover, the epigenetic clocks used in this study show that the 20-month-old fibroblasts are significantly older than the 4-month-old fibroblasts, indicating that the “aged” phenotype is at least partially preserved. Furthermore, according to previous studies (PMIDs: 37269831, 31353263), one of the strongest functional biomarkers of aging is downregulation of mitochondrial function and energy metabolism, including oxidative phosphorylation, while upregulation of these functions is usually associated with extended lifespan in mice. For this reason, we have focused on these pathways in our study and assessed them with functional assays.

      (4) Partial reprogramming appears to substantially reduce biological age of the young (4 month) fibroblasts based on the aging signatures used. It is unclear how this result should be interpreted.

      This is a caveat of all reprogramming strategies and “anti-aging” interventions developed and tested to date. Currently, there are no genetic or pharmacological methods that target only the “aged” state and not the “young” state as well (i.e. an intervention that would only cause a change in old cells and revert them to a younger state). However, “young” cells in our study and in many other studies are still cells of intermediate age, as aging appears to begin early in development. Therefore, perhaps unsurprisingly, partial chemical reprogramming seemed to have similar effects on fibroblasts isolated from young and old mice, which is in line with OSK/OSKM reprogramming. These results should be interpreted as follows: partial chemical reprogramming does not depend on the epigenetic state (biological age) of adult cells to induce rejuvenation. We have updated the discussion section of our manuscript accordingly.

      Recommendations for the authors:

      Reviewer #1:

      (1) How was the PCA conducted for RNA-seq data? Were the raw or normalized counts used for PCA?

      Normalized counts were used for PCA of the RNA-seq data.

      (2) Supplementary Fig 3c, why was the correlation between the red rows and red columns low? Was the color of group messed up? Why was the Pearson correlation used instead of Spearman correlation? Most of the correlation analyses in the manuscript used Spearman correlation.

      We thank the reviewer for noticing this mistake. The colors of the groups have now been corrected. Furthermore, to be consistent with the rest of the manuscript, we have performed a Spearman correlation analysis on the normalized proteomics data to evaluate sample-to-sample similarities and updated Fig. 3 – figure supplement 1 accordingly. Overall, the results are similar to those obtained by Pearson correlation.
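      For readers comparing the two coefficients, the sketch below shows that Spearman correlation is simply the Pearson correlation of the ranks (with tied values given their average rank), so it captures monotone rather than strictly linear similarity between samples. It is illustrative only; the authors' actual software is not stated, and the sample vectors are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


sample_1 = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical normalized intensities
sample_2 = [1.0, 4.0, 9.0, 16.0, 100.0]  # monotone but far from linear
print(pearson(sample_1, sample_2), spearman(sample_1, sample_2))
```

      Here Pearson is well below 1 while Spearman is exactly 1, which is why the two can differ for samples dominated by a few very abundant proteins.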

      (3) Were the significant metabolites tested by one-way ANOVA adjusted for family-wise type I error rate? It is surprising that over 50% of metabolites were significant.

      Yes, the significant metabolites were adjusted for family-wise type I error rate (with a 5% significance threshold) in Fig. 6B.
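      The response does not name the FWER-controlling procedure used. As one common choice, the Holm step-down method is sketched below (this is an assumption; Bonferroni would be another option). The p-values are hypothetical, and the ANOVA step that produces them is not shown.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by m - rank.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical ANOVA p-values for four metabolites
adjusted = holm_adjust(raw)
significant = [q < 0.05 for q in adjusted]  # 5% family-wise threshold
print(adjusted, significant)
```

      Holm is uniformly at least as powerful as Bonferroni while still controlling the FWER, so a large fraction of significant metabolites after adjustment points to large effects rather than to a lenient correction.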

      (4) Missing full names of several abbreviations, such as NIA, RLE, PSI, etc.

      Thank you for noticing the missing abbreviations. We have corrected this by writing out the full term in the first instance in which each abbreviation appears.

      (5) Methods section may be too long. Some paragraphs could be moved to supplementary text.

      eLife does not have a limit to the number of figures or amount of text. Therefore, we have kept the methods section largely unaltered as we feel that they would be helpful to the scientific community.

      Reviewer #2:

      (1) As discussed in the public review, I would recommend first establishing what differences exist between 4 month and 20 month fibroblasts to identify potential age-related changes in these fibroblasts.

      We thank the reviewer for this suggestion. We have now thoroughly characterized the molecular differences between fibroblasts taken from young and old mice at the epigenome, transcriptome, proteome, and metabolome levels. Please refer to previous responses for more specific details.

      We have also attempted to establish aging-related differences at the phosphoproteome level, particularly with regard to mitochondrial processes (see figure below), but only GOcc: mitochondrion and GObp: mitochondrial transport come close to statistical significance (raw p-values of 0.05 and 0.08, respectively) in the control comparison.

      Author response image 1.

      (2) While the global changes currently highlighted in the study are informative and should remain in the revised manuscript, additional analyses to show which age-related changes identified in point 1 are reverted upon 2c or 7c treatment would better address the question of whether these cocktails revert age-related changes seen in fibroblasts. These analyses should be performed for each dataset (i.e. transcriptomic, proteomic, epigenomic, metabolomic) generated.

      Thank you for this comment. We have now evaluated the effects of partial chemical reprogramming on the specific molecular differences between fibroblasts isolated from young and old mice (see updated Fig. 2 – figure supplement 1, Fig. 6 – figure supplement 1, Fig. 7 – figure supplement 2, and Fig. 7 – figure supplement 3). For functional enrichment of aged fibroblasts at the gene and protein level, please refer to updated Fig. 3D.

      (3) Comparisons between partial reprogramming and OSKM reprogramming signatures are repeatedly made in the paper, but it is not clear from the text whether similarity to OSKM reprogramming signatures is a desired or undesired feature. Since there are likely both rejuvenating and oncogenic aspects of the OSKM signatures, it is unclear what conclusions can be made from these comparisons.

      Two central questions of this study were (1) if partial chemical reprogramming could induce cellular rejuvenation, and (2) if so, would it do so by merely chemically activating expression of Yamanaka factors. In this study, we find that 7c, the cocktail that demonstrated the most profound effect on biological age, only slightly upregulates Klf4, downregulates c-Myc, and has no effect on Sox2 or Oct4 expression. Thus, partial chemical reprogramming seems to operate through a mechanism independent of upregulating OSK/OSKM gene expression. This is crucial as it suggests that there are other transcription factors outside of OSKM that can be targeted to induce cellular rejuvenation and reversal of biological age. However, the direct transcriptional targets of partial chemical reprogramming are currently unknown and require further investigation.

      Partial reprogramming with OSK/OSKM has several limitations, including low efficiency, oncogenic risk, and differences in the speed of reprogramming according to cell/tissue type. These risks could be inherently tied to the transcription factors OSKM themselves; thus, partial chemical reprogramming, by avoiding strong activation of these genes, could potentially avoid these risks and provide a safer means for reversing biological age in vivo. However, extensive follow-up studies beyond the scope of this manuscript are certainly required to determine this.

      We have addressed this comment by modifying the discussion to include these points.

      (4) When analyzing the phospho-proteomics data, results are discussed as general changes in phosphorylation of proteins involved in different cellular processes. However, phosphorylation can either activate or inhibit a specific protein, and can depend on the specific residue in a protein that is modified. Different proteins in a cellular process can also respond in opposite directions to phosphorylation. Treating activating and inactivating phosphorylation events separately in describing these results would be more informative.

      We agree that an analysis that considers, for each specific phosphosite, whether it activates or inactivates a particular pathway would in principle be preferable to our current enrichment analysis, which only accounts for the increase or decrease in phosphorylation of each site without knowing its biological meaning. Unfortunately, however, we believe that such an analysis is currently not feasible in practice. The proposed analysis would require a database with information on which residues are (de-)phosphorylated when a certain pathway is activated. However, as far as we know, there are currently no databases that link activation or inactivation of specific phosphosites to pathways in repositories like KEGG, HALLMARK, GObp, GOcc, GOmf, Reactome, etc.

      Some databases link phosphosites to drugs, diseases and kinases (e.g. PTMsigDB (PMID: 30563849)). However, these authors explicitly state: “We note that we do not capture functional annotations of PTM sites in PTMsigDB, such as activating or inactivating effect on the modified protein.” Furthermore, even in these databases, for the vast majority of the registered phosphosites, the responsible kinases are unknown, especially in mice. In our work, we made use of PhosphoSitePlus for kinase substrate enrichment analysis (see Fig. 5B). Such analyses, in which kinase activity is inferred from activated phosphosites, are indeed commonly performed (see PMIDs: 34663829, 37269289, 37585503).

      In the absence of a repository that assigns activity to phosphosites, it is standard practice to perform pathway enrichment analysis without accounting for whether phosphosites are activating or inactivating (see PMID: 34663829), as we have done in our manuscript (Fig. 5A).

      Despite these drawbacks, we believe our analysis is relevant, as it demonstrates important biological activity in these pathways upon 2c/7c treatment compared with controls. For example, the observed increase in abundance of mitochondrial OXPHOS complexes (Fig. 3E) combined with an increase in general phosphorylation of mitochondrial proteins (Fig. 5A) likely points to an increase in mitochondrial activity, although one cannot exclude that some individual phosphorylation events might have inhibitory effects on certain mitochondrial proteins, while others might indicate increases in activity.

      (5) For the transcriptomic and epigenetic aging clocks used in Fig 7, significance tests need to be included for untreated 4 month vs 20 month fibroblasts. Particularly for the transcriptional clock, the differences are small and suggest that it may not be a strong aging signature.

      We have updated our clock analysis with the most recent versions of the clocks and added tests of statistical significance for the difference between 4-month-old and 20-month-old untreated fibroblasts (Fig. 7A). The difference is statistically significant for the chronological clock. However, when the lifespan-adjusted clock was applied, no statistically significant difference was observed, suggesting that 20-month-old fibroblasts do not exhibit substantial changes in gene expression associated with decreased healthspan and increased mortality.

      (6) For heatmaps shown in Figure 3D and Figure 4, please include untreated 4 month and 20 month fibroblasts as well to determine if pathways being compared are different between young and old fibroblasts.

      We have updated Figure 3D with functional enrichment results for aged fibroblasts at gene and protein expression levels, as requested. As for Fig. 4, we explained in our reply to point 1 of Reviewer #2 in the public review why adding aged fibroblasts there would introduce bias. Instead, we have performed a GSEA-based association analysis for changes observed in aged fibroblasts and signatures of aging (Fig. 2 – figure supplement 1), confirming that our signatures are overall consistent with patterns of 20-month-old fibroblasts from the current study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      The authors investigated the antigenic diversity of recent (2009-2017) A/H3N2 influenza neuraminidases (NAs), the second major antigenic protein after haemagglutinin. They used 27 viruses and 43 ferret sera and performed NA inhibition assays. This work was supported by a subset of mouse sera. Clustering analysis identified 4 antigenic clusters, mostly in concordance with the genetic groupings. Association analysis was used to estimate important amino acid positions, which were shown to be more likely to lie close to the catalytic site. Antigenic distances were calculated and a random forest model was used to identify potentially important sites.

      This has the potential to be a very interesting piece of work. At present, there are inconsistencies in the methods, results and presentation that limit its impact. In particular, there are weaknesses in some of the computational work.

      Strengths

      (1) The data cover recent NA evolution and a substantial number (43) of ferret (and mouse) sera were generated and titrated against 27 viruses. This is laborious experimental work and is the largest publicly available neuraminidase inhibition dataset that I am aware of. As such, it will prove a useful resource for the influenza community.

      (2) A variety of computational methods were used to analyse the data, which give a rounded picture of the antigenic and genetic relationships and link between sequence, structure and phenotype.

      Weaknesses

      (1) Inconsistency in experimental methods

      Two ferrets were boosted with H1N2 virus, whereas the others were boosted with recombinant NA protein. This, and the underlying reason, are clearly explained in the manuscript. The authors note that boosting with live virus did not increase titres. Nevertheless, these results are included in the analysis when it would be better to exclude them (Figure 2 shows much lower titres to their own group than other sera).

      As an exercise, we have excluded the H1N2 boosted ferrets sera and no major impact was observed in the antigenic grouping (see Author response image 1a). Another way to control for differences in immunogenicity is to normalize the NAI values with the homologous ELISA titers for each antigen. Clustering based on these ELISA normalized NAI titers reveals the same 4 distinct antigenic groups but with one change: Kan17 is shifted from group 1 to group 2 (Author response image 1b). Note that a homologous ELISA titer is not available for A/West-Virginia/17/2012 and thus this serum sample is not included in Author response image 1b.

      Author response image 1.

      Antigenic and phylogenetic relatedness of N2 NAs. Phylogenetic tree based on the N2 NA head domain amino acid sequences and heat-map representing the average of normalized neuraminidase inhibition titer per H6N2 [log2 (max NAI/NAI)] determined in ferret sera after the boost (listed vertically). The red-to-blue scale indicates high-to-low NAI observed in ELLA against the H6N2 reassortants (listed at the bottom). UPGMA clustering of H6N2s inhibition profiles are shown on top of the heat map and colored according to the phylogenetic groups. (a) Based on the ferret sera with exclusion of the sera that were obtained following prime-boost by infection with H1N2 (A/Estonia/91625/2015 and A/Stockholm/15/2014). (b) Based on serum NAI titers that were normalized by the homologous ELISA titer.
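      The two normalizations described in this caption can be sketched as follows. This is a hypothetical illustration with made-up titres and helper names, not the authors' code. Following the later description of normalizing "relative to the serum with the maximal NAI value against the H6N2 virus that was tested", the maximum is taken per virus across sera; that reading is an interpretation.

```python
import math


def log2_fold_below_max(nai):
    """Return log2(max NAI / NAI) for each serum-virus pair.

    `nai` maps serum -> {virus: NAI titre}. The maximum is taken per
    virus across sera (an interpretation of the text, not a confirmed
    detail of the authors' pipeline).
    """
    virus_max = {}
    for row in nai.values():
        for virus, titre in row.items():
            virus_max[virus] = max(virus_max.get(virus, titre), titre)
    return {
        serum: {virus: math.log2(virus_max[virus] / titre) for virus, titre in row.items()}
        for serum, row in nai.items()
    }


def normalize_by_homologous_elisa(nai, elisa):
    """Divide each serum's NAI titres by its homologous ELISA titre.

    Sera without an ELISA titre are dropped, mirroring the exclusion of
    A/West-Virginia/17/2012 from panel (b).
    """
    return {
        serum: {virus: titre / elisa[serum] for virus, titre in row.items()}
        for serum, row in nai.items()
        if serum in elisa
    }


# Hypothetical titres for two sera and two H6N2 viruses.
nai = {"serum_A": {"virus_1": 320, "virus_2": 40},
       "serum_B": {"virus_1": 80, "virus_2": 160}}
print(log2_fold_below_max(nai))
# {'serum_A': {'virus_1': 0.0, 'virus_2': 2.0}, 'serum_B': {'virus_1': 2.0, 'virus_2': 0.0}}
```

      A value of 0 marks the most strongly inhibiting serum for a virus, and each unit is one two-fold step below it.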

      (2) Inconsistency in experimental results

      Clustering of the NA inhibition results identifies three viruses which do not cluster with their phylogenetic group. Again, this is clearly pointed out in the paper. Further investigation of this inconsistency is required to determine whether this has a genetic basis or is an experimental issue. It is difficult to trust the remaining data while this issue is unresolved.

      We understand the reviewer's concern. It is important to keep in mind that discrete grouping of antigens allows major antigenic drifts to be visualized. However, within closely related groups the cross-reactivity of antisera is more likely distributed along a spectrum. When we constructed an antigenic map based on the antigenic cartography algorithm (as described by Smith D. et al, 2004), Kansas17, Wis15, and Ala15 were positioned closer to antigenic group 1 than the majority of other antigens that were classified as group 2 (Author response image 2a). Similar results were obtained when individual ferret sera from the biological duplicates were used (Author response image 2b). This antigenic cartography map has now been added to the revised manuscript as Figure 2 – figure supplement 3.

      Author response image 2.

      The antigenic cartography was constructed using averaged data from pairs of ferrets (a). Similar analysis was performed on individual ferrets sera (b).

      (3) Inconsistency in group labelling

      A/Hatay/4990/2016 & A/New Caledonia/23/2016 are in phylogenetic group 1 in Figure 2 and phylogenetic group 1 in Figure 5 - figure supplement 1 panel a.

      Our apologies: there was indeed a mistake in the labeling of Figure 5. A new antigenic cartography was constructed and included in the revised manuscript. As a result, Figure 5 - figure supplement 1 has become redundant and was removed from the manuscript.

      A/Kansas/14/2017 is selected as a representative of antigenic group 2, when in Figure 2 it is labelled as AC1 (although Figure 2 - supplement 4 which the text is referring to shows data for A/Singapore/Infimh-16-0019/2016 as the representative of AC2). A/Kansas/14/2017 is coloured and labelled as AC2 in Figure 2 - supplement 5.

      Thank you for pointing out this inconsistency. Kan17 clustered antigenically in group 1 based on the NAI values that were normalized relative to the serum with the maximal NAI value against the H6N2 virus that was tested. When using NAI titers normalized to the homologous ELISA titer, Kan17 is positioned in group 2. Likewise, antigenic cartography positions Kan17 in group 2. Therefore, we conclude that A/Kansas/14/2017 NA is a representative of group 2.

      The colouring is changed for Figure 3a at the bottom. A/Heilongjiang-Xiangyang/1134/2011 is coloured the same as AC4 viruses when it is AC1 in Figure 2. This lack of consistency makes the figures misleading.

      We apologize for this mistake. The coloring in Figure 3a has been corrected.

      (4) Data not presented, without explanation

      The paper states that 44 sera and 27 H6N2 viruses were used (line 158). However, the results for the Kansas/14/2017 sera do not appear to be presented in any of the figures (e.g. Figure 2 phylogenetic tree, Figure 5 - figure supplement 1). It is not obvious why these data were not presented. The exclusion of this serum could affect the results as often the homologous titre is the highest and several heatmaps show the fold down from the highest titre.

      Serum against A/Kansas/14/2017 was not prepared. For that reason, it is not included in the analysis. We agree that such homologous serum ideally should have been included and in the NAI assay would have resulted in a high if not the highest titer. However, we noticed that homologous sera did not always have the highest titers, especially in panels like ours where some antigens are closely related. The highest titer obtained against Kan17 H6N2 was from A/Bris/16 sera: 1/104, a titer in the range of other homologous titers observed in the panel (Table S3). The Bris16 and Kan17 NAs have five amino acid differences. In summary, inclusion of Kan17 homologous sera would likely not impact the analysis and interpretation of the results because there are multiple highly cross-inhibiting heterologous serum samples against Kan17.

      (5) The cMDS plot does not have sufficient quality assurance

      A cMDS plot is shown in Figure 5 - figure supplement 1, generated using classical MDS. The following support for the appropriateness of this visualisation is not given:

      a. Goodness of fit of the cMDS projection, including per point and per titre.
      b. Testing of the appropriate number of dimensions (the two sera from phylogenetic group 3 are clustered with phylogenetic group 2; additional dimensions might separate these groups).
      c. A measure of uncertainty in positioning, e.g. bootstrapping.
      d. A sensitivity analysis of the assumption about titres below the level of detection (i.e. that <20 = 10).

      Without this information, it is difficult to judge if the projection is reliable.

      We agree with these comments. We have removed Figure 5 – figure supplement 1, and added new figure 2 – figure supplement 3 (antigenic cartography) instead.
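      The sensitivity analysis requested in item d amounts to re-running the titre conversion with different values substituted for titres below the detection limit, then recomputing any downstream distances or projections. A minimal sketch, assuming titres are stored as strings such as "<20" (the helper name and data are hypothetical):

```python
import math


def to_log2(raw_titre, below_detection_value=10.0):
    """Convert a reciprocal titre string such as '160' or '<20' to log2 scale.

    Titres below the limit of detection are replaced by
    `below_detection_value` (10 mirrors the '<20 = 10' convention).
    """
    raw_titre = raw_titre.strip()
    if raw_titre.startswith("<"):
        return math.log2(below_detection_value)
    return math.log2(float(raw_titre))


titres = ["640", "160", "<20"]  # hypothetical row of NAI titres
for fill in (10.0, 5.0):
    print(fill, [round(to_log2(t, fill), 2) for t in titres])
```

      Only the censored entry moves, by exactly one log2 unit when the fill value is halved; whether a map or clustering is stable under that shift is the question item d raises.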

      (6) Choice of antigenic distance measure

      The measure of antigenic distance used here is the average difference between titres for two sera. This is dependent on which viruses have been included in the analysis and will be biased by the unbalanced number of viruses in the different clusters (12, 8, 2, 5).

      To verify the impact of the number of antigens on our analysis, the matrix of differences was generated with only 4 H6N2s, one representing each phylogenetic group (Per09, Sin16, Hel823 and Ind11) (Author response image 3a). This matrix is very similar to the one calculated based on all 27 antigens (Author response image 3b). The obtained matrix (Author response image 3a) was used in random forest to model antigenic distances, and the predictions were plotted against the real differences calculated from the full data. The correlation coefficient (R2) of predicted vs observed values dropped from 0.81 to 0.71, suggesting that the number of antigens tested does not drastically affect the antigenic differences calculated based on serum values (Author response image 3e). Importantly, amino acid substitutions potentially associated with increased antigenic distances are similarly identified (Author response image 3c, d and f).

      Author response image 3.

      Matrix of differences was calculated using only 4 H6N2 antigens (a) or the full panel (b). The matrices from (c) 4 or (d) 27 antigens were used in random forest modeling to estimate the impact of amino acid changes, respectively. The random forest (rf) predictions generated from only 4 H6N2s were plotted against and correlated with values calculated from the full panel of 27 H6N2s (e). The multi-way importance plot indicates in red that 7 out of the 10 most important substitutions were identified by the analysis using only 4 H6N2s (f).

      Interestingly, when the matrix of differences is calculated from only 4 H6N2s that do not include a representative of antigenic group 1 or group 2, the correlation coefficient between the predicted values and the values obtained from the full panel is markedly reduced (R2 drops from 0.81 to 0.5 and 0.57). It is important to note that most of the sera were also raised against antigens from phylogenetic groups 1 and 2. As a consequence, poorer prediction for those antigens has a greater impact on the correlation. No drastic drop was observed when representative H6N2s from group 3 or 4 were excluded from the data (from 0.81 to 0.75 and 0.73, Author response image 4c and d).

      Author response image 4.

      Random forest analysis was repeated using only 4 antigens, but excluding representatives of one of the phylogenetic groups (a) no group 1, (b) no group 2, (c) no group 3, and (d) no group 4.

      We also used Euclidean distances as a measure of differences (Author response image 5). The predicted values obtained with rf have a slightly lower R2 than those obtained using the average of differences.

      In conclusion, neither the unbalanced number of antigens per group nor the choice of distance metric appears to affect our analysis per se.

      Author response image 5.

      Antigenic distances were calculated as Euclidean distances between sera. Those antigenic distances were used in rf to estimate antigenic distance and the importance of each amino acid substitution.
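
      The following sketch makes the two distance metrics concrete. For every pair of sera, it computes both an average of absolute differences and a Euclidean distance over their normalized NAI values against the same antigens. It assumes that each serum is a vector of normalized titres over a shared antigen panel. The exact definition of the matrix of differences used in the manuscript may differ, and the serum names and values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def serum_distances(
    titres: Dict[str, List[float]],
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Return (mean absolute difference, Euclidean distance) for each serum pair."""
    if not titres:
        return {}, {}
    names = sorted(titres)
    n_antigens = len(titres[names[0]])
    if any(len(v) != n_antigens for v in titres.values()):
        raise ValueError("all sera must be tested against the same antigens")
    mean_abs: Dict[Pair, float] = {}
    euclid: Dict[Pair, float] = {}
    for a, b in combinations(names, 2):
        diffs = [x - y for x, y in zip(titres[a], titres[b])]
        mean_abs[(a, b)] = sum(abs(d) for d in diffs) / n_antigens
        euclid[(a, b)] = math.sqrt(sum(d * d for d in diffs))
    return mean_abs, euclid


# Hypothetical normalized NAI values of two sera against two antigens.
avg, euc = serum_distances({"serum_1": [0.0, 1.0], "serum_2": [3.0, 5.0]})
print(avg[("serum_1", "serum_2")], euc[("serum_1", "serum_2")])  # 3.5 5.0
```

      Unlike the average, the Euclidean distance grows with the number of antigens and weights large single-antigen differences more heavily.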

      (7) Association analysis does not account for correlations

      For each H6N2 virus and position, significance was calculated by comparing the titres between sera that did or did not have a change at that position. This does not take into account the correlations between positions. For haemagglutinin, it can be impossible to determine the true antigenic effects of such correlated substitutions with mutagenesis studies.

      Most of the potentially correlated effects cannot be addressed with the panel of N2s, except for combinations of substitutions that are included in the panel, such as 245/247 with or without 468. Only mutagenesis studies would shed light on epistatic effects. However, it is important to keep in mind that individual substitutions introduced in such studies likely do not reflect the natural evolution of N2 (cfr. the importance of NA charge balance; Wang et al., 2021: 10.7554/eLife.72516).

      (8) Random forest method

      25 features are used to classify 43 sera, which seems high (p/3 is typical for classification). By only considering mismatches, rather than the specific amino acid changes, some signals may be lost (for example, at a given position, one amino acid change might be neutral while another has a large antigenic effect). Features may be highly, or perfectly correlated, which will give them a lower reported importance and skew the results.

      The number of features was optimized in the range from 5 to 80, with 25 being optimal (best R-value for predicted vs observed antigenic distances). This number refers to the number of amino acid substitutions used in each tree. The number of trees was also optimized in the range of 100 to 2000.

      In the random forest model, the matrix of differences takes into account only the position of a substitution between pairs of NAs, not the type of substitution. Indeed, different substitutions at the same position with distinct effects may skew the results by lowering the reported importance of that position.

      We have highlighted such potential bias in our discussion:

      “Also, our modelling does not consider that substitution by other amino acids can have a distinct impact on the antigenic distance. As a consequence, predictions based on the model could underestimate or overestimate the importance of a particular amino acid residue substitution in some cases.”
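
      The position-based encoding described above can be sketched as follows. Each NA pair is represented by a binary vector over the variable positions of an alignment, with 1 where the two residues differ. This minimal illustration assumes pre-aligned sequences and 0-based alignment indices rather than N2 numbering. The function names and toy sequences are hypothetical. The second example shows why substitution type is lost: K to R and K to Q at the same position receive the same feature value.

```python
from typing import Dict, List


def variable_positions(aligned: Dict[str, str]) -> List[int]:
    """0-based alignment columns at which not all sequences share the same residue."""
    seqs = list(aligned.values())
    if not seqs or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("expected a non-empty set of equal-length aligned sequences")
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def mismatch_features(seq_a: str, seq_b: str, positions: List[int]) -> List[int]:
    """1 where two NAs differ at a variable position, else 0 (type of change ignored)."""
    return [int(seq_a[i] != seq_b[i]) for i in positions]


panel = {"na1": "MKTA", "na2": "MKSA", "na3": "MRSA", "na4": "MQTA"}
cols = variable_positions(panel)                            # [1, 2]
print(mismatch_features(panel["na1"], panel["na3"], cols))  # [1, 1]  (K->R, T->S)
print(mismatch_features(panel["na1"], panel["na4"], cols))  # [1, 0]  (K->Q)
```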

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the antigenicity of N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 using ferret and mice immune sera. Four antigenic groups were identified, which correlated with their respective phylogenic/ genetic groups. Among 102 amino acids differed by the 44 selected N2 proteins, the authors identified residues that differentiate the antigenicity of the four groups and constructed a machine-learning model that provides antigenic distance estimation. Three recent A(H3N2) vaccine strains were tested in the model but there was no experimental data to confirm the model prediction results.

      Strengths:

      This study used N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 and generated corresponding panels of ferret and mouse sera to react with the selected strains. The amount of experimental data for N2 antigenicity characterization is large enough for model building.

      Weaknesses:

      The main weakness is that the strategy of selecting 44 A(H3N2) viruses from 2009-2017 was not explained. It is not clear if they represent the overall genetic diversity of human A(H3N2) viruses circulating during this time. A comprehensive N2 phylogenetic tree of human A(H3N2) viruses from 2009-2017, with the selected 44 strains labeled in the tree, would be helpful to assess the representativeness of the strains included in the study.

      The selection of antigens was performed using the method described by Bien and Tibshirani 2011 (doi: 10.1198/jasa.2011.tm10183). This method calculates MinMax distances to identify a central representative among distinct clusters.

      To facilitate visualization in a phylogenetic tree, 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). Those 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. The readout strains cover the major branches of the tree. The tree was built with PhyML 3.0 using the JTT substitution model and default parameters (Guindon S. et al., Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al., Mol. Biol. Evol. 33(6):1635-38, 2016).

      Author response image 6.
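
      The prototype idea behind the method of Bien and Tibshirani (2011) can be illustrated with a short sketch. Within a cluster, the prototype is the member whose largest distance to any other member is smallest. The minimax linkage between two clusters is that smallest maximal distance, computed over their union. The Hamming distance on aligned sequences is only an assumption for illustration, since the distance used for antigen selection is not stated here. The full hierarchical clustering step is omitted, and the toy sequences are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimax_prototype(
    members: Sequence[str], dist: Callable[[str, str], int]
) -> Tuple[str, int]:
    """Return the member minimizing its maximal distance to all members, and that radius."""
    best = min(members, key=lambda c: max(dist(c, o) for o in members))
    return best, max(dist(best, o) for o in members)


def minimax_linkage(
    g: Sequence[str], h: Sequence[str], dist: Callable[[str, str], int]
) -> int:
    """Minimax linkage: the radius of the minimax prototype of the merged cluster."""
    merged: List[str] = list(g) + list(h)
    return minimax_prototype(merged, dist)[1]


cluster = ["AAAA", "AAAB", "AABB"]
print(minimax_prototype(cluster, hamming))           # ('AAAB', 1)
print(minimax_linkage(["AAAA"], ["AABB"], hamming))  # 2
```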

      The second weakness is the use of double-immune ferret sera (post-infection plus immunization with recombinant NA protein) or mouse sera (immunized twice with recombinant NA protein) to characterize the antigenicity of the selected A(H3N2) viruses. Conventionally, NA antigenicity is characterized using ferret sera after a single infection. Repeated influenza exposure in ferrets has been shown to enhance antibody binding affinity and may affect the cross-reactivity to heterologous strains (PMID: 29672713). The increased cross-reactivity is supported by the NAI titers shown in Table S3, as many of the double immune ferret sera showed the highest reactivity not against its own homologous virus but to heterologous strains. Although the authors used the post-infection ferret sera to characterize 5 viruses (Figure 2, Figure Supplement 4), the patterns did not correlate well. If the authors repeat the NA antigenic analysis using the post-infection ferret sera with lower cross-reactivity, will the authors be able to identify more antigenic groups instead of 4 groups?

      This is a very valuable remark. In their paper, Kosikova et al. (CID 2018) report that repeated infection of ferrets with antigenically slightly different H3N2 viruses results in a broader anti-HA response, compared to a prime infection of an influenza naïve ferret, which results in a narrower anti-HA response. In our ferret immunizations the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus that was used for the priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared to NAI responses in sera from H1N2 infected and subsequently NA protein boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Author response image 7). On the other hand, repeated influenza antigen exposure is the reality for the majority of people.

      Author response image 7.

      Correlation obtained on NAI data from ferrets at day 14 after infection vs data from day 42 after boost.

      Another weakness is that the authors used the newly constructed model to predict the antigenic distance of three recent A(H3N2) viruses but there is no experimental data to validate their prediction (eg. if these viruses are indeed antigenically deviating from group 2 strains as concluded by the authors).

      Indeed, there are no experimental data for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. Generating experimental values for these viruses would require new reassortant viruses (H1N2s), recombinant protein and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The main purpose of the modeling was to evaluate whether antigenic behavior can be predicted from amino acid substitutions.

      As an exercise, we ran the model again, this time excluding the Swe17 and HK17 antigens from the data set. The sequences of Swe17 or HK17 were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 8 and show a robust predictive outcome, with R2 values of 0.94 and 0.91 for Swe17 and HK17, respectively.

      Author response image 8.

      Antigenic distances from Swe17 and HK17 calculated using the random forest algorithm that was constructed without experimental data from Swe17 and HK17. The predicted distances were plotted side by side to the experimental distances in (a) and correlations are shown in (b).

      Reviewer #3 (Public Review):

      Summary:

      This paper by Portela Catani et al examines the antigenic relationships (measured using monotypic ferret and mouse sera) across a panel of N2 genes from the past 14 years, along with the underlying sequence differences and phylogenetic relationships. This is a highly significant topic given the recent increased appreciation of the importance of NA as a vaccine target, and the relative lack of information about NA antigenic evolution compared with what is known about HA. Thus, these data will be of interest to those studying the antigenic evolution of influenza viruses. The methods used are generally quite sound, though there are a few addressable concerns that limit the confidence with which conclusions can be drawn from the data/analyses.

      Strengths:

      • The significance of the work, and the (general) soundness of the methods.

      • Explicit comparison of results obtained with mouse and ferret sera.

      Weaknesses:

      • Approach for assessing the influence of individual polymorphisms on antigenicity does not account for the potential effects of epistasis.

      Indeed, possible epistatic effects among individual polymorphisms were not assessed, owing to the nature of the panel of N2s selected in the study. We now emphasize this in the discussion as follows:

      “Also, our modelling does not consider that substitution by different amino acids can have distinct impact on antigenic distance. As a consequence, predictions based on the model could underestimate the importance of a particular amino acid residue substitution in some cases.”

      • Machine learning analyses were neither experimentally validated nor shown to be better than simple, phylogenetic-based inference.

      This is a valid remark, and indeed we have found a clear correlation between NAI cross-reactivity and phylogenetic relatedness. However, machine learning analysis achieves good prediction of the experimental data (as shown in Figure 5 and in Author response image 8). Beyond that, it has the potential to rank or flag major antigenic divergences based on available sequences before they have consolidated into a new clade. ML can also support the selection and design of more broadly reactive antigens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major corrections

      No major corrections, beyond the issues I touched on in the public review, for which I give a little more detail below:

      Point 2. If there's not a putative genetic basis for the unexpected clustering seen in the NAI, then reiterating a small subset of the data would show the reliability of the experimental methods and substantiate this unexpected finding.

      We thank the reviewer for this pertinent point and suggestion. We have modified our analysis by reiterating it on individual ferret data normalized to the homologous ELISA titers. This reiteration is shown in figure R1b. In this case, both Kan17 and Wis15 switch to antigenic group 2. The serum inhibition profiles of these two strains, which shift from antigenic cluster 1 to 2, are clearly intermediate between the profiles observed in those two groups. Antigenic evolution occurs gradually, so it is not unexpected that such intermediate profiles swing from one side to the other when forced into a discrete classification. Antigenic cartography, as in Smith et al. (2004), also indicated that those H6N2s are located closer to G1 than the other antigens of G2. The distribution of the raw data (maximum and minimum EC50) also does not indicate a potential bias in the analysis.

      Point 5. If you want to use antigenic cartography (Smith et al 2004), there is the R CRAN package (https://CRAN.R-project.org/package=Racmacs) which can handle threshold titres (like <20) and has functions for the diagnostic tools I describe, in order to quality assure the resulting plot. It does use a different antigenic distance metric than the paper currently uses, so you might not want to take that route.

      Thank you for this suggestion. We have performed antigenic cartography using the methodology described by Smith et al. (2004), as made accessible by Sam Wilks in Racmacs. The outcome of this analysis has been added to the manuscript as Figure 2 – Figure supplement 3.

      Point 6. More robust measures of antigenic distance take into account the homologous titre, homologous and heterologous titres (Archetti & Horsfall, 1950) or use the highest observed titre for a serum (Smith et al 2004). A limitation of the first two is that the antigenic distance can only be calculated when you have the homologous titre, which will limit you as you only have this for 26/43 sera. They may give similar results to your average antigenic distance, in which case your analysis still stands. Calculating antigenic distance using the homologous or maximum titre only gives the antigenic distance between the antigen and the serum. If you want the distance between all the sera, then further analysis is required (making an antigenic map and outputting the serum-serum distances, see the point above).

      We thank the reviewer for these suggestions. A complete set of 43 H6N2 viruses matching all 43 sera would have been ideal. This would require the generation of 17 additional H6N2 viruses and their testing in ELLA, a significant investment of time and resources. Instead, we have generated an antigenic map of the 27 antigens and homologous sera (cfr. our response to point 5 above). Despite the different methods, the outcome of 4 major antigenic groups is consistent.
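
      For reference, the two alternative distance measures mentioned by the reviewer can be sketched in a few lines. The sketch follows their usual formulations. The first is the Archetti and Horsfall (1950) ratio r = sqrt((T21/T11) * (T12/T22)), where Tij is the titre of antigen i against the serum raised to antigen j. The second is the Smith et al. (2004) table distance, log2 of the serum's highest titre minus log2 of the titre. Threshold titres such as <20 are not handled here (the reviewer notes that Racmacs can handle them), and the titres are toy values, not data from this study.

```python
import math
from typing import Dict


def archetti_horsfall_r(t11: float, t21: float, t12: float, t22: float) -> float:
    """Archetti-Horsfall ratio r; tij = titre of antigen i against the serum to antigen j.

    r = 1 for identical cross-reactivity; 1/r is often used as an antigenic distance.
    """
    return math.sqrt((t21 / t11) * (t12 / t22))


def smith_table_distances(serum_titres: Dict[str, float]) -> Dict[str, float]:
    """Smith et al. (2004) table distance of each antigen to one serum, in log2 units."""
    top = math.log2(max(serum_titres.values()))
    return {antigen: top - math.log2(t) for antigen, t in serum_titres.items()}


# Hypothetical titres (reciprocal dilutions).
r = archetti_horsfall_r(t11=640, t21=160, t12=80, t22=320)
print(r, 1 / r)  # 0.25 4.0
print(smith_table_distances({"Ag1": 640, "Ag2": 160, "Ag3": 40}))
```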

      Minor corrections

      Table S1

      A/New_Castle/67/2016 should be A/Newcastle/67/2016

      A/Gambia/2012 is not the full virus name

      Corrected.

      Table S3 has multiple values of exactly 10.0. I think these should be <20 as they are below the threshold of detection for the assay.

      All the values lower than 20 in Table S3 were replaced by “< 20”.

      Line 376: A/Sidney/5/1997 should be A/Sydney/5/1997

      Corrected.

      Line 338: "25 randomly sampled data" is a bit vague, "25 randomly sampled features" would be better

      Corrected.

      Include RMSE of the random forest model.

      The RMSE (19.6; RMSE/mean = 0.207) is now reported in the manuscript.
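
      For clarity, the sketch below shows how an RMSE and a normalized RMSE of this kind can be computed from predicted and observed antigenic distances. It assumes that the normalization divides by the mean of the observed values, which the text does not specify. The toy numbers are illustrative only.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed values."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("need two non-empty series of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def normalized_rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """RMSE divided by the mean observed value (assumed normalization)."""
    return rmse(predicted, observed) / (sum(observed) / len(observed))


print(rmse([1.0, 3.0], [2.0, 2.0]), normalized_rmse([1.0, 3.0], [2.0, 2.0]))  # 1.0 0.5
```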

      Figure 5 - supplement 1: These plots are difficult to interpret as the aspect ratio is not 1:1, and panels a & b are difficult to compare as they have not been aligned (using a Procrustes analysis). It would be neater if they were labelled with short names.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Line 562: 98 variable residues, where it is 102 elsewhere in the text.

      There are 4 mutations near the end of the NA stalk domain that are not resolved in the N2 structure. Therefore, amino acid distances to these residues cannot be calculated, which leaves 98 of the 102 variable residues for this analysis.

      No data availability statement. Some of the raw data is available in Table S3 and there is no link to the code.

      The data and code used for generation of rf modelling was uploaded to Github and made available. The following statement has been added to the manuscript: “The data and code used for the generation of the rf model is available at https://github.com/SaelensLAB/RF..”

      Reviewer #2 (Recommendations For The Authors):

      (1) More than 42,000 NA sequences are available for the mentioned period on GISAID, it is therefore important to understand the selection criteria for the 44 strains and if these strains represent the overall genetic diversity of N2 of human A(H3N2) viruses. To demonstrate the representativeness of the 44 selected strains, please construct a representative N2 phylogenetic tree for human A(H3N2) viruses circulated in 2009-2017 and label the 44 selected strains on the tree.

      The selection of antigens was performed using the method described by Bien and Tibshirani 2011 (doi: 10.1198/jasa.2011.tm10183). This method uses MinMax distances to identify a central representative among distinct clusters.

      To facilitate visualization in a phylogenetic tree, 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). Those 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. The readout strains cover the major branches of the tree. The tree was built with PhyML 3.0 using the JTT substitution model and default parameters (Guindon S. et al., Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al., Mol. Biol. Evol. 33(6):1635-38, 2016).

      Author response image 9.

      (2) Double immune ferret sera may increase antibody binding affinity and cross-reactivity against heterologous strains. Using single-infection ferret sera may yield different antigenic grouping results (eg. may identify more antigenic groups). Can the authors repeat the NA antigenic grouping using single-infection ferret sera? Although data from a subset of 5 strains was presented (Figure 2, Figure Supplement 4), the information was not sufficient to support if the use of single-infection or double immune ferret sera will yield similar antigenic grouping results.

      In our ferret immunizations the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus that was used for the priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared to NAI responses in sera from H1N2 infected and subsequently NA protein boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Author response image 7). On the other hand, repeated influenza antigen exposure is the reality for the majority of people.

      (3) NA antigenicity data is presented in heat maps and the authors would often describe the heat map patterns matches without further explanations. Line 234-235, the heat map of mouse sera (Figure 2. Figure supplement 5) was described to match the results of ferret sera (Figure 2), but this tends to be subjective. A correlation analysis of 7 selected antigens showed a positive correlation, what about the other 37 antigens?

      The interpretation of heat maps is indeed subjective; for this reason, the correlation for the 7 selected antigens was also provided. The other 37 antigens were not tested. Based on the post-boost sera, a random forest simulation indicates that data from one antigen of each antigenic group are sufficient to achieve a reliable predictive output (R2 = 0.71) (Author response image 3).

      (4) Can the authors explain in more detail how data in Figure 4a was generated? According to the authors, residues close to the catalytic pocket are more likely to impact NAI. Can the authors explain how they define if a residue is close to the catalytic pocket?

      The correlation between amino acid residue distances and significance values is explained as follows. Consider 7 distinct elements distributed horizontally, shown as squares in the figure below (Author response image 10a). The elements highlighted in yellow have a numerical property; for N2 neuraminidase, this was the significance value obtained in the association study. Taking P1 as the reference, we can calculate the distances (red arrows) between P1 and each of P2, P4 and P7. Those distances can then be correlated with the intrinsic values of P2, P4 and P7, which yields the correlation coefficient Tau. The same process is repeated for each position (i.e., each amino acid), so that every position has its own correlation coefficient (Author response image 10b). These correlation coefficients can be represented as a heat map on the surface of N2.

      Author response image 10.

      The 2D scheme represents the strategy used to calculate the correlation (i.e. the Tau values) between distances and p-values. Tau values can then be presented in a heat map.
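
      A minimal Python sketch of this procedure follows. It makes three assumptions that the text does not specify: residues are represented by 3D coordinates (for example, one atom per residue from the N2 structure), distances are Euclidean, and Tau is Kendall's tau-b. The function names and the toy layout are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation (pairwise O(n^2) version with tie correction)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def tau_per_position(coords: Dict[int, Coord], values: Dict[int, float]) -> Dict[int, float]:
    """For each residue, correlate its distances to the annotated residues with their values."""
    taus: Dict[int, float] = {}
    for ref, ref_xyz in coords.items():
        others = [p for p in values if p != ref]  # annotated residues, excluding the reference
        if len(others) < 2:
            taus[ref] = float("nan")
            continue
        dists = [math.dist(ref_xyz, coords[p]) for p in others]
        taus[ref] = kendall_tau_b(dists, [values[p] for p in others])
    return taus


# Hypothetical layout of seven residues (P1..P7) along one axis, with values
# only at P2, P4 and P7, mirroring the scheme in Author response image 10a.
coords = {p: (float(p), 0.0, 0.0) for p in range(1, 8)}
values = {2: 0.1, 4: 0.2, 7: 0.3}
taus = tau_per_position(coords, values)
print(taus[1], taus[7])  # 1.0 -1.0
```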

      (5) Can the authors provide experimental data using the three recent A(H3N2) viruses as antigens and perform NAI assay to confirm if they are antigenic all deviating from group 2 viruses?

      Generating experimental values for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021 would require new reassortant viruses (H1N2s), recombinant protein and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The main purpose of the modeling was to evaluate whether antigenic behavior can be predicted from amino acid substitutions.

      As an exercise, we ran the model again, this time excluding the Swe17 and HK17 antigens from the data set. The sequences of Swe17 or HK17 were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 8 and show a robust predictive outcome, with R2 values of 0.94 and 0.91 for Swe17 and HK17, respectively.

      (6) According to Ge et al. 2022 (PMID: 35387078), N2 NA's before 2014 (2007-2013) showed a 329-N-glycosylation and E344, and they were subsequently replaced by H3N2 viruses with E344K and 329 non-glycosylation changing the NI reactivity in ferret antisera towards later strains. Were these residues also predicted to be important to N2 antigenicity from your machine-learning method?

      Three of the N2 NAs used in our panel, A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in another 3 NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of those mutations is among the lowest predicted by our modeling. However, the differences in NAI reported by Ge et al. are small (less than twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact on NAI. We have added the following to the discussion in our revised manuscript:

      “It has been reported that an N-glycosylation site at position 329 combined with E344 in NA from human H3N2 viruses from 2007 to 2013 was gradually lost in later H3N2 viruses (Ge et al., 2022). This loss of an N-glycosylation site at position 329 combined with an E344K substitution was associated with a change in NAI reactivity in ferret sera. Three N2 NAs in our panel, derived from A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in three other NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of those mutations is among the lowest predicted by our modeling. However, the differences in NAI reported by Ge et al. are very modest (lower than twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact on NAI.”

      Reviewer #3 (Recommendations For The Authors):

      Specific suggestions:

      Line 132: Did the authors confirm the absence of compensatory mutations due to a heterologous H6 background that could potentially confound downstream NAI results?

      All NA genes of the rescued H6N2 viruses were fully sequenced and found to be identical to the expected NA sequences. The only exception was A/Tasmania/1018/2015, for which a mixed population of wild type and M467I was found. This substitution is located at the surface, at the top of the NA head domain, and thus could potentially impact NA antigenicity. However, the A/Tasmania/1018/2015 H6N2 had an inhibition profile similar to that of the other H6N2s in phylogenetic and antigenic group 1. This indicates that, at least in this mixed population, antigenicity was not drastically affected by the M467I substitution.

      Line 96: how do these data rule out variation in the fraction of properly folded protein across NAs? They certainly show that properly folded NA protein is present, but not whether amounts vary between the different NAs.

      SEC-MALS (size exclusion chromatography-Multiangle light scattering) data and enzymatic activity were considered as a proxy for correctly folded NA. Although the specific activity of the recombinant N2 NAs is expressed per mass unit (microgram), we cannot exclude that the fraction of properly folded protein across the different recombinant NAs may vary.

      Lines 262-269: this analysis approach (based on my reading) seems to consider each polymorphism in isolation and thus does not seem well suited for accounting for epistatic interactions within the NA. For example, the effect of a substitution on NAI may be contingent upon other alleles within NA that are not cleanly segregated between the two serum comparator groups. Can the authors address the potential of epistasis within NA to confound the results shown in Figure 3?

      Unfortunately, epistatic interactions cannot be resolved using the panel of N2s selected for the study. This limitation is mentioned in our discussion:

      “It is important to highlight that co-occurring substitutions in our panel (the ones present in the main branches of the phylogenetic tree) cannot be individually assessed by association analysis or the random forest model. The individual weight of those mutations on NA drift thus remains to be experimentally demonstrated.”

      Line 331: is there a way to visualize and/or quantify how these two plots (F5 supplement 1a/b) reflect each other or not? Without this, it is hard to ascertain how they relate to each other.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Figure 4B structural images are not well labelled.

      The active site in 1 of the protomers is now indicated with an arrow in the top and side views of the NA tetramer.

      Lines 339-359: the ML predictions are just predictions and kind of meaningless without experimental validation of the predicted antigenic differences between recent NAs. This section would also be strengthened by an assessment of whether the ML approach obtains more accurate results than simply using phylogeny to predict antigenic relationships.

      Indeed, there are no experimental data for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. Generating experimental values for these viruses would require new reassortant viruses (H1N2s), recombinant protein and the immunization of new ferrets. The ferret sera would then have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The main purpose of the modeling was to evaluate whether antigenic behavior can be predicted from amino acid substitutions.

      As an exercise, we ran the model again, this time excluding the Swe17 and HK17 antigens from the data set. The sequences of Swe17 or HK17 were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 8 and show a robust predictive outcome, with R2 values of 0.94 and 0.91 for Swe17 and HK17, respectively. A major advantage of antigenic modeling is its potential to rank or flag major antigenic divergences based on available sequences, before they have consolidated into a new clade. Support in selecting or designing more broadly reactive antigens is another advantage of machine learning analysis.

      Lines 416-421: appreciate the direct comparison of results obtained from ferrets versus mice.

      We thank the reviewer for expressing this appreciation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Lee et al. compared encoding of odor identity and value by calcium signaling from neurons in the ventral pallidum (VP) in comparison to D1 and D2 neurons in the olfactory tubercle (OT).

      Strengths:

      They utilize a strong comparative approach, which allows the comparison of signals in two directly connected regions. First, they demonstrate that both D1 and D2 OT neurons project strongly to the VP, but not the VTA or other examined regions, in contrast to accumbal D1 neurons which project strongly to the VTA as well as the VP. They examine single unit calcium activity in a robust olfactory cue conditioning paradigm that allows them to differentiate encoding of olfactory identity versus value, by incorporating two different sucrose, neutral and air puff cues with different chemical characteristics. They then use multiple analytical approaches to demonstrate strong, low-dimensional encoding of cue value in the VP, and more robust, high-dimensional encoding of odor identity by both D1 and D2 OT neurons, though D1 OT neurons are still somewhat modulated by reward contingency/value. Finally, they utilize a modified conditioning paradigm that dissociates reward probability and lick vigor to demonstrate that VP encoding of cue value is not dependent on encoding of lick vigor during sucrose cues, and that separable populations of VP neurons encode cue value/sucrose probability and lick vigor.

      Weaknesses:

      The conclusions of the data are mostly well supported by the analyses, but the statistical analysis is somewhat limited and needs to be clarified and extended.

      (1) The manuscript includes limited direct statistical comparison of the neural populations, and many of the comparisons between the subregions are descriptive, including descriptions of the percentage of neurons having specific response types, or differences in effect sizes or differing "levels" of significance. An additional direct comparison of data from each subpopulation would help to confirm whether the differences reported are statistically meaningful.

      Response: We thank the reviewer for their helpful suggestions. As the reviewer noted, the first version of our manuscript had limited direct comparisons of single-neuron metrics across subpopulations. These analyses were also limited to the supplementary figures: 1) {SK vs. XK} and {SK vs. ST} decoder auROC (S10F), 2) Valence scores (S10G), and 3) S-cue confusion after MNR classification (S11D). We have now included the following statistical comparisons of single-neuron metrics across subpopulations: 1) % of neurons that respond to both S cues (Tables S10, S11), 2) % of neurons that have auROC >0.75 for {SK vs. XK}, {SK vs. PK}, and {SK vs. ST} (Tables S12-S17), 3) response magnitudes to S cues (Table S38), and 4) valence scores (Tables S44-46).

      (2) When hypothesis tests are conducted between the neural populations, it is not clear whether the authors have accounted for the random effect of the subject, or whether individual units were treated as fully independent. For instance, pairwise differences are reported in Figures 4I, 5G/I/L, and others, but the statistical methods are unclear. Assessment of the statistics is further limited by the lack of reporting of degrees of freedom. If the individual neurons are treated as independent in these analyses, it could increase the likelihood of

      Response: We have clarified when statistical analyses are comparing individual neurons vs. simultaneously recorded populations. Per the reviewer’s recommendation, we have also incorporated linear mixed-effects models when statistically analyzing individual neurons. Lastly, to further clarify the statistical analyses used, we have added multiple supplementary tables that better describe the statistical tests used and the relevant outputs.

      Reviewer #2 (Public Review):

      Summary:

      This work is interesting since the authors provide an in vivo analysis of how odor associations may change as represented at the level of the olfactory tubercle (presynaptic) and next at the level of the ventral pallidum (postsynaptic). First, the authors start off with a seemingly careful characterization of the anterograde and retrograde connectivity of dopamine 1 receptor (D1) and dopamine 2 receptor (D2) expressing medium spiny neurons in the olfactory tubercle and neurons in the ventral pallidum. From this work they claim that, regardless of D1 or D2 expression, tubercle neurons mainly project to the lateral portion of the ventral pallidum. Next, to compare how odor-associated neuronal activity in the ventral pallidum and the olfactory tubercle (D1 vs D2 MSNs) transforms across association learning, the authors performed 2-photon calcium imaging while mice engaged in a lick/no-lick task wherein two odors are associated with reward, two odors are associated with no outcome, and two odors are associated with an air puff.

      This manuscript builds on prior work by several groups indicating that olfactory tubercle neurons form flexible learned associations to odors by looking at outputs into the pallidum (though, I should highlight, without looking specifically at pallidal neurons that truly receive input from the tubercle), and with that, this work is novel. We appreciated the use of a straightforward odor-outcome behavioral paradigm and the careful computational methods and analyses utilized to disentangle the contributions of single neurons vs population-level responses to behavior. With one exception from the Murthy lab, 2P imaging in the tubercle is a new frontier and that is appreciated, as is the 2P imaging in the pallidum, which was well supported by the histology. The anatomical work is also well presented.

      Overall, the approach and methods are superb. The issues come when considering how the authors present the story and what conclusions are made from these data. Two key points before going into specifics about each: 1) The authors cannot conclude that their results are contradictory to prior results, and 2) The authors over-interpret the results and do not discuss several key methodological issues. We were concerned with the ability to make strong claims regarding the circuitry presented, especially given how much the presented claims contradict prior work. There were also issues with the interpretability of neuronal encoding of value vs valence based on the present behavior (in which a distinction between the air puff and neutral trial types was not clear) and the imaging methodology (in which the neuronal populations analyzed were not clearly defined). In addition to toning down and rectifying some of the language and interpretations, we suggest including a study limitations section where these methodological and interpretation issues are discussed. Over-interpreting and playing up the significance of this work is unnecessary, especially given eLife's new review and publication policy. Readers should be given a sufficiently detailed and nuanced presentation of these thought-provoking results, and from there be allowed to interpret the results as they want.

      Strengths:

      State-of-the-art approaches (as detailed above)

      Possible conceptual innovation in terms of looking into output from the olfactory tubercle which has yet to be investigated in this avenue.

      Weaknesses:

      On the first point, regarding the authors' repeated and unsupported claims that their results are contradictory: there are papers by numerous groups, in respected journals including this one, which together used 5 different methods (c-Fos, photometry, 2P, units, fMRI) in animals ranging from humans to mice, and which support that tubercle neurons reflect the emotional association of an odor, whether spontaneous or learned. With that, the authors should not claim that their results contradict these studies as if the other papers are suspect; instead, from our standpoint, it is on the authors to explain how and why their results differ from these other papers rather than simply saying they found something different [which at present is framed in a way that is 'correct' due to primacy if nothing else].

      Response: We acknowledge that the first version of the manuscript contained unnecessary disagreeing language. We do not think that our results are broadly in disagreement with the existing literature, but we do come to different conclusions about what the OT is representing. Namely, our comparison of valence encoding in OT to that in the VP strongly indicates that the anteromedial OT has a less robust representation of valence, and we argue that this reflects either an intermediate form of valence representation or potentially might not be important for valence representation at all. We have toned down our conclusions, made clear that we are only recording from one domain of the OT, limited our speculation to the discussion and added a “speculations” section.

      Second, onto the points of interpretation of results, there are several specific areas where this should be rectified. As is, the authors overinterpret their results and draw too far-reaching conclusions. This needs to be corrected.

      In particular, the claims that D1 and D2 neurons of the olfactory tubercle nearly exclusively send projections to the ventral pallidum must be interpreted with caution given that the authors injected an anterograde AAV into the anteromedial olfactory tubercle, and did not examine the projections from either the posterior or lateral portions of the olfactory tubercle. This is especially significant since the retrograde tracing performed from the ventral pallidum indicates that the lateral olfactory tubercle, not the medial olfactory tubercle, primarily projects to the ventral pallidum (Fig 1D-F), however this may be due to leakage into the nucleus accumbens, as seen in the supplementary figure, S1G.

      Response: We thank the reviewer for the point of caution. We have now made it clear that our conclusions are limited to the anteromedial portion of the OT, and other areas may have other projections.

      The same caution must be advised when interpreting the retrograde tracing performed in Fig 1G-I, since the neuronal tracer used and the laterality and rostral-caudal injection site within the VTA could result in different projection patterns and under- or over-labelling. Additionally, the metric used, %Fiber Density (Figure 1C), as in the percentage of 16-bit pixels within the region of interest with an intensity greater than 200, is semi-quantitative, and is more applicable for examining axonal fibers that pass through a region rather than the synaptic terminals (like with a synaptophysin fusion protein-based tracing paradigm) found within a region (puncta). The statements made in contrast to prior studies should therefore be softened, and these concerns should be addressed in the introduction, discussion, and the limitations section if added.

      Response: We have added statements to address these limitations.

      The other major concern is whether the behavioral data generated is indicative of the full spectrum of valence. The authors appropriately state that the mice "perceive" the air puff, yet based on their data the mice did not clearly experience the puff-associated odor as emotionally aversive (viz., negative valence). The way the authors describe these results, it seems they agree with this. With that, the authors can't say the puff is aversive without data to show such - that is an assumption which, while seemingly intuitive, is not supported by the data unfortunately. To elaborate more since this is important to the messaging of the paper: The authors utilized a simple behavioral design, wherein two molecular classes of odors were included in either a sucrose rewarded, neutral no outcome, or air puff punished trial type. The odor-outcome pairs were switched after three days, allowing the authors to compare neuronal responses on the basis of odor identity and the later associated outcome. While the mice showed clear learning of the rewarded trial types by an increase in anticipatory licking during the odor, they did not show any significant changes in behavior that indicated learning of the air puff trial type (change in running velocity or % maximal eye size), especially in contrast to the neutral trial type. This brings up the concern that either the odor-air puff aversive associations (to odors) were not learned, or that the neutral trial types, in which a reward was omitted, were just as aversive as the air puff to the rear, despite the lack of startle response - perhaps due to stimulus generalization between neutral and air puff odor. The possibility of lack of learning is addressed in the paragraph starting at line 578, but does not account for the possibility that the lack of reward is also sufficiently punishing. 
The authors also address the possibility that laterality in the VP contributed to the lack of neural responsivity observed, but should also include a statement regarding laterality in the olfactory tubercle, as described in https://doi.org/10.7554/eLife.25423 and https://doi.org/10.1523/JNEUROSCI.0073-15.2015, since the effects of modulating the lateral portion of the olfactory tubercle are not yet reported. Lastly, use of the term "reward processing" should be avoided/omitted since the authors did not specifically study the processing of reinforcers.

      Response: As the reviewer points out, we tried to be cautious in interpreting the “aversive” odor response and focused mainly on the reward association. This is addressed in the Discussion, and we do not see the need to add a redundant statement to a “limitations” section. We have also added a note about the previously identified laterality of the OT, which might account for the lack of aversive-responsive neurons in the OT. The reviewer makes an interesting suggestion that behavioral responses to airpuff-associated odors are not significantly different from those to unassociated odors because the lack of reward in this context is already aversive. We note that the walking velocity between reward- and puff-associated odors is significantly different, but not that to unassociated odors. This is in agreement with the suggestion, and we have added a statement to reflect this.

      Also, I would appreciate justification of the term "value". How specifically does the assay used assess value versus a more simplistic learned association which influences perceived hedonics or valence of the odors.

      Response: We have removed the term “value” with the exception of areas where we cite the work of others. We acknowledge that the word value is complicated in the incentive learning field and appreciate the suggestion. Our experimental design was meant to investigate learned association for positive and negative stimuli, thus valence is more appropriate and we have used this term.

      More information is needed regarding how neurons are identified day-to-day, both in textual additions to the Methods and also in terms of elaborating more in the results and/or figure legends about what neurons are included:

      (a) The ROI maps for identifying/indicating cells in the FOVs are nice to see, and at the same time they raise some concerns about how cells are identified and/or how the borders of those specific ROIs are drawn. For instance, in Figure 4, A & D, ROI #13 (cell #13) is VERY different in shape/size between those two panels. Also see ROIs 15 and 4. Why was an ROI map not made on day 1 and then that same map applied and registered to frames from consecutive imaging days in that same mouse? As it is, new ROIs are drawn, smaller for some "cells" and larger for others. And at least for ROI #13 above, one ROI is about twice as large as the other. This inconsistency in the workflow and definition of the ROIs needs to be addressed in the Methods. Also, the authors should address whether and how this could impact their results.

      Response: We have added details and clarified the Methods section to make this clearer. We note that we extracted calcium transients from the raw data with the widely used Constrained Nonnegative Matrix Factorization (CNMF) algorithm. This processing algorithm simultaneously identifies spatial and temporal components using modeled kinetics of calcium transients and pre-trained CNN classifiers. Because the optical section in the z plane is narrow with 2-photon microscopy, we may not always capture components of a neuron that look like “neurons”, but all ROIs were confirmed manually to ensure they were not artifacts.

      (b) Also, more details are needed in the results and/or figure legends regarding the changes in cell numbers over days that are directly compared in the results. On some days the number of cells differs by 10% or more. Why? It is not the same population being compared in this case, so some discussion of this is needed.

      Response: The shapes of the spatial components can vary across days due to nonrigid motion in the brain and/or miniscule differences in the imaging angle across days. Although we visually verified that we are imaging approximately the same z plane across days, we cannot (and do not) claim to image identical populations of neurons across days.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes a study of the olfactory tubercle in the context of reward representation in the brain. The authors do so by studying the responses of OT neurons to odors with various reward contingencies and compare systematically to the ventral pallidum. Through careful tracing, they present convincing anatomical evidence that the projection from the olfactory tubercle is restricted to the lateral portion of the ventral pallidum.

      Using a clever behavioral paradigm, the authors then investigate how D1 receptor- vs. D2 receptor-expressing neurons of the OT respond to odors as mice learn different contingencies. The authors find that, while the D1-expressing OT neurons are modulated marginally more by the rewarded odor than the D2-expressing OT neurons as mice learn the contingencies, this modulation is significantly less than is observed for the ventral pallidum. In addition, neither of the OT neuron classes shows significant modulation by the reward itself. In contrast, the OT neurons contained information that could distinguish odor identities. These observations have led the authors to conclude that the primary feature represented in the OT is not reward.

      Strengths:

      The highly localized projection pattern from olfactory tubercle to ventral pallidum is a valuable finding and suggests that studying this connection may give unique insights into the transformation of odor by reward association.

      Comparison of olfactory tubercle vs. ventral pallidum is a good strategy to further clarify the olfactory tubercle's position in value representation in the brain.

      Weaknesses:

      The authors' interpretation of the physiologic results - that a novel framework is needed to interpret the OT's role - requires more careful treatment.

      Response: We thank the reviewer for their recommendation. We have toned down the conclusiveness of our language in the discussion. Additionally, we have removed several speculative sentences from the concluding paragraph.

      Reviewer recommendations for Authors:

      We thank the reviewers for this helpful list of recommended changes to the manuscript. Regrettably, a few of the recommendations were overlooked in the revision, as indicated below. We do agree with the suggestions and plan to add appropriate changes to the version of record.

      Reviewer #1 (Recommendations For The Authors):

      If the comparisons mentioned in point 2 in the public review do not account for the lack of independence of individual neurons, I suggest the authors do so by either running linear mixed effects models with a random effect for subject, or one-way ANOVAs with a random effect of subject, where appropriate. The authors could also run analyses on summarized individual subject data (averages, % of neurons, etc.), though the authors would lose substantial power when assessing whether average changes differ between subjects in each recording group.

      We have clarified when statistical analyses are comparing individual neurons vs. simultaneously recorded populations. Per the reviewer’s recommendation, we have also incorporated linear mixed-effects models when statistically analyzing individual neurons. Lastly, to further clarify the statistical analyses used, we have added supplementary tables for every statistical test that better describe the parameters used and the relevant outputs.

      Reviewer #2 (Recommendations For The Authors):

      Of minor note, there are some symbols/special characters that did not translate in the figure caption for Figure 6C, repeated text between lines 700-705 and 707-712, and some other small grammatical errors. Additionally, the source of the anterograde tracing virus (AAV9-phSyn1FLEX-tdTomato-T2A-SypEGFP-WPRE) needs to be stated.

      Thank you for pointing these out. We have added a description to the figure legend, deleted the repeated lines, and fixed grammatical errors. During the revision, we regrettably overlooked the request to provide the source for the AAV9-phSyn1-FLEX-tdTomato-T2A-SypEGFP-WPRE. We agree that this small detail is important and will add it before publication of the version of record. This viral vector was purchased from The Salk Institute GT3 Core.

      Reviewer #3 (Recommendations For The Authors):

      The authors' interpretation of the physiologic results (that a novel framework is needed to interpret the OT's role) requires more careful treatment. As the authors note, there is reward-contingency modulation in the OT, especially when D1 neurons are compared against D2, as shown in Fig. 3D,E, Fig. 4I, and Fig. F,J. Though small in effect size, these modulations presumably cannot be explained by odor identity. These observations, to this reviewer, suggest that the D1 neurons of the OT have a component of cue-reward representation. In other words, rather than developing an entirely new framework, the alternative possibility that D1 neurons of the OT occupy an intermediate stage in associating cues with reward (i.e., under the same framework, but occupying a different position in the emergence of value representation) should be considered.

      We thank the reviewer for this thoughtful comment. We have eliminated the statement that a “novel framework is needed” and have been more conservative in our interpretations. We have also acknowledged that our results are not necessarily in conflict with the existing literature, but we do draw different conclusions, namely that the anteromedial OT is not a robust valence-encoding population in comparison to the VP. We appreciate the suggestion of the term “intermediate stage” in reward association and have now included this in the discussion. Lastly, we have limited broader speculation to a “speculation” section of the discussion.

      Related to the above point, have the authors analyzed if the similarities in the chemical structures correspond to perceptual and neural similarities? In the data presented in Figure S4, there are greater similarities in the population patterns within the same rewarding condition than within chemical groups. A comparison of the reward vs. chemical group (a simpler version of Fig. 5B) may be beneficial and take full advantage of the experimental design.

      This comparison already exists in 5B and lines 285-289 of results. In VP populations, the distribution was structured such that intervalence pairwise comparisons between sucrose-paired and not sucrose-paired odors (e.g. ||SK-PK|| and ||SK-XK||) were larger than intravalence pairwise comparisons (e.g. ||SK-ST||, or ||XK-XT||). OTD1 populations showed an intermediate trend where most intravalence pairwise distances were smaller than intervalence pairwise distances with the exception of ||SK-ST||.

      Related to the point about chemical similarities - is the smaller effect size (amount of modulation associated with reward contingency) in this study, compared to the study by Martiros et al, explained by the similarities of odorants used?

      This is an interesting point. Although the odorants we use are different from those in Martiros et al, we think this is unlikely to be the basis of the smaller effect size of reward modulation. If the OT represents odor in a population code, whereby identity is encoded in unique ensembles of activity, then variation in the expression of D1R between OT neurons could account for different effects in different ensembles. However, there is no evidence for such varied expression, and it does not seem like an ideal mechanism for the OT to broadly associate odor with reward. Moreover, we do not observe any differences in the effect size of reward association between the different odorants used in our study. Rather, we think the difference between our findings more likely results from recording in different populations of neurons, which is addressed in lines 522-535.

      Regarding the data presented in Fig. 3I - the rewarded odor responses (Sk) are compared against neutral ones (Xk responses), but an S vs. P comparison may be informative, too. Even though the authors mention that the effect of air puff is subtle, the behavioral data presented in Fig. 2F and G suggest that these serve as aversive stimuli. For example, on day 4, the first day after the reward contingency switch, the licking levels seem the lowest for the P odors.

      We have added the S vs P comparison. Indeed, we had originally omitted this because the neural and behavioral response to puff cues was not robust. This is discussed in the discussion (lines 563-579), and our conclusions about aversive conditioning are cautious.

      Regarding the data presented in Fig. 4G: it is difficult to interpret the data when the data for day 1 reward period and day 3 reward cue period are combined. Or do the authors mean day 1 S cue and day 3 S cue?

      These data were based on an observation that some neurons in the VP only responded to sucrose (not odor) on day 1, but later became responsive to the associated odor on day 4. To quantify this, Fig. 4G shows the percentage of these neurons by reporting the percentage that were both responsive to sucrose (not odor) on day 1 and also rewarded odor on day 3. This is described in lines 260-274.

      Figure 6 presentation would benefit from a revision. For example, it is unclear if the water port becomes available for the "N" odors with 100% or 50% chance of reward delivery, and if so, how that happens. There are some errors e.g., colormap used for panel G; odors listed may be wrong in line 752 etc. It was unfortunately not possible to understand what was presented.

      We have added a schematic (Fig 6B) to better describe the movement of the port and details to the methods. The color scale was indeed inverted in panel G (now H), and it has been corrected. We have verified that the odors listed in the methods are correct. Although not included in the revision, in the version of record we will also add corresponding descriptors (e.g., LHi & Lx) to the odors in the methods for easier comparison.

      Minor comments

      For Figure 2H, an alternative description in the legend may be beneficial, as the phrasing is not intuitive. A suggested alternative is "licks in response to sugar-associated odors expressed as fraction of all odors".

      We appreciate the suggestion and have changed this to “licks during either sucrose cue expressed as a fraction of all licks during any odor.”

      Figure 2H: please explain the color code for crosses in the legend and the statistical comparison shown in the figure.

      We have added a legend to explain the color code and included a statement about the statistics in the legend with a link to a supplemental table for statistical parameters.

      Figure 3D: may contain mislabeling in the legend - the legend for 3D does not match the plot (legend refers to bar graph while plot shows line graphs)

      Unclear what is meant. The 3D legend says: “Percentage of total neurons that were significantly excited or inhibited by each odor (Bonferroni-adjusted FDR < 0.05) as a function of time relative to odor. Lines represent the mean across biological replicates and the shaded area reflects the mean ± SEM.” This is not a bar plot and is not referred to as one. 3E does show bar plots and is correctly described in the legend.

      Figure 3M: uses letters to refer to cell populations that are identical to the roman numerals used in Fig 3 A-C as well as colours similar to the ones in Fig 3C. However, the cell groups are unrelated; splitting the figures or using a different nomenclature might help

      We have adopted a different color code that we think makes this more distinct.

      Figure 4I: statistical comparison shown in figure not explained (neither in main text nor legend)

      We have added a statement about the statistical comparison and referenced a supplementary table.

      Figure 5 D: color code appears to have a different range than the values shown (i.e. lower limit is 0.7 while the plot shows values below 0.7)

      We confirm this is not a mistake but a stylistic choice. The displayed color scale only shows values down to a lower limit of 0.7, while the lowest value is 0.67. Although the color for 0.67 is not shown in the scale, it is approximately the same as the color at the lower limit. The values are reported for full transparency and accuracy.

      Figure 5 G, I, & L: statistical comparison shown in figure not explained

      The comparisons have been explained in supplemental tables (S22-29) and referenced in the legend.

      Figure 5 I: meaning of symbols overlayed over bars not explained

      “Markers represent the mean across biological replicates” has been added.

      Figure 5 J&K: please state if error bars show SEM or SD; also please describe individual thinner lines in the legend

      This has been added to describe 5I. The same format applies to J&K.

      Figure 5L: please describe the individual crosses overlayed over bars in the legend

      Described in 5I.

      Figure S6A-C: please mention the odors used.

      S6A-C shows kinetics for the odor α-terpinene, which is now indicated in the legend.

      Line 129: mentions a 70 psi airpuff but methods say 75 psi - please clarify

      This has been corrected. 70 psi is the correct value.

      Line 134 typo: SP should be PK

      This has been corrected.

      Line 428: typo; should be cluster 3, not 2

      This has been corrected.

      Line 474 (and figure 6O): please explain what "P" is

      “P” is probability, used as P(S), as in probability of sucrose. This is defined in line 466.

      Line 692: please describe the staining protocol in the methods (rather than just listing the antibodies and concentrations)

      We have added more details (lines 692-699).

      Line 707-712: duplicate text (identical to Line 700-705)

      This has been deleted.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, the authors are interested in understanding how fission yeast respond to a Nitrogen Signaling Factor (NSF) that has previously been shown to allow leucine auxotrophs to grow in the presence of leucine when Nitrogen Catabolite Repression (NCR) is triggered by the presence of a high-quality nitrogen source such as ammonium chloride (NH4Cl).

      The authors begin with a screen to identify genes that affect the ability of wild-type cells grown near cells with leucine auxotrophy to enhance or abolish the NCR phenotype. They screened the non-essential gene deletion library, which they modified so that it only contains a leucine auxotrophy (unlike the original gene deletion library, which contains additional auxotrophies). They identify 137 genes whose deletion allows growth of Leu auxotrophs in the presence of leucine and ammonia without the presence of WT cells. These genes are required for NCR. They further identify 203 genes whose deletion strains do not bypass NCR even in the presence of wild-type cells, and which are thus important for bypassing NCR in the presence of WT cells.

      They then conduct a second screen to identify which of these genes are important for bypassing NCR in response to the synthetic NSF, 10(R)-hydroxy-8(Z)-octadecenoic acid, by looking for deletion strains that grow in the presence of leucine when ammonia is not present, but do not grow in the presence of leucine when ammonia is present, even when NSF is added. This second screen identifies 117 strains carrying deletions in a gene set enriched for genes related to cellular respiration and mitochondria. They then show that the NSF bypass of NCR is linked to respiration by showing that it is abolished in the presence of the respiration inhibitor antimycin A, that growth in low levels of glucose can bypass NCR in the absence of NSF, and that cells supplemented with NSF have a higher oxygen consumption rate.

      To gain insight into how the cell responds to NSF, the authors then gather RNA expression data from cells grown in high ammonium concentrations following treatment with NSF, relative to a negative control treated only with methanol (the vehicle in which NSF is dissolved). They argue that the gene expression pattern resembles gene expression data from cells undergoing respiration in glycerol relative to cells undergoing fermentation in glucose. They show that the upregulated genes relate to trehalose synthesis, detoxification of reactive oxygen species, and cellular fusion, and that the downregulated genes are related to cellular adhesion and flocculation.

      They validate their RNA-seq measurements by showing that the two most highly induced and two most highly repressed genes respond to NSF addition in a dose-dependent manner and do not respond to oleic acid, which is chemically similar to NSF. The most highly responsive gene they identify is an uncharacterized gene, SPBPB2B2.01, which they suggest naming "NSF-responsive amino acid transporter 1" (nrt1). They also show that the nrt1 response is dependent on culture density, that the response is present (though the magnitude varies) in YES and in EMM under varying nitrogen concentrations, and that YFP driven by the nrt1 promoter is induced by NSF.

      The authors then investigate the 8 transcription factors that were present in their list of genes required for NSF-mediated adapted growth. They note that Hsr1 was the only one of these transcription factors, indeed the only gene, that was a hit in their screen for NSF-mediated adapted growth and whose expression was induced upon NSF treatment. To see if the activity of the other transcription factors changed in response to NSF treatment, the authors then gathered ChIP-seq data using 6 of these transcription factors as targets for IP. They saw that for Hsr1 and Php3, targets that had increased RNA-seq expression showed an increase in promoter occupancy while for Hsr1, Php3, Adn2, and Atf1, genes that had decreased RNA-seq expression showed a decrease in promoter activity.

      Finally, the authors attempt to identify the mode of action of NSF by generating a functionalized NSF with an alkyne tag (AlkNSF), which they then use as a probe to identify NSF binding partners. They first show that AlkNSF does allow bypass of NCR, although at a 30-fold higher concentration. AlkNSF also induces nrt1 expression in a dose-dependent manner, although the expression saturates at a lower level and requires a much higher concentration for induction. They then look for proteins that co-purify with AlkNSF compared to a control that was pre-incubated with NSF, which was expected to compete off AlkNSF. The only significant protein they saw was Ayr1, which was not identified in their screen and whose deletion did not abrogate NSF bypass of NCR. They saw that Ayr1 deletion actually increases the response of the nrt1 and mei2 targets to NSF, and speculate that Ayr1 metabolises NSF and thereby reduces the cell's ability to respond to NSF to bypass NCR.

      They then repeat the affinity purification / mass spec protocol in ayr1 deletion cells to identify other interaction partners, this time incubating with a higher concentration of NSF and also comparing to an experiment using alkyne oleic acid as a control for non-specific binding. The top two specific hits from this assay are Hmt2 and Gst3. NSF was still able to rescue NCR in gst3 deletion cells, indicating that Gst3 is not relevant for the phenotype. Cells lacking hmt2 did not grow in EMM; in YES they grew without ammonium supplementation but not with it, and addition of NSF did not rescue their growth. They also see that nrt1 and mei2 induction in response to NSF is abolished when hmt2 is deleted. They then argue that Hmt2, a sulfide:quinone oxidoreductase localized in the inner mitochondrial membrane, is a direct target of NSF that triggers a switch to respiratory metabolism and allows bypass of NCR.

      Below are comments that I think ought to be addressed prior to publication (Major comments)

      1. In line 70, the authors state that "S. pombe cells rely on their own BCAA synthesis to sustain growth" when grown with leucine in media supplied with ammonium. If prototrophs can inhibit NCR via NSF in neighboring auxotrophic cells on the same plate, couldn't they also inhibit NCR within their own colony? How do we know that prototrophic cells grown in high-quality nitrogen sources along with, say, leucine are not taking up leucine? The fact that leucine auxotrophs cannot grow in high-quality nitrogen sources when leucine is present does not imply that wild-type cells must be synthesizing BCAAs rather than importing them. In a recent paper (Kamrad et al Nat. Microbiol. 2023, https://www.nature.com/articles/s41564-022-01304-8), it was shown that S. cerevisiae cells grown in lysine and in high concentrations of ammonium take up lysine rather than synthesize it as lysine concentrations in the media are increased. I am aware from unpublished results that this is also the case for leucine. I would be surprised if the same isn't true in S. pombe. The authors should caveat or remove this assertion.
      2. It is important for the authors to put their observation linking respiration to rescue from NCR in context with findings from a closely related study (Chiu et al 2022), which included some authors from this manuscript and which the authors cite. In that paper, it was shown that the siderophore ferrichrome can also rescue NCR in fission yeast. That paper stated "It is likely that ferrichrome increased mitochondrial activity, which enabled efficient utilization of glucose downstream of the glycolytic pathway" based on experiments in different concentrations of glucose. This evidence seems to support the link between respiration and rescue from NCR proposed by the authors of this manuscript. The authors should acknowledge this closely related and earlier work, as it strengthens the case they are trying to make. They could even test whether ferrichrome addition makes cells sensitive to antimycin A (as in fig 1E), but that extra experiment would be optional in my opinion.
      3. In figure 1B for the second screen I do not understand what the photos represent. For the photos, two rows are meant to have no NH4 and also no NSF and the label on that image makes no mention of Leucine supplementation. In the diagram there are two rows that have NH4 and leucine and one row that has no NH4 but does have leucine. I assume the diagram is correct and the labels on the images are incorrect.
      4. It would be important for the authors to put their observation linking respiration to rescue from NCR in context with findings from Chiu et al 2022, which the authors cite. That paper showed that the siderophore ferrichrome can also rescue NCR in fission yeast, and its authors stated "It is likely that ferrichrome increased mitochondrial activity, which enabled efficient utilization of glucose downstream of the glycolytic pathway." based on experiments in different concentrations of glucose. This evidence seems to support the link between respiration and rescue from NCR proposed by the authors of this manuscript.
      5. In line 133, the authors state that the 29 mutants that did not grow under leucine supplementation, either without or with NH4Cl and whether or not NSF was present, were "related to EMM Growth, leucine uptake, or utilization of ammonium as the sole nitrogen source." The first two make sense, but I can't see why a strain with deletion of a gene related to utilization of ammonium as a sole nitrogen source wouldn't grow when supplemented with leucine. In fact, none of the leucine auxotrophs in the screen would grow with ammonium as the sole nitrogen source, so it isn't clear that this screen can identify genes responsible for utilization of ammonium as a sole nitrogen source. The authors should clarify or remove this point.
      6. The screen identified 203 strains whose deleted genes are important for avoidance of NCR (because in the presence of ammonium and leucine, as well as a WT strain, they cannot grow). Of these, 57 strains can't grow in the presence of a WT strain but can grow in the presence of NSF. The authors conclude in line 138 that these strains are "likely to respond to a transmissible signal that is different from NSF". This is confusing, because these deletion strains can still respond to NSF; however, when they are placed next to wild type cells (which in the authors' model release NSF), they don't grow. I am confused about the nature of the transmissible signal that the authors suggest. It would appear that when these genes are deleted and the mutants are grown next to a wild type cell that sends both the alternative signal and NSF, the other transmissible signal inhibits the ability of NSF to release NCR (as NSF alone can still rescue growth of the mutants). It is not clear how the other transmissible signal would work when the gene is present, as it is clearly not necessary to rescue growth.

      A simpler explanation might be that there was contamination in the second screen, or that there was a threshold effect: perhaps in the first screen the strains grew just below a threshold and in the second screen they grew just above it.

      The authors should clarify their interpretation for these strains, and acknowledge any alternative technical explanations.
      7. The authors' efforts to remove confounding effects that might stem from additional auxotrophic alleles made the screen more convincing. However, Fig 1E, 1F, 5B, and 5E were done with EMM+Leu+Ade+Ura, while the initial screen was done with only additional leucine. It is unclear from the text and captions why this was done, but I assume it was because they used a strain that was ade- and ura- in addition to being leu-. Given that they had strains without these additional mutations, this seems like a strange choice. The authors should acknowledge that there are possible confounding effects of adding adenine and uracil to the media and, if they did have additional metabolic deletions, acknowledge that these could also be confounding.
      8. In Fig 1E, it appears that cells can grow without NSF in the presence of ammonium and additional amino acids after 10 days (although NSF is required for growth at 5 days). This is not a problem for the screen, as that was imaged at 5-6 days, but it appears as though NSF does not rescue growth so much as speed it up. The authors should acknowledge this when describing the phenotype. It also argues for a quantitative time-course experiment comparing growth over 10 days with and without NSF, although this would not be necessary to the paper's main argument.
      9. In lines 191 and 192, the authors suggest that the "downregulation of flocculation/adhesion related genes by NSF could serve to avoid undesirable mating during growth". If this is the case, I don't understand why mating genes and cellular fusion genes would be upregulated. What do the authors mean by undesirable mating? Wouldn't flocculation increase desirable mating as well? If all mating is undesirable, wouldn't upregulation of mating and cellular fusion genes be detrimental?
      10. The authors mention that trehalose is an antioxidant, for which they reference Malecki 2019; however, that paper shows no direct evidence of trehalose functioning as an antioxidant under respiratory conditions. It only shows that some trehalose synthesis genes are upregulated when cells are grown in glucose. The authors should identify primary literature to back this statement up, or soften the wording. Also, trehalose is known to be a storage metabolite (which is mentioned in Malecki et al 2019, but not in this manuscript). In fact, work in budding yeast has shown that trehalose can be a shared metabolite that is produced by respiring cells and used as a fermentable carbon source in communities of budding yeast cells that consist of fermenting and non-fermenting cells (Varahan et al, eLife 2019 https://doi.org/10.7554/eLife.46735). This role should be considered as an alternative explanation for the induction of trehalose in respiratory cells.
      11. Line 208: The stimulatory effect of NSF on NRT1 decreased with cell density, so cell density is likely to be an important factor for gene expression. The methods section, text, and figure captions do not mention the density at which cells were inoculated/harvested for RNA-seq and other experiments. If that density was more than OD 0.1, then this would be inconsistent with the measurements from Fig 3. Also, in Fig 3D the culture density is not mentioned in the figure or the caption, even though the text suggests that for that experiment cells were grown at low density (lines 212-213). The authors should provide density information for their experiments in order for them to be reproducible, as they show it is a key factor.
      12. In suggesting a name for NRT1 (NSF-responsive amino acid transporter 1), the authors assume that the gene has a role in amino acid transmembrane transport, but they have no experiments showing this phenotype. They mention that it is inferred from homology with other amino acid transporters. I presume this name has already been approved by PomBase and is not provisional, but naming a gene after a function inferred from homology rather than from experiments seems unwise. Do the authors have any other direct evidence that this is a bona fide amino acid transporter? Perhaps a name like "NSF-responsive gene" would be more appropriate.

      Related to this, it appears that the expression level of Nrt1 may be very low (see Fig S2B, in which the scale of the RNA-seq track is very small [-1,1] and the amount of expression is very small even when NSF is added). Looking at Fig 2A, the total transcript abundance did not appear to be very low (over 100 counts per million); is this a discrepancy in Fig S2B? Perhaps the large fold change is the result of counts very close to zero in the control condition? Also, in Fig 3 the nrt1 expression levels did not appear to be especially low, and they appeared repeatable. Is the RNA-seq data shown in Fig S2B for nrt1 a fluke, or am I misinterpreting it?
      13. To show that their ChIP-seq worked, the authors showed specific examples of ChIP-seq reads for target genes (line 240: "Previously determined target genes of these TFs were significantly enriched in our data set, demonstrating that the experiment has worked (Figure S2A)."). Is the significance here the threshold from Fig S2B? If so, that threshold should be clearly stated in the text. If it is, the fact that asn1 shows up as "Fil1 bound" is strange, as no genes had significant changes in ChIP-seq signal in Fig S2B. If there is another threshold, the authors should describe it. While some of the examples they showed were convincing (e.g. php3-flag for the Php3-regulated gene gln1 and the increased reads for srw1 for the Reb1 target srw1), some targets did not seem to be especially enriched for their designated transcription factor. For example, trx1, which was identified as an Hsr1 binding target, had some binding from Hsr1, but more from Php3 and equivalent amounts from many of the other transcription factors. A clear description in the text of how genes were deemed significant, alongside the references/selection criteria used to select the specific genes shown, should be provided to improve reproducibility.
      14. In lines 244-246 the authors state that "These differences in TF occupancy were positively correlated with target gene expression changes. That is, individual genes that were upregulated by NSF tended to be more strongly bound by the TFs, whereas downregulated genes were less occupied by the respective TFs (Figure 4A)." This is far from a general trend. The trend is not there for reb1 and fil1. In fact, to the eye fil1 appears to show a decrease in occupancy for genes with increased expression, and I worry that the authors did a one-sided test for significance that would have missed this, although the variability of the genes that don't change is very high in this case, so there could be no significant effect. The authors elaborate on some of the detail in following statements, but they should soften or remove this statement.

      Related to this, in line 254, the authors state: "These results imply that NSF exposure rewires the recipient cell's transcriptional program, for which the TFs Atf1, Adn2, Adn3, Fil1, Hsr1, Php3, Php5, and Reb1 are indispensable (Table S3)." While I am convinced from the RNA-seq evidence and some of the ChIP-seq evidence that NSF exposure rewires the cell's transcriptional program, I am not convinced that the 8 transcription factors they mention are indispensable for rewiring the transcriptional program. While they may be indispensable for the phenotype itself, Reb1 and Fil1 show no significant enrichment in occupancy of upregulated or downregulated targets (Fig 4A) and, along with Atf1, have very few genes in which occupancy changes significantly (Fig S2B), while no ChIP-seq experiments were shown for Php5 and Adn3.

      The more specific summary of the data from Fig S2B (lines 250-253), describing how hsr1 and adn2 have the strongest effects of the transcription factors required for NSF-mediated NCR bypass, is a much stronger message for this section.
      15. In line 335, the authors state that "in contrast to other communication systems, NSF does not induce noticeable changes in S. pombe's morphology", referring to changes in mating, filamentation, and bacterial biofilm formation. However, they do show very clearly that NSF causes a large decrease in expression of flocculation/adhesion genes. The absence of a morphological change is likely because the lab strain does not flocculate under the conditions used for this assay. We have recently identified conditions and strains which do exhibit flocculation in this preprint [https://www.biorxiv.org/content/10.1101/2023.12.15.571870v2]. Based on their evidence, it is likely that with a strain and conditions that did flocculate, addition of NSF would break up flocculation and thus change the morphology. The authors should remove or caveat this point.
      16. Line 270, Fig 5B: The concentration of NH4Cl listed in the text (374mM) does not match the concentration shown on the figure (748mM). I assume this is a typo, but it should be corrected prior to publication.

      Also I have several minor comments to help improve the manuscript.

      m1: Lines 66-70- state that "uptake of the branched-chain amino acids (BCAA) isoleucine (Ile), leucine (Leu), and valine (Val) is suppressed in the presence of high-quality nitrogen sources such as ammonium or glutamate, because the expression of transporters or permeases that are needed for the uptake of poorer nitrogen sources are down regulated (Zhang et al, 2018)." This reference is for S. cerevisiae and is a review. The authors should cite original results in S. pombe if possible, and if that is not available, alert the reader that this result is from a different species.

      m2: It is unclear from the methods section how the images taken for the screens were analyzed. Were they analyzed and scored by hand, or using custom image analysis software? Either way, the authors should publish the scores for each deletion mutant in their screen. If custom image analysis was used, the authors should state in their methods the cutoffs they used to score growth, and consider plotting the data as a supplement so readers can get a sense of how sensitive the screen was.

      m3: The authors identify 137 mutants that did not require NSF signaling to bypass NCR and claimed these genes were required for NCR. It would be helpful and give more confidence in this screen to demonstrate the extent to which the genes identified in this study overlap with any previous genes required for NCR, and whether there was any GO-term enrichment in this set.

      m4: It would be interesting if the authors could speculate a bit in their discussion on why mitochondrial respiration counteracts NCR. Is there something about cells undergoing respiration that would make it easier for them to use BCAAs than to produce them, or conversely something about fermenting cells that makes it easier for them to produce BCAAs rather than importing them?

      m5: It is unclear why Figure 1F has 'MP biomedicals TM' listed in the figure. It doesn't seem to be listed in the caption or the methods. Is this different media than in other experiments? If so, the authors should add that information to the methods or the caption.

      m6: In Line 160, positively influenced is strange wording, do the authors mean "induced"?

      m7: In the section on gene expression change upon exposure to NSF, the authors use a + after each gene name. My understanding is that that notation is meant to refer to strains with the wild type genotype of that gene, and not the gene itself. Shouldn't the gene be italicised in lower case to represent the gene? See: Lera-Ramirez et al 2023 https://doi.org/10.1093/genetics/iyad143.

      m8: In Fig 2A, genes are displayed on a plot that depicts level vs log2FC, but a comparison between the fold change and p-value would be more useful, and I believe DESeq2 should provide an adjusted p-value for these genes. A related issue is that it appears as though there were no biological replicates, though there was data gathered at different time points. In these genome-wide experiments, replicates can give confidence to data and help distinguish true change from intrinsic variability of expression in specific genes. Though the authors did qPCR to validate specific results, it would have improved the quality of their systems-level data to have replicates for these and other key experiments (ChIP-seq, affinity purification and even the screen).

      m9: Supp Fig S1: To show that similar gene expression profiles exist for other time points, it would be more convincing to show Log fold change 2h vs 4h and 2h vs 6h and show correlation, or else to make a heat map with all genes to see that genes that go up in one condition go up in the other conditions. It is not clear if the red and blue colors are defined for the 2h dataset and then mapped onto the 4 and 6h dataset, or if they are independently assigned for each plot.

      m10: Mbx2 is a key transcription factor related to flocculation and adhesion genes, and its expression is correlated with expression of its targets. If this transcription factor's expression levels decreased in response to NSF, that might strengthen and help explain the decrease in expression the authors observe in flocculation/adhesion genes when cells encounter NSF. If it does not change, that might also be interesting for readers interested in these phenotypes.

      m11: In Fig 3D, the notation for the ammonium concentrations for EMM and YES is inconsistent (+ vs parentheses). Also, the units (mM, from the caption) are not on the figure, but the abbreviation "N" is, which is confusing and inconsistent with the other plots in which NH4Cl is not abbreviated. Additionally, the caption lists additional nutrients in the media for the EMM conditions (Leu, Ade, Ura), which ought also to be listed on the figure.

      m12: In lines 233-235, the authors say "One possibility is that they remain bound to their target genes but become activated or deactivated by NSF directly, or posttranslational modification, such as phosphorylation in the case of Atf1". I don't think the authors intend this, but this sentence could be taken to mean that Atf1 has been shown to be phosphorylated by NSF in the reference they cite. I think the authors should clarify, e.g. by saying "..such as phosphorylation, which is known to regulate the activity of Atf1 in response to oxidative and osmotic stress [Lawrence et al 2009]".

      m13: In Fig 4B and Fig S2A, there are grey and colored tracks for the ChIP-seq (- and + NSF), but they are very difficult to see. When grey is in front, it is hard to tell how high the colored peak is whenever it is lower than the grey one. For example, grey is in front for pex7 while color is in front for yhb1. Could the authors add some transparency so that the data for both conditions can be seen at once? Also, there is little information on the control. My assumption for the input (ChIP) sample was that it was cross-linked and sonicated but not immunoprecipitated, but it is not clear what conditions it was in. I would assume it was done without NSF treatment in WT cells, but those details should be added in the caption or methods. In particular, in the input there is a large spike for Gsf2. Do the authors have any explanation for this, and does it have anything to do with that gene's NSF responsiveness?

      m14: The authors might consider putting something like Fig S2B (or even a corresponding volcano plot) as a main figure for Fig 4 in addition to the other two panels, as the individual examples from fig 4B are nice to see, but do not give a broad overview of the data.

      m15: In line 348, the wording "Would score" might be better replaced by "would be identified."

      Significance

      Assessment:

      In general I find the authors' arguments compelling and their experiments convincing. The initial and follow-on screens were well designed, and the authors linked respiration and the action of NSF in a convincing way. The analysis of RNA-seq data was also convincing, especially regarding the decreased expression of flocculation and adhesion genes, and the follow-up of specific targets gives confidence in the data (though see Major point 12 above regarding the naming and expression levels of nrt1). The identification of hmt2 as a functional target of NSF was compelling and rigorous, and the authors offer an interesting hypothesis to connect this to respiration that could form the basis of future studies.

      At times I thought that some of the interpretation of the results was hard to follow, poorly worded, or off the mark (see comments above). The presentation of the ChIP-seq data also felt incomplete, though the influence of Hsr1 and Adn2 on expression of NSF-responsive targets was convincing. The genome-wide assays (RNA-seq, ChIP-seq, screen and pull-down/mass spec) could have done with replicates, which would have improved the statistics and reliability of the results presented for those experiments, although for key messages the authors followed up with convincing targeted experiments.

      The study represents an advance on recent work in NCR in fission yeast in linking this with the broad metabolic switch between fermentation and respiration, and in that sense makes this of interest to a broader swathe of the microbiology community, outside those interested in metabolic regulation in microbes. In addition to being of interest to applied researchers interested in producing metabolites with yeast and other microbes, the link to cell signaling and, via flocculation and adhesion genes, to microbial multicellular-like phenotypes would make this work of interest to those interested in microbial communities.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study investigated transcriptional profiles of midbrain dopamine neurons using single nucleus RNA (snRNA) sequencing. The authors found more nuanced subgroups of dopamine neurons than previous studies, and identified some genes that are preferentially expressed in subpopulations that are more vulnerable to neurochemical lesions using 6-hydroxydopamine (6OHDA). The reviewers found the results are solid, and the study is overall valuable, providing critical information on the heterogeneity and vulnerability of dopamine neurons, although the scope is somewhat limited because the result with snRNA is similar to previous results and cell deaths were induced by 6OHDA injections.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study by Yaghmaeian Salmani et al., the authors performed single-nuclei RNA sequencing of a large number of cells (>70,000) in the ventral midbrain. The authors focused on cells in the ventral tegmental area (VTA) and substantia nigra (SN), which contain heterogeneous cell populations comprising dopaminergic, GABAergic, and glutamatergic neurons. Dopamine neurons are known to consist of heterogeneous subtypes, and these cells have been implicated in various neuropsychiatric diseases. Thus, identifying specific marker genes across different dopamine subpopulations may allow researchers in future studies to develop dopamine subtype-specific targeting strategies that could have substantial translational implications for developing more specific therapies for neuropsychiatric diseases.

      A strength of the authors' approach compared to previous work is that a large number of cells were sequenced, which was achieved using snRNA-seq, which the authors found to be superior compared to scRNA-seq for reducing sampling bias. A weakness of the study is that relatively little new information is provided as the results are largely consistent with previous studies (e.g., Poulin et al., 2014). Nevertheless, it should be noted that the authors found some more nuanced subdivisions in several genetically identified DA subtypes.

      On this point we respectfully disagree with the reviewer. In this study, over 30,000 mDA neurons have been analyzed at the genome-wide gene expression level, identifying mDA territories and neighborhoods (that some may call “subtypes”), a description of the mDA neuron diversity that goes far beyond what has been published previously.

      Although several single-cell RNA sequencing studies of mDA neurons have added to our understanding of mDA diversity, they have been limited by the low numbers of sequenced mDA neurons. As the reviewer specifically referred to the study by Poulin et al., 2014, it should be noted that in this report, 159 mDA neurons were analyzed by qPCR – not by RNAseq – of 96 previously identified marker genes. Despite those limitations, this was indeed a highly impressive study, suggesting five different mDA neuron subtypes (as compared to the 16 neighborhoods described here), published before single-cell genome-wide gene expression methods and advanced bioinformatic tools were available. On average, the following scRNAseq studies typically captured a few hundred mDA neurons - compared to over 30,000 in this study. None of the studies mentioned in our manuscript were close to capturing the full diversity, and the information on mDA neuron diversity is, for this reason, somewhat fragmented in the scientific literature. Indeed, the seven mDA “subtypes” described in the excellent reviews by Poulin et al., 2020 in Trends in Neurosciences and Garritsen et al., 2023 in Nature Neuroscience are integrated interpretations of the results from numerous independent studies, each methodologically unique. Several previously identified groups, especially Vglut2+ populations in VTA and SNpc, have been considered poorly defined. As mentioned above, our findings in this study could reliably identify, by computational analyses and combinatorial marker expression in situ, 16 different neighborhoods within the mDA population and localize them in the tissue (Figure 4, Supplementary figures 4-1 to 4-3, described further in Supplementary Results). To mention three examples: Within Sox6+ SNpc, we identified four different variants (neighborhoods) with partly unique anatomical localization. In addition, the large group of mDA neurons referred to as the Pcsk6 territory has not been clearly defined in earlier studies. We also identified a novel mDA neuron group that is related to the previously well described Vip-expressing mDA neurons. These and other novel features are mentioned in the manuscript and in Supplementary Figure 4-1 to 4-3.

      Although we have, for reasons of space and intelligibility, characterized the 16 neighborhoods with only a few selected key marker genes, we have identified numerous additional novel markers, some of which are shown in dot plots in Figure 3 and Supplementary Figure 3, which can be used to characterize these groups further. We also provide all our sequencing data and our Padlock probe ISS data for anyone to download and analyze further, and we have made a web-based tool, CELLxGENE, available on our group’s website to facilitate exploration of the different aspects of our dataset.

      Lastly, the authors performed molecular analysis of ventral midbrain cells in response to 6-OHDA exposure, which leads to the degeneration of SN dopamine neurons, whereas VTA dopamine neurons are mainly unaffected. Based on this analysis, the authors identified several candidate genes that may be linked to neuronal vulnerability or resilience.

      Overall, the authors present a comprehensive mouse brain atlas detailing gene expression profiles of ventral midbrain cell populations, which will be important to guide future studies that focus on understanding dopamine heterogeneity in health and disease.

      We thank the reviewer for pointing this out.

      Reviewer #2 (Public Review):

      In the manuscript by Salmani et al., the authors explore the transcriptomic characterizaon of dopamine neurons in order to explore which neurons are parcularly vulnerable to 6-OHDA-induced toxicity. To do this they perform single nucleus RNA sequencing of a large number of cells in the mouse midbrain in control animals and those exposed to 6-OHDA. This manuscript provides a detailed atlas of the transcriptome of various types of ventral midbrain cells - though the focus here is on dopaminergic cells, the data can be mined by other groups interested in other cell types as well.

      The results in terms of cell type classification are largely consistent with previous studies, though a more nuanced picture of cellular subtypes is portrayed here, a unique advantage of the large dataset obtained. The major advance here is exploring the transcriptional profile in the ventral midbrain of animals treated with 6-OHDA, highlighting potential candidate genes that may influence vulnerability. This approach could be generalizable to investigate how various experiences and insults alter unique cell subtypes in the midbrain, providing valuable information about how these stimuli impact DA cell biology and which cells may be the most strongly affected.

      We appreciate these comments. We want to state that the study not only gives a more nuanced picture but goes far beyond previously published studies and provides a highly resolved and detailed atlas of mDA neurons. Thus, it clarifies poorly described diversity and identifies entirely novel groups of diverse mDA neurons at the genome-wide gene expression level.

      Overall, the manuscript is relatively heavy on characterization and comparatively light on functional interpretation of findings. This limits the impact of the proposed work. It also isn't clear what the vulnerability factors may be in the neurons that die. Beyond the characterization of which neurons die, what is the reason that these neurons are susceptible to lesion? Also, the interpretation of these findings is going to be limited by the fact that 6-OHDA is an injectable, and the effects depend on the accuracy of injection targeting and the equal access of the toxin to all cell populations. Though the site of injection (MFB) should hit most/all of the forebrain-projecting DA cells, the injection sites for each animal were not characterized (and since the cells from animals were pooled, the effects of injection targeting on the group data would be hard to determine in any case).

      We agree that the results are presented to provide a comprehensive and valuable resource rather than to explain molecular mechanisms. The reviewer points out that “what the vulnerability factors may be in the neurons that die” is unclear. However, our study was designed to answer the question: What genes are enriched in clusters of mDA neurons that are particularly likely to die after toxic stress? Using single-cell analysis, we believe this question had higher priority than attempting to identify gene expression changes occurring during the cell death process. We agree that we cannot answer why neurons are susceptible to lesions; we can only identify genes that correlate with either high or low sensitivity. Thus, the genes we refer to as “vulnerability genes” and “resilience genes” are candidates for influencing differential vulnerability. Hard evidence for such influence will require additional and extensive functional analysis. As for the variability of injection and the characterization of individual animals, we wish to mention the online interactive explorer available at https://perlmannlab.org/resources/. It allows visualization of nuclei distribution per territory and neighborhood for each mouse, making it easy to determine the cell loss ratio and cell distribution per animal. There is indeed variance in the proportions of intact/lesioned total nuclei per animal. This is also evident from the DAT autoradiographs shown for each lesioned animal and presented in Figure Supplement 5-1 A. Importantly, the relative UMAP distribution of nuclei is quite similar between individual animals. To further investigate this, we used Pearson’s chi-square test of independence with a contingency table for animals, each with two categorical variables as the proportion of nuclei from intact vs lesioned parts of the vMB (see added Supplementary Figure 5-1 C). This shows that, while there is a difference in the number of nuclei remaining after lesioning, the relative distribution among clusters and neighborhoods is similar between animals. We have clarified this point in the manuscript (see page 12).
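      For readers unfamiliar with the test mentioned above, the sketch below computes Pearson's chi-square statistic of independence for an r x c table of counts. It assumes, purely for illustration, that rows are animals, columns are neighborhoods, and cells are nuclei counts; the counts are hypothetical and this is not the exact table or code used in the manuscript. A small statistic relative to its degrees of freedom is consistent with cluster membership being independent of the animal.

```python
def chi_square_independence(table):
    """Pearson's chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom). Layout is an assumption for
    illustration: rows = animals, columns = neighborhoods.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if neighborhood membership is independent of animal
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical nuclei counts: 3 animals x 3 neighborhoods
counts = [
    [120, 80, 40],
    [100, 70, 30],
    [60, 45, 20],
]
stat, dof = chi_square_independence(counts)
print(f"chi2 = {stat:.3f}, dof = {dof}")
```

      A p-value can then be read from the chi-square distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, dof)`; `scipy.stats.chi2_contingency` performs the whole test in one call.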

      I am also not clear why the authors don't explore more about what the genes/pathways are that differentiate these conditions and why some cells are particularly vulnerable or resilient. For example, one could run GO analyses, weighted gene co-expression network analysis, or any one of a number of analysis packages to highlight which genes/pathways may give rise to vulnerability or resilience. Since the manuscript is focused on identifying cells and gene expression profiles that define vulnerability and resilience, there is much more that could have been done with this based on the data that the authors collected.

      We performed GO analysis for the genes upregulated and downregulated in the ML clusters (specific to the lesion condition) in the original manuscript (please see figure supplement 7-1 C-E and the newly added Supplementary file 10), but we agree with the reviewer that we could also have analyzed functional categories of genes correlating with differential vulnerability. Thus, we have used tools recently developed by Morabito et al., Cell Reports Methods (2023), namely their hdWGCNA package, to address this question. This method is particularly suitable for analyzing high-dimensional transcriptomics data such as single-cell RNA-seq or spatial transcriptomics. We calculated the co-expression network based on the lesioned nuclei of the mDA territories. Of the 9 co-expression modules calculated, one has the highest expression in the Sox6 territory and has genes in common with the vulnerability module. Another co-expression module has genes in common with the resilience module and is most highly expressed in the Otx2 and Ebf1 territories. We also performed GO analysis for these co-expression modules and added additional GO analysis of the ML-enriched genes (see Supplementary Figure 7-1 D,E, the newly added Supplementary Figure 6-3, and the newly added Supplementary file 9). These additional analyses are described on pages 15 and 17.
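      As background on what a co-expression network is, the sketch below shows the soft-thresholded adjacency used in WGCNA-style analyses: the correlation r between two genes is raised to a power beta, either as |r|^beta (unsigned network) or ((1 + r) / 2)^beta (signed network). This is a generic illustration in plain Python, not the hdWGCNA API or the authors' pipeline; the expression vectors are hypothetical and beta = 6 is an assumed illustrative value.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def soft_adjacency(x, y, beta=6, signed=False):
    """WGCNA-style soft-thresholded adjacency between two genes.

    unsigned: |r| ** beta; signed: ((1 + r) / 2) ** beta.
    beta=6 is an illustrative assumption, not a value from the manuscript.
    """
    r = pearson(x, y)
    return ((1 + r) / 2) ** beta if signed else abs(r) ** beta


# Hypothetical expression of three genes across four nuclei
gene_a = [1, 2, 3, 4]
gene_b = [2, 4, 6, 8]   # perfectly co-expressed with gene_a
gene_c = [1, 2, 1, 2]   # weakly correlated with gene_a (r = 1/sqrt(5))
print(soft_adjacency(gene_a, gene_b), soft_adjacency(gene_a, gene_c))
```

      Raising r to a power keeps strong co-expression near 1 while shrinking weak correlations (here r of about 0.45 becomes 0.008), which is why modules built from such a network emphasize tightly co-regulated gene sets.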

      In addition, we wish to emphasize our identification of the genes we refer to as vulnerability and resilience modules in the previous version of the manuscript. Several of the genes were discussed in the previous version of the manuscript, but we have now included more information on these genes, based on previously published studies, and discuss their potential functional roles (see pages 22 & 23 in the Discussion).

      Another limitation of this study as presented is the missed opportunity to integrate it with the rich literature on midbrain dopamine (and non-dopamine) neuron subtypes. Many subtypes have been explored, with divergent functions, and can usually be distinguished by either their projection site, neurotransmitter identity, or both. Unfortunately, the projection site does not seem to track particularly well with transcriptomic identities, aside from a few genes such as DAT or the DRD2 receptor. However, this could have been more thoroughly explored in this manuscript, either by introducing AAVretro barcodes through injection into downstream brain sites, or through existing evidence within their sequencing dataset. There are likely clear interpretations from some of that literature, some of which may be more exciting than others. For example, the authors note that vGluT2-expressing cells were part of the resilient territory. This might be because vGluT2 is expressed in medially located DA cells and not laterally located ones, which tends to track which cells die and which don't.

      The manuscript consists of a comprehensive description of transcriptional diversity. Although of clear value, we believe that additional, comprehensive analysis that combines snRNAseq with, e.g., AAVretro barcodes must be done in a separate study. It should also be noted that we describe each territory and neighborhood in further detail in the Supplementary Results, which contain references to the relevant literature. In line with the comments, this section has now been expanded with further references to relevant studies (see Supplementary Results related to Figure 4-figure supplements 1-3).

      It is not immediately clear why the authors used a relaxed gate for mCherry fluorescence in Figure 1. This makes it difficult to definitively isolate dopaminergic neurons, or at least neurons with a DATCre expression history. While the expression of TH/DAT should allow a fairly reliable identification of these cells, the reason for this decision is not made clear in the text.

      We used relaxed gating to ensure that we could capture nuclei expressing low levels of RFP, which we believe could be especially relevant for the lesioned dataset (see page 5). We did not find it advantageous to use more stringent gating, which would risk losing all cells expressing no (or very low levels of) RFP. Identifying mDA neurons based on their typical markers is straightforward, as their transcriptional relationship is evident from the expression profile of several markers, including transcription factors such as Nr4a2, Pitx3, and En1. In addition, as pointed out in response to Reviewer #1, point 5, atypical DA neurons expressing Th and other mDA markers with no or low levels of Slc6a3 (DAT) were isolated. We believe the study is more complete owing to the inclusion of these cells. Moreover, we included a sufficiently large number of cells, which ensured a comprehensive analysis of mDA neurons in relation to other cell types dissected from the ventral midbrain.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors state that a major advantage of their approach is that it prevents biased datasets when compared to methods that rely on capturing certain cell types. I was wondering if the authors could follow up on this topic with a more detailed description of their methodological advantages regarding potential sampling bias. This is somewhat unclear to me, given that the results of the present study are largely consistent with previous work on this topic.

      As expanded on above (see response to the initial comment in the public review), we strongly disagree that there is little novelty in our study. None of the previous studies come close to describing the mDA neuron population with a similar resolution, which is unsurprising given the differences in the number of analyzed mDA neurons in this versus previous reports. We agree with the reviewer that our data are consistent with previous studies when they are all combined. Thus, we identified mDA neuron groups that correspond (or roughly correspond) to major DA neuron groups identified in previous studies (see pages 8-14 in the Supplementary Results). However, the atlas presented here goes well beyond anything published in scope and resolution. The diversity we define is comparable to findings that, with careful cross-paper analyses, can be stitched together from previous single-cell studies. However, even such a combined analysis does not reach the resolution and diverse categorization of what we have demonstrated herein (16 neighborhoods in midbrain dopaminergic territories). Considering the well-established problems of dissociating and isolating whole neurons from adult brain tissue, the lower resolution of previous studies is likely due to sampling bias, resulting in an almost complete exclusion of some sub-populations of neurons. We have added text on page 20 to clarify this point.

      (2) In the abstract, the authors state that their "results showed that differences between mDA neuron group could best be understood as a continuum without sharp differences between subtypes". However, I am not sure whether this is the most appropriate description of the authors' results, particularly when looking at the schematic overview shown in Fig. 4F. To me, it seems more likely that genetically defined DA subtypes overlap with discrete ventral midbrain subnuclei, particularly in the case of Sox6-expressing cells, which are almost exclusively located in the SNc. In the case of genes that are specific for the VTA, there also seems to be a strong bias toward certain VTA subnuclei, although I agree that arguments can be made that there is some topographic organization along a dorso-ventral and medio-lateral gradient, which seems to be largely consistent with the anatomical location of projection-defined dopamine neurons as described previously by Poulin et al., 2018 (Nature Neuroscience).

      What was meant by continuum must be interpreted in the context of the transcriptional landscape of mDA neurons and not their anatomical localization. As stated in the paper, the dendrogram depiction of mDA neurons’ transcriptome can be misinterpreted as an indication of sharp boundaries and discrete groups in transcriptional profiles. In contrast, we assert that differences between developmentally related mDA neurons are better described as a continuum with areas in the gene expression landscape defined by the expression of shared genes but without sharp borders between them. We decided to name different areas within this continuum “territories” at the higher hierarchical level and “neighborhoods” at the more highly resolved level. Hypothetically, such categorization can be even more fine-grained, but we find it unlikely that a resolution beyond the neighborhood level is biologically relevant. As pointed out, the Sox6 territory is the territory that best qualifies as a distinctive subtype, while mDA neurons in, e.g., the VTA show much higher and more nuanced diversity. Importantly, all mDA neurons are much more related to each other than to cell types lacking a common developmental origin, including hypothalamic DA neurons. Thus, our effort to define differences in such a gene expression continuum is, in our opinion, more accurate than conveying the message that the diversity consists of subtypes that differ as much as cell types lacking a close developmental relationship with the mDA neuron population. Such distinct neuron types, e.g., hypothalamic DA neurons, which use the same neurotransmitter as mDA neurons, appear as distinct islands in the UMAP snRNA-seq landscape and typically harbor hundreds of differentially expressed genes. As pointed out in the Discussion, several other studies have noted similar difficulties in defining different subtypes among related neurons in, e.g., the cortex, striatum, and hippocampus (Kozareva et al., 2021; Saunders et al., 2018; Tasic et al., 2018; Yao et al., 2021). For example, Yao et al., 2021, used a similar hierarchical definition to avoid the implication that different groups (“neighborhoods” in this study) should be defined as distinct subtypes of neurons with obviously distinctive functions.

      (3) I recommend that the authors revise the introduction to include more current literature on this topic. The review by Bjoerklund and Dunnet, 2006, is very informative and important, but there is more current literature available that discusses anatomical, molecular, and functional heterogeneity in the ventral midbrain. For example, it would be nice to incorporate recent work from the Awatramani lab on the mapping of the projection of molecularly defined dopamine neurons (Poulin et al., 2018; Nature Neuroscience).

      We deliberately avoided including primary references to previously described diversity in the Introduction since numerous papers are relevant to cite. Instead, we refer to three essential reviews, including the recent articles from Awatramani and Pasterkamp. In the Supplementary Results related to Figure 4 (pages 8-14 in the Supplementary Results), we include many references, including the Poulin 2018 paper. We believe that this is the appropriate place for a comprehensive discussion on anatomical, molecular, and functional heterogeneity. In the revised manuscript's main body, we now emphasize that previous literature is discussed in the Supplementary Results (see page 11).

      (4) In Fig. 1C, the authors show a sample image demonstrating overlap between TH and mCherry, but this has not been quantified. Similarly, there seem to be no sample images and quantification for the contralateral side that was exposed to 6-OHDA.

      The mouse lines used here (Dat-Cre and Rpl10a-mCherry) have been characterized before (Toskas et al., Science Advances 2022). The labelling colocalizes nearly fully with TH, with some exceptions (see response below to point #5). We have now complemented this with additional data showing an IHC image of the midbrain of a unilaterally lesioned mouse in Figure Supplement 5-1E.

      (5) The authors state that they focused their analysis on 33,052 nuclei expressing above-threshold levels of either Th OR Slc6a3. However, there seem to be cell populations in the ventral midbrain of mice that express TH mRNA but not TH protein, and these cells do not seem to be bona fide dopamine neurons (see work from the Morales lab). Similarly, not all dopamine neurons may express DAT mRNA. I was wondering how these discrepancies may influence the authors' analysis and interpretation.

      Indeed, the presence of cells lacking TH protein despite Th mRNA being expressed has been previously described. We also detected these cells across the SNpc and VTA and now show these data in a newly added supplementary figure 2-1. In our dataset, the Gad2 territory, located in the ventromedial VTA, contains cells that express many typical mDA markers, such as Pitx3, but very low levels of TH protein. We have identified these based on Pitx3-EGFP and Gad2 mRNA co-expression (figure supplement 4-3). In other parts of the VTA and SNpc, most cells seem to co-express Th mRNA and protein and are labeled with Dat-Cre. Also scattered in these areas, we could detect some rare mDA cells that lack TH protein. It should be noted that in our mDA territories, other typical mDA neuron genes were expressed, such as Slc18a2, Ddc, Nr4a2 and Pitx3; thus, they were not solely defined by the presence of Th and/or Slc6a3. Cells that do not have a history of DAT expression, and therefore were not mCherry labelled, were also included in the analysis due to the relaxed gating used during FANS isolation.

      (6) The sex and age of the mice that are used for the experiments are not stated in the Materials and Methods section under "Mouse lines and genotyping".

      Thank you for pointing this out. This information has been added to the methods section of the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I think that the manuscript can be significantly improved just by providing deeper analyses of the existing data and linking them to the current state of the art in terms of defining midbrain dopamine neurons (e.g., by projection). The dataset is likely richer than was explored in the manuscript and more valuable insights could be gleaned with a deeper analysis.

      Please see our response to Reviewer #2 (Public Review) regarding the WGCNA analysis and the ML-based GO analysis, as well as the comments on the added sections in the supplementary results file.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The delineation of MBOAT function is important with theoretical and practical implications in MAFLD, alcohol-induced hepatic steatosis, and lysosomal diseases. The strength of evidence is convincing using methodology in line with current state-of-the-art, with good support for the claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors provide mechanistic insights into how the loss of function of MBOAT7 promotes alcohol-associated liver disease. They showed that hepatocyte-specific genetic deletion of Mboat7 enhances ethanol-induced hepatic steatosis and increased ALT levels in a murine model of ethanol-induced liver disease. Through lipidomic profiling, they showed that mice with Mboat7 deletion demonstrated augmented ethanol-induced endosomal and lysosomal lipids, together with impaired transcription factor EB (TFEB)-mediated lysosomal biogenesis and accumulation of autophagosomes.

      Strengths:

      Alcohol-induced liver disease (ALD) and metabolic-associated steatotic liver disease (MASLD) are major global health problems, and polymorphism near the gene encoding MBOAT7 has been associated with these conditions. This paper is timely, as it is important to gain insights into how loss of MBOAT function contributes to liver disease, since this may eventually lead to therapeutic strategies. The conclusions of the paper are mostly well supported by data.

      We sincerely thank Reviewer #1 for constructive feedback on this work.

      Weaknesses:

      (1) In regards to circulating levels of MBOAT7 products, a comparison of heavy drinkers with ALD versus heavy drinkers without ALD would be more clinically relevant.

      We agree this comparison would be an important comparison to make in future studies, but given the difficulties in accessing well-matched samples such as these we see this as beyond the scope of the current work.

      (2) A few typos need to be addressed. For Figure 1 - figure supplement 1, should the second column heading be "Heavy drinkers" instead of "Healthy drinkers"? Also, in the same figure, it is unclear what the "healthy" subcategory under MELD means.

      The typographical error was addressed in the main text and in all associated tables and figures.

      (3) Some of the data in the tables need to be addressed/discussed. For instance, the white blood cell count (WBC) in Figure 1 - figure supplement 1 for "healthy controls" is 34, compared to 13.51 for drinkers. A WBC of 34 is not at all healthy and should be explained. The vast difference between BMI and also between racial distribution within the two cohorts should also be explained. Is it possible that some of these differences contributed to the different levels of circulating MBOAT7 products that were measured?

      Sincere thanks for catching this error. In follow-up, we found that some of our patient recruitment sites were using different units to report WBC counts (percent vs 1000/ml), and at this time we cannot retrospectively correct that difference. Therefore, we have incomplete WBC values for the cohort and elected to exclude that information to avoid confusing readers. A revised table reflecting these changes is provided in the revision. If we look at each site separately, values for WBC were in the normal range, so we do not think this is a major limitation of our studies. Regarding BMI and race: the difference in race is not actually significant, although it is close. For BMI, two very low BMIs in the heavy drinkers bring that average down. We agree with Reviewer #1 that race and/or BMI could impact MBOAT7, but larger cohorts are needed to detect such potential differences.

      (4) The representation of the statistical difference between the bars in the results figures by using letters is a bit confusing. For instance, in figure 2C, does that mean all the bars labelled A are significantly different from B? The solid black bar seems to be very similar to the open red bar; please double-check.

      We apologize for this confusing presentation. Using the letter system, groups not sharing a common superscript differ statistically. Given this concern, we have gone back and reviewed all statistical comparisons and realized that there were several mistakes in Figure 2C, Figure 3F and G, Figure 3-Supplementary Figure 1 F and Figure 3-Supplementary Figure 10H. The graphs themselves were not altered, but the denotation of statistical significance was updated with the correct letter superscripts.
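      The reading rule for this letter notation (groups not sharing a common letter differ statistically) can be written as a one-line check. This is a minimal sketch of interpreting a compact letter display; the group names and letter assignments below are hypothetical and are not the values shown in Figure 2C.

```python
def differ(letters, group_a, group_b):
    """Return True if two groups share no letter, i.e., they differ
    statistically under a compact letter display."""
    return not (set(letters[group_a]) & set(letters[group_b]))


# Hypothetical letter assignments for four bars
letters = {
    "flox_pair_fed": "A",
    "ko_pair_fed": "AB",
    "flox_etoh": "B",
    "ko_etoh": "C",
}
print(differ(letters, "flox_pair_fed", "flox_etoh"))  # no shared letter
print(differ(letters, "ko_pair_fed", "flox_etoh"))    # both carry "B"
```

      A bar labelled "AB" is therefore not significantly different from bars labelled "A" or "B", even though those two bars may differ from each other.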

      Reviewer #2 (Public Review):

      Summary:

      The work by Varadharajan et al. explored a previously known genetic variant and its pathophysiology in the development of alcohol-associated liver injury. It provides a plausible mechanism for how varying levels of MBOAT7 could impact the lipid metabolomics of the cell, leading to a deleterious phenotype in MBOAT7 knockout. The authors further characterized the impact of the lipidomic changes and raised lysosomal biogenesis and autophagic flux as mechanisms of how MBOAT7 deletion causes the progression of ALD.

      Strengths:

      Connecting the GWAS data on MBOAT7 variants with plausible pathophysiology greatly enhances the translational relevance of these findings. The global lipidomic profiling of ALD mice is also very informative and may lead to other discoveries related to lipid handling pathways.

      We sincerely thank Reviewer #2 for constructive feedback on this work.

      Weaknesses:

      The rationale of why MBOAT7 metabolites are lower in heavy drinkers than in normal individuals is not well explained. MBOAT7 loss of function drives ALD, but unclear if MBOAT7 deletion also drives preference for alcohol or if alcohol inhibits MBOAT7 function. Presuming most individuals studied here were WT and expressed an appropriate level of MBOAT7?

      Although we were unable to genotype for the rs641738 SNP in the human subjects studied here, the original study by Buch et al. published in Nature Genetics performed cis expression quantitative trait locus (cis-eQTL) analyses to demonstrate that the minor disease-associated allele was associated with reduced MBOAT7 expression in subjects with alcohol-related cirrhosis. It is important to note that we did not see any evidence that alcohol preference was altered in either myeloid- or hepatocyte-specific Mboat7-knockout mice, given that ethanol intake was similar in all genotypes. We agree that the possibility that MBOAT7 loss of function promotes alcohol preference should be investigated in additional studies.

      Also, the discussion of mechanisms of MBOAT7-induced dysregulation of lysosomal biogenesis/autophagy, while very interesting, seems incomplete. It is not clear how MBOAT7, an enzyme involved in membrane phospholipid remodeling, increases mTOR, which leads to decreased TFEB target gene transcription.

      Although we agree with Reviewer #2 that mechanistic understanding of how MBOAT7 loss of function impacts mTOR activity and TFEB-driven lysosomal biogenesis is still incomplete, we do feel that the results published here will inform downstream investigation linking phosphatidylinositol remodeling to mTOR and TFEB. The MBOAT7 gene encodes an acyltransferase enzyme that specifically esterifies arachidonyl-CoA to lysophosphatidylinositol (LPI) to generate the predominant molecular species of phosphatidylinositol (PI) in cell membranes (38:4). It is well established that PI-related lipids can regulate membrane dynamics and signal transduction pathways. For instance, PI phosphates (PIPs) are dynamically shaped by PI kinases and phosphatases to play crucial roles in the regulation of a wide variety of cellular processes via specific interactions with PIP-binding proteins. Among PI phosphates, PI 3-phosphate (PI3P) regulates vesicular trafficking pathways, including endocytosis, endosome-to-Golgi retrograde transport, autophagy and mTOR signaling. Although additional work is needed to understand the molecular details of how MBOAT7-driven LPI acylation impacts mTOR and TFEB, it is not particularly surprising that PI lipid remodeling could broadly impact cell signaling.

      Furthermore, given the significant disturbances of global lipidomic profiling in MBOAT7 knockout, many pathways are potentially affected by this deletion. Further in vivo modeling that specifically addresses these pathways (TFEB targeting, mTOR inhibitor) would help strengthen the conclusions of this paper.

      We agree that further in vivo studies are needed that are beyond the scope of the current work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) p values are rather hard to read. For example, in Figure 2C, hepatocyte-specific deletion of Mboat7 resulted in enhanced ethanol-induced increases in liver weight. However, it doesn't look like there is a significant difference between the 2 EtOH groups in Figure 2C? Same comment for Figure 2E; I am not sure if the pair-fed groups had a significant difference.

      (2) Figure 2 Supp fig 1, what is the top band on the MBOAT7 WB?

      We have addressed these statistical comparison comments as described above. Although we cannot be sure, it is likely that the top band on the MBOAT7 Western blot is a non-specific band that shows up with the antibody combination used given there is equal intensity in the Mboat7flox/flox and the MSKO mice (Mboat7flox/flox+LysM-Cre).

    1. Reviewer #2 (Public Review):

      Summary:

      This study looks into the complex dominance patterns of S-allele incompatibilities in Brassicaceae, through which it attempts to learn more about the sheltering of deleterious load. I found several weak points in the analyses that diminished my excitement about the results. In particular, the way in which deleterious mutations were classified lacked the ability to distinguish the severity of the mutations and thus their expected associated dominance. Furthermore, the simulation approach could have provided this exact sort of insight but was not designed to do so, making this comparison to the empirical data also less than exciting for me.

      Major and minor comments:

      I think the introduction (or somewhere before we dive into it in the results) of the dominance hierarchy for the S-alleles needs a more in-depth explanation. Not being familiar with this beforehand really made this paper inaccessible to me until I then went to find out more before continuing. I would expect this paper to be broad enough that self-contained information makes it accessible to all readers. For example, lines 110-115 could be in the Introduction.

      Along with my above comment, perhaps it is not my place to comment, but I find the paper not of a broad enough scope to be of interest to a broad readership. This S-allele dominance system is more than simple balancing selection, it is a very complex and specific form of dominance between several haplotypes, and the mechanism of dominance does not seem to be genetic. I am not sure that it thus extrapolates to broad comments on general dominance and balancing selection, e.g. it would not be the same as considering inversions and this form of balancing selection where we also expect recessive deleterious mutations to accumulate.

      It would have been particularly interesting, or a nice addition, to see deleterious mutations classed by something like SnpEff or GERP, where you can have different classes of moderate to severe deleterious variants, which we would also expect to be more recessive the more deleterious they are. In line with my next comment on the simulations, I think relative differences between mutations expected to be more or less dominant may be even more insightful into the process of sheltering, which may or may not be going on here.

      In the simulations, h=0 and s=0.01 (as in Figure 5) for all deleterious mutations seems overly simplistic, and at the convenient end for realistic dominance. Apart from recessive lethals, which we expect to be close to h=0 but to have a much larger selection coefficient, I think other deleterious mutations with such an s value would only be partially recessive. I expect this would change some of the simulation results seen, though to what degree I am not certain. It would be nice to at least check the same results for h=0.3 or 0.2 (or additionally for recessive lethals, e.g. h=0 and s=-0.9). I would also disagree with the statement in line 677: many studies, particularly those on balancing selection, have shown that partially recessive deleterious mutations are not eliminated by natural selection and do play a role in population genetic dynamics. I am also not surprised that extinction was found for higher s values when the mutation rate for such mutations was very high and the distribution of s values was constant. An influx of such highly deleterious mutations is unlikely to ever let a population survive, yet that does NOT mean that, in nature, a rare influx of such mutations cannot lead to them being sheltered. Overall, I find that the simulation results contribute very little, if anything, to this paper: without something more realistic, like a joint distribution of s and h values, you cannot say which class of these mutations, if any, is expected to accumulate because of S-allele dominance. Rather, they only show the disappointing or less exciting result that fully recessive, weakly deleterious mutations (which, as I said above, I think do not even exist in nature) have minor to no effect across the classes of S-allele dominance. They provide no insight into whether any type of recessive deleterious mutation can accumulate under the S-allele dominance hierarchy, and that is the interesting question at hand.
I would either remove these simulations or redo them in another approach. The authors never mention what simulation approach was used, so I can only assume this is custom, in-house code. Yet I do not find that code provided on the github page. I do not know if the lack of a distribution for h and s values is then a choice or a programming limitation, but I see it as one that should be overcome if these simulations are meant to be meaningful to the results of the study.
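      To illustrate why the choice of h matters, here is a minimal deterministic sketch of selection at a single diploid locus with dominance. It is not the authors' simulation code (which is not described) and ignores drift, linkage to the S-locus and any S-allele dominance hierarchy; it assumes random mating, genotype fitnesses 1, 1 - hs and 1 - s, and uses a positive s as the fitness cost (the review writes lethals as s=-0.9 under the opposite sign convention). The function names are hypothetical.

```python
def next_freq(q, s, h):
    """One generation of viability selection (random mating, infinite population).

    q: frequency of the deleterious allele a.
    Genotype fitnesses: AA = 1, Aa = 1 - h*s, aa = 1 - s (s > 0 is a cost).
    """
    p = 1.0 - q
    w_AA, w_Aa, w_aa = 1.0, 1.0 - h * s, 1.0 - s
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # a-alleles after selection: half of heterozygotes plus all aa homozygotes
    return (p * q * w_Aa + q * q * w_aa) / w_bar


def trajectory(q0, s, h, generations):
    """Iterate next_freq for a fixed number of generations and return the final q."""
    q = q0
    for _ in range(generations):
        q = next_freq(q, s, h)
    return q


# Same weak cost (s = 0.01), fully recessive vs partially recessive.
print(trajectory(0.05, 0.01, 0.0, 1000))
print(trajectory(0.05, 0.01, 0.3, 1000))
```

      Under these assumptions a fully recessive allele (h=0) is purged far more slowly than one with h=0.3, because selection only acts on the rare homozygotes; this is the sensitivity to h that the reviewer suggests checking.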

  4. Feb 2024
    1. level 1 vs level 2

      (#14) (*14)J11(Rita) Is there a specific sampling design for multilevel regression analysis? Are there any assumptions to take into consideration when looking at multilevel regression?

      Response: Any time the data are clustered, you have a multilevel design. E.g., Add Health data are clustered by schools; longitudinal data such as the NLSY are clustered at the individual level; the WVS is clustered by countries. Clusters provide the context, and context, for a sociologist, may actually be the point. (More on this in class.)

      (*14)J12(Osamudia): A very broad question as I start the reading: is there ever any debate about how we define the levels in multilevel modeling? Reading Table 1.1, I found myself questioning whether the various levels were appropriately characterized and distinguished.

      Response: You are right to critique this. The levels depend on how we conceptualize context, and that depends on sociological theory. For example, variation in social structure and context might be a level 2 variable, but the way we define it will affect how we think about the model. For instance, is religion a level 1 or a level 2 variable?
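      As an illustration only (the notation is an assumption, not taken from the reading), a generic two-level random-intercept model for individuals i nested in clusters j (e.g., students in schools, or respondents in countries) can be written with an individual-level predictor x and a cluster-level predictor z:

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1} x_{ij} + e_{ij}, && e_{ij} \sim \mathcal{N}(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} z_{j} + u_{0j}, && u_{0j} \sim \mathcal{N}(0, \tau_{00}) \\
\text{ICC:}\quad & \rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2}
\end{aligned}
```

      Whether religion enters as x_ij (an individual's affiliation) or as z_j (e.g., the religious composition of a country) is exactly the conceptual choice discussed above. The standard random-intercept model also assumes, among other things, that e_ij and u_0j are independent of each other and of the predictors; the ICC ρ is the share of outcome variance that lies between clusters.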

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you and the two reviewers for the thorough review of our manuscript. We found the reviewers’ comments highly valuable and addressed them with the following additional experiments and changes to the text and figures:

      (1) We measured the effect of the ROCK MASOs on ROCK expression by immunostaining and observed a reduction in ROCK signal, supporting the downregulation of ROCK protein levels by the ROCK MASOs (new Fig. S3).

      (2) We measured the effect of a lower concentration of the ROCK inhibitor Y27632 (10µM) and observed the same phenotypes of skeletal loss, skeletal reduction and ectopic branching at this concentration (Fig. 2, S4). Importantly, these phenotypes were not observed when PKA and PKC were directly inhibited in whole sea urchin embryos (1) and in skeletogenic cell cultures (2), further supporting the specificity of the ROCK inhibitor.

      (3) We added a time course of Pl-ROCK expression and immunostaining of ROCK in the fertilized egg, which show that this gene is maternal and that the protein is present in the egg (Fig. 2SA-C).

      (4) We imaged F-actin in ROCK MASO-injected embryos and demonstrated that it is still detected around the spicules and their tips, similarly to ROCK-inhibited embryos (new Fig. S3).

      (5) We revised the paper text and figures to provide a better description of our results, distinguish clearly between our data and our interpretations and emphasize the novelty of our findings.

      This paper demonstrates that ROCK, F-actin polymerization and actomyosin contractility play critical roles in biomineral growth and in shaping biomineral morphology in the sea urchin embryo, and that ROCK activity affects skeletogenic gene expression. Our findings, together with previous reports of the role of actomyosin in eukaryotic biomineralization, suggest that this molecular machinery is part of the common molecular toolkit used in biomineralization. The identification of a common molecular mechanism within the diverse gene regulatory networks, organic scaffolds and minerals that eukaryotes use to build their biominerals will be of high interest to the fields of biomineralization and evolutionary biology. Furthermore, our paper portrays the interplay between the cellular and the genetic machinery that drives morphogenesis. We believe it will be of great interest to the broad readership of eLife, and particularly to the fields of biomineralization, cell, developmental and evolutionary biology.

      Thank you very much for the helpful review of our paper.

      Reviewer #1 (Public Review):

      We thank the reviewer for the appreciation of our work and for the helpful comments that guided us to strengthen the experimental evidence for our conclusions and increase the paper’s clarity. Below are our responses to the specific comments:

      Major comments

      One MASO led to reduced skeleton formation while the other one additionally induced ectopic branching. How was the optimum concentration for the MASOs determined? Did the authors perform a dose-response curve? What is the reason for this difference? Which of the two MASOs can be validated by reduced ROCK protein abundance? Since the ROCK antibody works, I would like to see a control experiment on ROCK protein abundance in control and ROCK MO-injected larvae, which is the gold standard for validating the knock-down.

      We tested several MASO concentrations to identify a concentration where the control embryos injected with Random MASO were overall healthy and ROCK MASO’s showed clear phenotypes.

      To test the effect of the ROCK MASOs on ROCK protein levels, we performed immunostaining experiments that are now presented in new Fig. S3. We could not perform Western blots on injected embryos, since Western blotting with the ROCK antibody requires thousands of embryos, which is not feasible with injected embryos. Therefore, we used immunostaining to test the effect of the two translation-blocking ROCK MASOs on ROCK abundance compared to uninjected and Random MASO-injected embryos. We observed a reduction of ROCK signal, supporting the downregulation of ROCK protein levels in these genetic perturbations (new Fig. S3).

      L212 "Together, these measurements show that ROCK is not required for the uptake of calcium into cells." But what about trafficking and exocytosis? As mentioned earlier, I think this is a really important point that needs to be confirmed to understand the function of ROCK in controlling calcification. In their previous study (reference 45) the authors demonstrated that they have superior techniques in measuring vesicle dynamics in vivo. Here an acute treatment with the ROCK inhibitor would be sufficient to test if calcein-positive vesicle motion, including the observed reduction in velocity close to the tissue skeleton interface, is affected by the inhibitor.

      We thank the reviewer for the appreciation of our previous work, where we studied calcium vesicle dynamics in whole embryos (Winter et al., PLoS Comput Biol 2021). We agree with the reviewer that the best way to directly test the effect of ROCK on mineral deposition and vesicle kinetics is to observe it in live skeletogenic cells. However, in Winter et al. 2021, we found that the skeleton (spicules) does not grow when embryos are immobilized, in either control or treated embryos. Since we have to immobilize the embryos to record live time-lapses of whole embryos, we cannot determine the role of ROCK, or of any other perturbation, in vesicle trafficking and exocytosis from experiments conducted in immobilized whole embryos, where skeletogenesis is arrested. We believe that we can do this in skeletogenic cell cultures, and we are currently developing this assay for vesicle tracking, but this is beyond the scope of the current work.

      Is there a colocalization of ROCK and f-actin in the tips of the spicules? This would support the mechano-sensing-hypothesis by ROCK.

      Our studies show that F-actin is localized around the spicule cavity and in the cortex of the cells (Figs. 5 and 6), while ROCK is enriched in the skeletogenic cell bodies, with some localization near the skeletogenic cell membranes (Fig. 1). To address the reviewer’s question directly, we immunostained ROCK and F-actin in the same embryos and found that their subcellular localizations do not strongly overlap (Fig. S3 Q-T). However, ROCK does not bind F-actin directly: ROCK activates another kinase, LimK, which phosphorylates Cofilin, which in turn interacts with F-actin. Therefore, the lack of colocalization between ROCK and F-actin neither supports nor contradicts a possible role of ROCK in mechanosensing.

      L 283. "F-actin is enriched at the tips of the spicules independently of ROCK activity" The results of this paragraph clearly demonstrate that ROCK inhibition has no effect on the localization of f-actin at the tips of the growing spicules. In addition, the new cell culture experiments underline this observation. Still, the central question that remains is, what is the interaction between ROCK, f-actin, and the mineralization process, that leads to the observed deformations? What does the f-actin signal look like in a branched phenotype or in larvae that failed to develop a skeleton (inhibition from Y20)?

      As we report in Fig. 6, and now in new Fig. S3, under late ROCK inhibition or in ROCK morphants we still detect F-actin around the spicule and enriched at the tips. When ROCK is inhibited and the embryo fails to develop a skeleton, we observe F-actin accumulation in the skeletogenic cells, but the F-actin is not organized (Fig. 5). As the spicule is absent in this condition, it is hard to conclude whether the effect on F-actin organization is direct or due to the absence of the spicule. We state this explicitly in the current version in the results, lines 324-326, and in the discussion, lines 405-408.

      Immunohistochemical analyses on f-actin localization and abundance should be additionally performed with ROCK knock-down phenotypes to confirm the pharmacological inhibition.

      We did this in our new Figure S3 and showed that ROCK morphants have the same F-actin localization at the tips as control and ROCK-inhibited embryos.

      L 365 "...supporting its role in mineral deposition..." "...Overall, our studies indicate that ROCK activity....is essential for the formation of the spicule cavity......which could be essential for mineral deposition..." I think the authors need to do a better job of clearly separating the potential processes impacted by ROCK perturbation. Is it stabilization and mechano-sensing in the spicule tip, or the intracellular trafficking and deposition of the ACC? If the dataset does not allow a definite conclusion, I suggest clearly separating the different possibilities, combined with a thorough discussion of findings from other mineralizing systems where the interaction between ROCK and F-actin has been described.

      We thank the reviewer for this important comment. We believe that ROCK and the actomyosin network are involved both in mechanosensing of the rigid biomineral and in the transport and exocytosis of mineral-bearing vesicles. In the current version we explicitly explain these two hypotheses in the discussion section. The possible role in exocytosis, and the experiments required to assess it, are described in lines 427-439, and the possible mechanosensing role and its effect on gene expression are described in lines 440-453.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      L185 "These SR-µCT measurements show that the rate of mineral deposition is significantly reduced under ROCK inhibition." To correctly support this statement I would suggest calculating the real growth rates (µm³ time⁻¹). For example, an increase in volume from 6,850 µm³ at 48 hpf to 14,673 µm³ at 72 hpf would result in a growth rate of 7,823 µm³ per 24 h.

      We thank the reviewer for this suggestion. We calculated the rate of spicule growth as the reviewer suggested and we added this information in lines 218-221.
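      The growth-rate arithmetic suggested by the reviewer amounts to dividing the change in spicule volume by the elapsed time. A minimal sketch, assuming volumes in µm³ and times in hours post-fertilization (the function name is hypothetical, not from the manuscript):

```python
def growth_rate(v_start_um3, v_end_um3, t_start_hpf, t_end_hpf, per_hours=24.0):
    """Average volumetric growth rate in µm^3 per `per_hours` hours.

    Assumes a constant rate between the two time points, so it reports the
    mean rate over the interval rather than an instantaneous rate.
    """
    dt = t_end_hpf - t_start_hpf
    if dt <= 0:
        raise ValueError("end time must be after start time")
    return (v_end_um3 - v_start_um3) * per_hours / dt


# Reviewer's example: 6,850 µm^3 at 48 hpf -> 14,673 µm^3 at 72 hpf.
print(growth_rate(6850, 14673, 48, 72))               # per 24 h
print(growth_rate(6850, 14673, 48, 72, per_hours=1))  # per hour
```

      Expressing the rate per 24 h keeps it directly comparable with the 24 h sampling interval used in the example.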

      L343: "This implies that....within the skeletogenic lineage." This concluding sentence is very speculative and therefore misplaced in the results section.

      We moved this sentence from the results section to the discussion, lines 443-445.

      L382: "The participation of F-actin and ROCK in polarized tip-growth and vesicle exocytosis has been observed in both, animals and plants." L407-409: "...F-actin could be regulating the localized exocytosis of mineral-bearing vesicles...." I think this is exactly the core question that remains unresolved in this study. To reduce speculations I strongly recommend addressing the effect of ROCK inhibition on vesicle trafficking and exocytosis (Monitoring of calcein-positive Vesicles in PMCs).

      We agree with the reviewer that this is a critical question that we would like to address, but, as we explained above, it is beyond the scope of this study.

      Figure 5: The values below the scale bars in the newly added figures U+V are extremely small. Also, the legend for this figure sounds incorrect. It should read: "...and skeletogenic cell cultures that were treated with 30µM ROCK inhibitor that was added at 48hpf and recorded at 72hpf."

      We increased the font near the scale bars and corrected the figure caption. Thanks for this and your other helpful comments!

      Reviewer #2 (Public Review):

      We thank the reviewer for raising the important issue of inhibitor concentration, which led us to perform additional experiments at a lower concentration that were valuable and strengthened the manuscript. We also thank the reviewer for asking us to be clearer in our interpretation of the results. Below are our responses to the specific comments:

      My concerns are the interpretation of the experiments. The main overriding concern is a possible over-interpretation of the role of ROCK. In the literature, ROCK participates in many biological processes, with a major contribution to the actin cytoskeleton. And when a function is attributed to ROCK, it is usually based on the determination of a protein that is phosphorylated by this kinase. Here that is not the case. The observation here is in most cases stunted growth of the spicule skeleton and some mis-patterning, or an absence of skeleton if the inhibitor is added prior to initiation of skeletal growth. They state in the abstract that ROCK inhibition impairs the organization of F-actin around the spicules. The evidence for that as a direct role is absent.

      We agree with the reviewer that since the spicule doesn’t form under ROCK continuous inhibition, it is unclear if the absence of F-actin around the spicule in this condition is a direct outcome of the lack of ROCK activation of F-actin polymerization, or an indirect outcome due to the lack of spicule to coat. We therefore deleted this line in the abstract and explicitly stated that we cannot conclude whether the impaired F-actin organization is directly due to ROCK effect on actin polymerization in the results, lines 324-326 and in the discussion, lines 405-408.

      They use morpholino data and ROCK inhibitor data to draw their conclusion. My main concern is the concentration of the inhibitor used since at the high concentrations used, the inhibitor chosen is known to inhibit other kinases as well as ROCK (PKA and PKC). They indicate that this inhibition is specifically in the skeletogenic cells based on the isolation of skeletogenic cells in culture and spicule production either under control or ROCK inhibition and they observe the same - stunting and branching or absence of skeletons if treated before skeletogenesis commences. Again, however, the high concentrations are known to inhibit the other kinases.

      In the previous version of the paper we used 30-80µM Y-27632 to block ROCK activity. These concentrations are commonly used in mammalian systems and in Drosophila to block ROCK activity (3-8). The reviewer is correct in stating that at high concentrations this inhibitor can block PKA and PKC. However, the affinity of the inhibitor for these kinases is roughly 100 times lower than its affinity for ROCK, as indicated by the biochemical Ki values reported in the manufacturer’s datasheet: 0.14-0.22 μM for ROCK1, 0.3 μM for ROCK2, 25 μM for PKA and 26 μM for PKC.

      Importantly, these Ki values are based on biochemical assays in which the activity of the inhibitor is tested in vitro with the purified protein. Therefore, these concentrations are not directly relevant to cell or embryo cultures, where the inhibitor has to penetrate the cells and affect ROCK activity in vivo. Y-27632 activity was studied both in vitro and in vivo by Narumiya, Ishizaki and Uehata, Methods in Enzymology 2000 (9). This paper reports concentrations similar to those in the manufacturer’s datasheet for the in vitro experiments, but shows that concentrations of 10µM or higher are effective in cell cultures. We therefore tested the effect of 10µM Y-27632 added at 0hpf (continuous inhibition) and at 25hpf (late inhibition) and added this information to Figs. 2 and S3. Continuous inhibition at this concentration resulted in three major phenotypes: skeletal loss, spicule initiations, and small spicules with ectopic branching. This result supports our conclusion that ROCK activity is necessary for spicule formation, elongation and the prevention of branching. Late inhibition at this concentration resulted in the majority of embryos developing branched spicules, which is very similar to the effect of MyoII inhibition with Blebbistatin. This result again supports the inference that ROCK activity is required for normal skeletal growth and the prevention of ectopic branching. Importantly, two papers directly inhibited PKA and PKC in whole sea urchin embryos (1) and in skeletogenic cell cultures (2). In both assays, PKC inhibition resulted in a mild reduction of spicule length, while PKA inhibition did not affect skeletal formation. Neither skeletal loss nor ectopic branching was ever observed under PKC or PKA inhibition, supporting the specific inhibition of ROCK by Y-27632.
Furthermore, both genetic and pharmacological perturbations of ROCK resulted in a significant reduction of skeletal growth and in enhanced ectopic branching. Therefore, we believe we provide convincing evidence for the role of ROCK in spicule formation, growth and the prevention of branching. We revised Figs. 2 and S3 to include the 10µM Y-27632 data, and revised the text describing the inhibition to include the explanations and references we provided here.
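      The fold differences implied by the datasheet Ki values quoted in this response can be checked directly. A minimal sketch, using only the numbers stated above (ROCK1 given as a 0.14-0.22 μM range); the dictionary and function names are hypothetical. As the authors note, these are in vitro ratios and do not translate directly into effective concentrations in embryos.

```python
# Ki values in µM as quoted from the datasheet: (low, high) bounds per kinase.
KI_UM = {
    "ROCK1": (0.14, 0.22),
    "ROCK2": (0.3, 0.3),
    "PKA": (25.0, 25.0),
    "PKC": (26.0, 26.0),
}


def selectivity_range(ki, targets=("ROCK1", "ROCK2"), off_targets=("PKA", "PKC")):
    """Return (min, max) fold selectivity, Ki(off-target) / Ki(target).

    A larger ratio means the inhibitor binds the target more tightly
    than the off-target kinase (lower Ki = higher affinity).
    """
    ratios = []
    for t in targets:
        for o in off_targets:
            ratios.append(ki[o][0] / ki[t][1])  # worst case: weakest target binding
            ratios.append(ki[o][1] / ki[t][0])  # best case: tightest target binding
    return min(ratios), max(ratios)


low, high = selectivity_range(KI_UM)
print(f"{low:.0f}- to {high:.0f}-fold")
```

      With these inputs the ratios span roughly 83-fold (PKA vs ROCK2) to 186-fold (PKC vs the low ROCK1 bound), i.e. about two orders of magnitude.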

      They use blebbistatin and latrunculin and show that these known inhibitors of the actin cytoskeleton lead to abnormal spiculogenesis. This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton, given the specificity concerns.

      As stated above, we believe that in the current version we have overcome the specificity concerns and provided solid evidence that ROCK activity is necessary for spicule formation, growth and the prevention of branching. Furthermore, the skeletogenic phenotypes of late 10µM Y-27632 treatment are highly similar to those of MyoII inhibition (Blebbistatin), while the phenotypes at higher concentrations resemble the inhibition of actin polymerization by Latrunculin. We agree with the reviewer that “This coincidence is suggestive but is not proof that ROCK acts on the actomyosin cytoskeleton”, and we revised the discussion paragraph to distinguish our solid findings from our speculations (lines 421-426): “These correlative similarities between ROCK and the actomyosin perturbations lead us to the following speculations: the low dosage of late ROCK inhibition is perturbing mostly ROCK activation of MyoII contractility while the higher dosage affects factors that control actin polymerization (Fig. 8F). Further studies in higher temporal and spatial resolution of MyoIIP activity and F-actin structures in control and under ROCK inhibition will enable us to test this.”

      Reviewer #2 (Recommendations For The Authors):

      The following areas require attention:

      (1) You begin and end the abstract with statements on evolution in which the actomyosin cytoskeleton is associated with skeletogenesis despite different GRNs, different contributing proteins, etc. You then move to ROCK and claim to reveal that ROCK is a central player in the process. As above, in the judgement of this reviewer, you fail to establish a direct role of ROCK to the actomyosin role in skeletogenesis. Sure, the ROCK inhibitors suggest that ROCK plays some kind of role in the process but you also indicate that ROCK could act on many processes, none of which you directly associate with the necessary activity of ROCK.

      We agree that our paper shows correlative similarities between the phenotypes of ROCK perturbations and those of direct perturbations of the actomyosin network, but does not establish a causal relationship. We made this point clear throughout the current version of the manuscript.

      (2) In the abstract you report that ROCK inhibition impairs the actin cytoskeleton around the skeleton. In examining your images in Fig. 5 that is not the case. Based on Phalloidin staining, actin surrounds both the control and the ROCK-inhibited skeleton. The distribution of actin is the same in both cases. Myosin is also stained in this figure and it too shows similar staining both in experimental and control. So, to this reviewer, there is insufficient evidence to suggest that the actin cytoskeleton is impaired, and there is no evidence directly relating ROCK with that cytoskeleton. I'm not questioning the observation that inhibition of ROCK causes stunting and mispatterning of the skeleton. That you show and quantify well. The issue is the precise target of ROCK. Your data does not establish the specific cause. It could be the actin cytoskeleton but your experiments do not directly address that.

      Fig. 5 shows a clear difference in F-actin between control embryos and ROCK-inhibited embryos. In controls, F-actin is enriched around the spicule, while under ROCK inhibition the spicule does not form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (3) In parts of the manuscript you use the term filopodia and in other parts I think you use pseudopodia to refer to the same structure. Since Ettensohn has provided the most evidence on the organization of the skeletogenic syncytia, I suggest you use the same term he used for those cellular extensions.

      Filopodia and pseudopodia are two distinct structures generated by the skeletogenic cells. Filopodia are the common cellular extensions described in many cell types, while the term “pseudopodia cable” describes the specific structure that forms between the skeletogenic cells and in which the spicule cavity forms, in agreement with Prof. Ettensohn’s terminology.

      (4) In trying to find relationships you cite a number of previous papers at the end of the introduction. I went back to those papers and they describe (from your work) calcium exocytosis, plus filopodia formation, plus planar cell polarity, plus CDC42, any one of which could involve an actin cytoskeleton. You even cite a paper saying that perturbations of ROCK prevent spicule formation. I went back to that paper and that isn't the case. You then summarize the Introduction by relating ROCK and the actin cytoskeleton, thereby raising reader expectation that the two will be connected. As above, in reality, your evidence here does not connect the two.

      We thank the reviewer for giving us credit for all these works, but only the paper on vesicle kinetics is from our lab (Winter et al., 2021). As for Croce et al., 2006, which the reviewer refers to: in Fig. 9A, 75µM Y-27632 is used to inhibit ROCK in the same sea urchin species that we use, and the phenotype is identical to what we observe: the skeletogenic cells are present, but the spicule does not form. As mentioned above, in the current version we clearly distinguish between our solid findings and our interpretations.

      (5) You emphasize in Fig. 1 the inhibition of ROCK in the presence of VEGFR inhibition. However, at no place in the manuscript do you say anything about how VEGFR is inhibited, when it is inhibited, or how you know it is inhibited. That oversight must be corrected. You mention axitinib but don't say anything about what it does. Some readers may know its activity but many will not.

      We now indicate that we use Axitinib to block VEGFR in the results section (line 104) and in the methods section (lines 470-471).

      (6) Fig. 2. The use of Y27632 as a selective inhibitor of ROCK. According to data sheets from the manufacturer, at the levels used in your experiments, 120 µM, 80 µM and 30 µM, the inhibitor also inhibits the activity of PKA and PKC (both inhibited at around 25 µM). This is concerning because of the literature indicating that activation of the VEGFR operates through PKA. Inhibition of PKA, then, would inhibit the activity of VEGF signaling. Thus, the inhibitory effects of Y27632 may not actually be attributable specifically to ROCK. Furthermore, the heading of this section states that ROCK activity controls initiation, growth, and morphology of the spicule. Yet, even at high levels of inhibitor, spicule production is initiated. Yes, the growth and the morphology are compromised, but the initiation doesn't seem to be.

      The spicule fails to form under continuous ROCK inhibition at all concentrations (Fig. 2). Also, as we explained in detail above, these Ki values are based on biochemical experiments with purified proteins and are not directly relevant to in vivo use of the inhibitor. Still, these Ki values show that the affinity of the inhibitor for ROCK is roughly 100 times higher than its affinity for PKA and PKC. Regarding the reviewer’s specific suggestion here: direct inhibition of PKA has no skeletogenic phenotypes, neither in whole embryos (1) nor in skeletogenic cell cultures (2). Since we see the same skeletogenic phenotypes at low Y-27632 concentration, and the genetic and pharmacological perturbations of ROCK agree, we believe that these phenotypes can be attributed directly to ROCK.

      (7) The synchrotron study is very nice with two points that should be addressed. Again, a high concentration of Y27632 was used giving a caveat on ROCK specificity. And second, the blue and green calcein pulses are very nice but the recent paper by the Bradham group should be cited.

      We added a reference to Bradham recent paper on two calcein pulses (10).

      (8) Fig. 5 is where an attempt is made to associate ROCK inhibition to alterations in actomyosin. Again, a high concentration of the inhibitor is used casting doubt on whether it specifically inhibits ROCK. However, even if the inhibition is specific to ROCK the images do not provide convincing evidence that ROCK activity normally is directed toward actomyosin. This is crucial to the manuscript.

      As stated above, we addressed the specificity concerns in this version and modified the text to emphasize correlation rather than causation: Fig. 5 shows a clear difference in F-actin between control embryos and ROCK-inhibited embryos. In controls, F-actin is enriched around the spicule, while under ROCK inhibition the spicule does not form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (9) Again in Fig. 6 the inhibitor is used with the same concern about whether the effects noted are due to ROCK.

      Fig. 6 is now Fig. 7 (the effect of ROCK on gene expression), and, as explained above, we addressed the specificity concerns in this version.

      (10) Lines 350-358. This interpretation falls apart without showing that the inhibitor is specific for ROCK as indicated above. Also, Fig. 5 is unconvincing in showing a difference in actin or myosin distribution in control vs ROCK inhibited embryos. Yes, the spicules are stunted, but whether actin or myosin have anything to do with that as a result of lack of ROCK activity is not demonstrated.

      As stated above, we addressed the specificity concerns in the revised version and modified the text to emphasize correlation rather than causation: Fig. 5 shows a clear difference in F-actin between control embryos and ROCK-inhibited embryos. In controls, F-actin is enriched around the spicule, while under ROCK inhibition the spicule does not form and disorganized F-actin accumulates in the skeletogenic cells. Yet, as we stated above, this is not proof of a direct effect of ROCK on F-actin polymerization, and we explain this explicitly in the results, lines 324-326, and in the discussion, lines 405-408.

      (11) Throughout, the manuscript spelling, grammar, and sentence structure will require extensive editing. The mistakes are numerous.

      We did our best to correct the spelling and grammar. If we still missed some mistakes, we would be happy to further correct them.

      References

      (1) Mitsunaga K, Shinohara S, Yasumasu I. Probable Contribution of Protein Phosphorylation by Protein Kinase C to Spicule Formation in Sea Urchin Embryos: (sea urchin/protein kinase C/spicule formation/H-7/HA1004). Dev Growth Differ. 1990;32(3):335-42.

      (2) Mitsunaga K, Shinohara S, Yasumasu I. Does Protein Phosphorylation by Protein Kinase C Support Pseudopodial Cable Growth in Cultured Micromere-Derived Cells of the Sea Urchin, Hemicentrotus pulcherrimus?: (sea urchin/protein kinase C/spicule formation/phorbol ester/H-7). Dev Growth Differ. 1990;32(6):647-55.

      (3) Su Y, Huang H, Luo T, Zheng Y, Fan J, Ren H, et al. Cell-in-cell structure mediates in-cell killing suppressed by CD44. Cell Discov. 2022;8(1):35.

      (4) Kagawa H, Javali A, Khoei HH, Sommer TM, Sestini G, Novatchkova M, et al. Human blastoids model blastocyst development and implantation. Nature. 2022;601(7894):600-5.

      (5) Canellas-Socias A, Cortina C, Hernando-Momblona X, Palomo-Ponce S, Mulholland EJ, Turon G, et al. Metastatic recurrence in colorectal cancer arises from residual EMP1(+) cells. Nature. 2022;611(7936):603-13.

      (6) Becker KN, Pettee KM, Sugrue A, Reinard KA, Schroeder JL, Eisenmann KM. The Cytoskeleton Effectors Rho-Kinase (ROCK) and Mammalian Diaphanous-Related (mDia) Formin Have Dynamic Roles in Tumor Microtube Formation in Invasive Glioblastoma Cells. Cells. 2022;11(9).

      (7) Segal D, Zaritsky A, Schejter ED, Shilo BZ. Feedback inhibition of actin on Rho mediates content release from large secretory vesicles. J Cell Biol. 2018;217(5):1815-26.

      (8) Fischer RS, Gardel M, Ma X, Adelstein RS, Waterman CM. Local cortical tension by myosin II guides 3D endothelial cell branching. Curr Biol. 2009;19(3):260-5.

      (9) Narumiya S, Ishizaki T, Uehata M. Use and properties of ROCK-specific inhibitor Y-27632. Methods Enzymol. 2000;325:273-84.

      (10) Descoteaux AE, Zuch DT, Bradham CA. Polychrome labeling reveals skeletal triradiate and elongation dynamics and abnormalities in patterning cue-perturbed embryos. Dev Biol. 2023;498:1-13.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The OSCA/TMEM63 channels have recently been identified as mechanosensitive channels. In a previous study, the authors found that OSCA subtypes (1, 2, and 3) respond differently to stretch and poke stimuli. For example, OSCA1.2 is activated by both poke and stretch, while OSCA3.1 responds strongly to stretch but poorly to poke stimuli. In this study, the authors use cryo-EM, mutagenesis, and electrophysiology to dissect the mechanistic determinants that underlie the channels' ability to respond to poke and stretch stimuli.

      The starting hypothesis of the study is that the mechanical activation of OSCA channels relies on the interactions between the protein and the lipid bilayer and that the differential responses to poke and stretch might stem from variations in the lipid-interacting regions of OSCA proteins. The authors specifically identify the amphipathic helix (AH), the fenestration, and the Beam Like Domain (BLD) as elements that might play a role in mechanosensing.

      The strength of this paper lies in the technically sound data - the structural work and electrophysiology are both very well done. For example, the authors produce a high-resolution OSCA3.1 structure which will be a useful tool for many future studies. Also, the study identifies several interesting mutants that seemingly uncouple the OSCA1.2 poke and stretch responses. These might be valuable in future studies of OSCA mechanosensation.

      However, the experimental approach employed by the authors to dissect the molecular mechanisms of poke and stretch falls short of enabling meaningful mechanistic conclusions. For example, we are left with several unanswered questions surrounding the role of AH and the fenestration lipids in mechanosensation: Is the AH really important for the poke response if mutating residues conserved between OSCA1.2 and OSCA3.1 disrupts the OSCA1.2 ability to respond to poke but mutating the OSCA1.2 AH to resemble that of OSCA3.1 results in no change to its "pokability"? Similar questions arise in response to the study of the fenestration-lining residues.

      We thank the reviewer for their feedback. We believe that the different OSCA1.2 mutants on their own suggest an involvement of the AH and fenestration-lining residues in its mechanosensitive response. We attribute the inability to restore the poke response of OSCA3.1 with similar mutations to its inherent high threshold to this particular stimulus and perhaps other structural differences, or a combination of them, that we did not probe in this study. We agree more work is required in the field to address these remaining questions and further dissect the difference between poke and stretch responses.

      Reviewer #2 (Public Review):

      Summary:

      Jojoa-Cruz et al. determined a high-resolution cryo-EM structure of the Arabidopsis thaliana (At) OSCA3.1 channel. Based on a structural comparison between OSCA3.1 and OSCA1.2 and the difference between these two paralogs in their mechanosensitivity to poking and membrane stretch, the authors performed structure-guided mutagenesis and tested the roles of three structural domains, including an amphipathic helix, a beam-like domain, and a lipid fenestration site at the pore domain, for mechanosensation of OSCA channels.

      Strengths:

      The authors successfully determined a structure of the AtOSCA3.1 channel reconstituted in lipid nanodiscs by cryo-EM to a high resolution of 2.6 Å. The high-resolution EM map enabled the authors to observe putative lipid EM densities at various sites where lipid molecules are associated with the channel. Overall, the structural data provides the information for comparison with other OSCA paralogs.

      In addition, the authors identified OSCA1.2 mutants that exhibit differential responses to mechanical stimulation by poking and membrane stretch (i.e., impaired response to poke assay but intact response to membrane stretch). This interesting behavior will be useful for further study on differentiating the mechanisms of OSCA activation by distinct mechanical stimuli.

      Major weakness:

      The major weaknesses of this study are the mutagenesis design and the functional characterization of the three structural domains - an amphipathic helix (AH), a beam-like domain (BLD), and the fenestration site at the pore, in OSCA mechanosensation.

      (1) First of all, it is confusing to the reviewer whether the authors set out to test these structural domains as a direct sensor(s) of mechanical stimuli or as a coupling domain(s) for downstream channel opening and closing (gating). The data interpretations are vague in this regard as the authors tend to interpret the effects of mutations on the channel 'sensitivity' to different mechanical stimuli (poking or membrane stretch). The authors ought to dissect the molecular bases of sensing mechanical force and opening/closing (gating) the channel pore domain for the structural elements that they want to study.

      We agree with the reviewer that our data are unable to distinguish the transduction of a mechanical stimulus and channel gating. We set out to determine whether these features were involved in the mechanosensitive response. However, as the reviewer points out, evaluating whether they work as direct sensors or coupling domains would require a more involved experimental design that lies beyond the scope of this work. Thus, we do not claim in our study whether these features act as direct sensors of mechanosensitive stimuli or as coupling domains, only their involvement.

      Furthermore, the authors relied on the functional discrepancies between OSCA1.2 (sensitive to both membrane poking and stretch) and OSCA3.1 (little or weak sensitivity to poking but sensitive to membrane stretch). But the experimental data presented in the study do not clearly address the mechanisms of channel activation by poking vs. by stretch, or why the channels behave differently.

      We had hoped that when we switched regions of the OSCA1.2 and OSCA3.1 channels we would abolish poke-induced responses in OSCA1.2 and confer poke-induced sensitivity to OSCA3.1. We agree with the reviewer that we were not able to pinpoint the reason (or reasons, as it could be a compounded effect of several differences) for the higher threshold of OSCA3.1, and thus we could not confer an OSCA1.2-like phenotype on it. Yet, we shed some light on some of the structural differences that appear to contribute to OSCA3.1 behavior, as mutagenesis of OSCA1.2 to resemble this channel led to an OSCA3.1-like phenotype.

      (2) The reviewer questions if the "apparent threshold" of poke-induced membrane displacement and the threshold of membrane stretch are good measures of the change in the channel sensitivity to the different mechanical stimuli.

      The best way to determine an accurate measure of sensitivity to mechanical stimuli is stretch applied to a patch of membrane. There are more complicating factors that influence the determination of "apparent threshold" in the whole cell poking assay, including visualizing when the probe first hits the cell (very difficult to see). With that said, the stretch assay has its own issues such as the creep of the membrane into the pipette glass which we try to minimize with positive pressure between tests.

      (3) Overall, the mutagenesis design in the various structural domains lacks logical coherence and the interpretation of the functional data is not sufficient to support the authors' hypothesis. Essentially the authors mutated several residues on the hotspot domains, observed some effects on the channel response to poking and membrane stretch, then interpreted the mutated residues/regions are critical for OSCA mechanosensation. Examples are as follows.

      In the section "Mutation of key residues in the amphipathic helix", the authors mutated W75 and L80, which are located on the N- and C-terminal of the AH in OSCA1.2, and mutated Pro in the OSCA1.2 AH to Arg at the equivalent position in OSCA3.1 AH. W75 and L80 are conserved between OSCA 1.2 and OSCA3.1. Mutations of W75 and/or L80 impaired OSCA1.2 activation by poking, but not by membrane stretch. In comparison, the wildtype OSCA3.1 which contains W and L at the equivalent position of its AH exhibits little or weak response to poking. The loss of response to poking in the OSCA1.2 W/L mutants does not indicate their roles in poking-induced activation.

      Besides, the P2R mutation on OSCA1.2 AH showed no effect on the channel activation by poking, suggesting Arg in OSCA3.1 AH is not responsible for its weak response to poking. Together the mutagenesis of W75, L80, and P2R on OSCA1.2 AH does not support the hypothesis of the role of AH involved in OSCA mechanosensation.

      Mutagenesis of residues W75 and L80 in the amphipathic helix of OSCA1.2 suggests a role of the helix in the poke response in OSCA1.2, regardless of OSCA3.1 having the same residues. Furthermore, the lack of alteration in the response for mutant P77R suggests that specific residues of the helix are involved in this response, and that this is not a case where any mutation in the helix will lead to a loss of function.

      OSCA3.1 WT exhibits a high-threshold response (near membrane rupture) in the poke assay without any mutations, and this could be due to other features, for example, the residues lining the membrane fenestration, as well as features not identified/probed in this study. We agree with the reviewer that the differences in the AH do not explain the different response to poke in OSCA1.2 and OSCA3.1, and we have added this statement explicitly in the discussion for clarification (line #251-252).

      In the section "Replacing the OSCA3.1 BLD in OSCA1.2", the authors replaced the BLD in OSCA 1.2 with that from OSCA3.1, and only observed slightly stronger displacement by poking stimuli. The authors still suggest that BLD "appears to play a role" in the channel sensitivity to poke despite the evidence not being strong.

      We agree with the reviewer that the experiments carried out show little difference between the response of OSCA1.2 WT and OSCA1.2 with OSCA3.1 BLD, and we have stated so (line #259: “Substituting the BLD of OSCA1.2 for that of OSCA3.1 had little effect on poke- or stretch-activated responses. Although these results suggest that the BLD may not be involved in modulating the MA response of OSCA1.2…”). However, the section of the discussion that the reviewer points out also considers evidence provided by recent reports from Zheng, et al. (Neuron, 2023) and Jojoa-Cruz, et al. (Structure, 2024), and we suggest a hypothesis to reconcile our findings with this new evidence.

      OSCA1.2 has four Lys residues in TM4 and TM6b at the pore fenestration site, which were shown to interact with the lipid phosphate head group, whereas two of the equivalent residues in OSCA3.1 are Ile. In the section "Substitution of potential lipid-interacting lysine residues", the authors made K435I/K536I double mutant for OSCA1.2 to mimic OSCA3.1 and observed poor response to poking but an intact response to stretch. Did the authors mutate the Ile residues in OSCA3.1 to Lys, and did the mutation confer channel sensitivity to poking stimuli resembling OSCA1.2? The reviewer thinks it is necessary to perform such an experiment, to thoroughly suggest the importance of the four Lys residues in lipid interaction for channel mechanoactivation.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are no longer able to perform such experiments.

      Reviewer #3 (Public Review):

      Summary:

      Jojoa-Cruz et al. provide a new structure of At-OSCA3.1. The structure of OSCA3.1 is similar to previous OSCA cryo-EM structures of both OSCA3.1 and other homologues, validating the new structure. Using the novel structure of OSCA3.1 as a guide, they created several point mutations to investigate two different mechanosensitive modalities: poking and stretching. To investigate the ability of OSCA channels to gate in response to poking, they created point mutations in OSCA1.2 to reduce sensitivity to poking based on the differences between the OSCA1.2 and 3.1 structures. Their results suggest that two separate regions are responsible for gating in response to poking and stretching.

      Strengths:

      Through a detailed structure-based analysis, the authors identified structural differences between OSCA3.1 and OSCA1.2. These subtle structural changes identify regions in the amphipathic helix and near the pore that are essential for the gating of OSCA1.2 in response to poking and stretching. The use of point mutations to understand how these regions are involved in mechanosensation clearly shows the role of these residues in mechanosensation.

      Weaknesses:

      In general, the point mutations selected all show significant alterations to the inherent mechanosensitive regions. This raises the possibility that any mutation would disrupt the function of the region; additional mutations that are similar in function to the WT channel would support the claims in the manuscript. Mutations in the amphipathic helix at W75 and L80 show reduced gating in response to poking stimuli. The gating observed occurs at poking depths similar to those causing cellular rupture, and this similarity suggests that these mutations could be a complete loss of function. For example, a mutation to L80I or L80Q would show whether the addition of the negative charge is responsible for this disruption, rather than just a change in the steric space of the residue in an essential region.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have several questions regarding some of the aspects of your study:

      Mutation of the hydrophobic W75 and L80 in OSCA1.2 to charged residues significantly decreases the poke response in OSCA1.2 without affecting the stretch response. However, W75 and L80 are also present in OSCA3.1, which does not respond efficiently to poke. You conclude that these two residues are important for the poke response, but do not delve into why, if these residues are important, OSCA3.1 is not poke-sensitive.

      In addition, mutation of the OSCA1.2 AH to resemble that of OSCA3.1 does not produce channels that are less poke-sensitive. Given the data presented, if AH were a universal "poke sensor", one could also expect WT OSCA3.1 to exhibit a robust poke response, like OSCA1.2. Here I think it would be important to explain in more detail how this data might fit together.

      We thank the reviewer for bringing up this issue. We decided to test the importance of the AH due to the presence of similar structures in other mechanosensitive channels. Our data showed that single and double mutants of the AH of OSCA1.2 affected its poke response but not stretch. This supports the idea of the AH involvement in the poke response. Yet, we agree that the differences in the AH between OSCA1.2 and OSCA3.1 (P77R mutation) do not explain the higher threshold of OSCA3.1; we have explicitly added this in line #255. The particular OSCA3.1 phenotype may be due to other differences in the structure, for example, differences in the membrane fenestration area, or a combined effect of several differences, which we believe is more likely.

      I also have some questions about the protein-lipid interactions in the fenestration. A lipid has been observed in this location in both OSCA1.2 and OSCA3.1 structures. Mutation of the two OSCA1.2 lysines to isoleucines results in channels that are resistant to poke which leads to the conclusion that the interactions between the fenestration lysines and lipids are important for the poke response.

      Here, there are several questions that arise but are not answered:

      It is not shown what happens when OSCA3.1 isoleucines are mutated to lysines - do these mutants result in poke-able channels? Is the OSCA3.1 mechanosensing altered?

      We performed a preliminary test on OSCA3.1 I423K/I525K double mutant (n = 3). However, we did not see an increase in poke sensitivity. We attributed this to other unexplored differences in OSCA3.1 having an effect in channel mechanosensitivity.

      It is implied that the poke response is predicated on the lysine-lipid interaction. However, lipid densities are present in both OSCA1.2 and OSCA3.1 structures, indicating that both fenestrations interact with lipids. How can we be certain that the mutation of lysine to isoleucine does not disrupt an inter-protein interaction rather than a protein-lipid one? For example, the K435I mutation might disrupt interactions with D523 or the backbone of G527?

      The reviewer brings up a good point. We believe the phenotype seen is due to a different strength in the interaction between lipids and proteins, however, disrupted interaction with other residues is a valid alternative explanation. We agree that the suggested experiments will further clarify the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      Similarly, the effects of single lysine-to-isoleucine (K435I or K536I) mutations are not explored. The observed effect might be caused by only one of these substitutions.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      I also wanted to take this opportunity to ask a couple of philosophical (?) questions about using a mammalian system to study ion channels that have evolved to function in plants. Your study highlights the intimate relationship between the lipid bilayer and protein function/mechanosensitivity. Plant cells contain high levels of sterols and cerebrosides that would significantly affect both cell stiffness and the specific interactions that can be formed between the protein and the lipid bilayer. I wonder if the properties of the lipid bilayer might shift the thresholds for poke and/or stretch stimuli and if structural elements that do not appear to have a major role in mechanosensation in a mammalian cell (e.g., BLD) might be very influential in a lipid environment that more closely resembles that of a plant?

      Conversely, is it possible that OSCA channels are not poke-sensitive in plant cells? These questions are beyond the scope of your study, but they might be a nice addition to your discussion.

      The reviewer poses a great question. Electrophysiological approaches for studying plant mechanosensitive channels suffer the limitation of not being able to fully reconstitute the environment of a plant cell. To be able to patch the cell, the cell wall needs to be disposed of, which eliminates the tension generated from this structure onto the membrane. In that sense, performing these assays in plant cells or another system would not give us a fully accurate picture of the physiological thresholds of these channels. Given this limitation, we performed our study with mammalian cells given our expertise with them. Like the reviewer, we are also intrigued by the effect of different membrane compositions on the behavior of OSCA channels and how these channels will behave under physiological conditions, but we agree with the reviewer that these questions are out of the scope of our work. To address this point, in line #294 we have added: “It is also important to note that the membrane of a plant cell contains a different lipid composition than that of HEK293 cells used in our assays, and thus these lipids, or the plant cell wall, may alter how these channels respond to physiological stimuli.”

      Line 313: "For structural studies, human codon-optimized OSCA3.1." Could you please clarify what this means?

      We have changed the phrase to “For structural studies, the OSCA3.1 (UniProt ID: Q9C8G5) coding sequence was synthesized using optimized codons for expression in human cells and subsequently cloned into the pcDNA3.1 vector” in line #327 to clarify this sentence.

      As a final comment, in the methods you use references to previously published work. I would strongly encourage you to replace these with experimental details.

      We understand the reviewer’s argument. However, this article falls under eLife’s Research Advances and will be linked to the original published work to which we reference the method. As suggested in the guidelines for this type of article, we only described the methods that were different from the original paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) In line 85, provide C-alpha r.m.s.d. values for the structural alignment among OSCA3.1, OSCA1.1, and OSCA1.2 protomers.

      As requested, we have added the C-alpha RMSD in line #86.

      (2) In line 90, should the figure reference to Fig. 1d be Fig. 1e?

      We thank the reviewer for catching this error. We have corrected it in the manuscript.

      (3) In lines 89-94, what putative lipid is it resolved in the OSCA3.1 pore? Can the authors assign the lipid identity? Is this the same or different from the lipids resolved in OSCA1.2, OSCA1.1, and TMEM63?

      In the model, we have built the lipid as palmitic acid to represent a lipid tail, but the resolution in this area makes it difficult to ascertain the identity of said lipid, hence we cannot compare to lipids in other orthologs.

      (4) In lines 115-121, the authors describe the presence of AHs and their functional roles in MscL and TMEM16. It will be more informative if the authors can add figures to show the structure of MscL and highlight the analogous AH. In addition, the current Supplementary Fig. 6 is not informative so it should be improved. It is not clear to the reviewer why that stretch of helix in TMEM16 is equivalent or analogous to the AH in OSCAs, either sequence alignment or a detailed structural alignment is helpful to address this point. Also, in lines 120-121, it says this helix in TMEM16 "does not present amphipathic properties", please show the sequence or amphipathicity of the helix.

      We thank the reviewer for the feedback on this figure. Supplementary Fig. 6 has been thoroughly modified to address the reviewer’s concerns. We now include a panel showing the structure of MscL and its amphipathic helix. We have modified the alignment of OSCA3.1 to a TMEM16 homolog to make clearer the homologous positioning of the helices in question and zoom in to show their sequences.

      (5) In discussion, lines 249-257, the authors referred to a recent study that suggested three evolutionarily coupled residue pairs located on BLD and TM6b. The authors speculate that the reason they did not observe a significant effect of channel response to poke/stretch stimuli in the BLD swapping between OSCA1.2 and 3.1 is due to the 2 of 3 salt bridges remaining for the residue pairs. To test the importance of these residue pairs and their coupling for channel gating, instead of swapping the entire BLD, can the authors systematically mutate the residue pairs, disrupt the salt-bridge interactions, and analyze the effect on channel response to mechanical force?

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (6) The reviewer suggests the authors tone down the elaboration of polymodal activation of OSCA by membrane poking and stretch.

      We believe the idea of polymodal activation is sufficiently toned down, as we only postulate it as a possibility and then give an alternative explanation based on methodological limitations: “Nonetheless, the discrepancy could be due to inherent methodological differences between these two assays, as whole-cell recordings during poking involve channels in inaccessible membranes (at the cell-substrate interface) and channel interactions with extracellular and intracellular components, while the stretch assay is limited to recording channels inside the patch.”

      (7) In lines 81-83, the authors described the BLD as showing increased flexibility, and the EM map at this region is less well resolved for registry assignment. In the method for cryo-EM image processing and Supplementary Fig. 1, the authors only carried out 3D refinement and classification at the full channel level. Have the authors attempted to do focus refinement or classification at the BLD domain in order to improve the local resolution or to sort out conformational heterogeneity? The reviewer suggests doing so because the BLD domain is a hot spot that the authors have proposed to play an important role in OSCA mechanosensation. Conformational changes identified in this region might provide insights into its role in the channel function.

      We thank the reviewer for this suggestion. We have performed focused classification on the BLD with and without surrounding regions and, in our hands, it did not improve the resolution or provide further insights.

      Reviewer #3 (Recommendations For The Authors):

      Here are a few specific minor corrections that should be addressed:

      (1) In lines 117-135, in the discussion of Figure 2, the data shows an apparent increase in the poking threshold to gate W75K/L80E. The substantial increase in the depth required to gate the channel suggests that these channels are less sensitive to poking. Would it be possible to compare the depth at which these two patches show activity and the depth at which the other 22 cells ruptured? Line 161 mentions that the rupture threshold of HEK cells is close to the gating of OSCA3.1 at 13.8 µm.

      The distance just before the cell ruptured in the 22 cells with no response was 12.5 ± 2.5 µm. The distance at which the cells ruptured was 0.5 µm more (13 ± 2.5 µm, n = 22). We have added this last value in line #137.

      (2) Would it be possible in Figures 2 panels b and c, 3, and figure 4 to label the WT as WT OSCA1.2?

      We thank the reviewer for pointing this out. We agree this modification will improve the clarity of the figures and have changed the figures to follow the reviewer’s suggestion.

      (3) Can you provide a western blot of the mutations described in Figure 2? This would provide insight into the amount of protein at the cell surface and available to respond to poking, the stretch data shows that these channels are in the membrane but does not show if they are in the membrane in similar quantities.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

      (4) The functional differences between the two channels are projected to be tied to several distinct point mutations, however, the data could be strengthened by additional point mutations at all sites to show that the phenotypes are due to the mutations specifically not just any mutation in the region.

      We thank the reviewer for this suggestion. We agree that the suggested experiments will further improve the quality of the results, but we are unable to perform such experiments due to the authors having moved on from the respective labs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, we discovered several erroneous duplicate values in our source data sets from figures S1, 2, 4, and 8, due to mistakes from MATLAB analysis. We have re-analyzed the data and corrected these errors; since limited values in each data set changed, the results were unaffected. The changes are reflected in updated figures and source data.

      Overall, the reviewers gave a positive assessment of our work, but had reservations about:

      (1) Specifics of the iGluSnFR data and analysis

      (2) Overstatement/oversimplification of the importance of syt7 and Doc2

      (3) The strength and interpretation of the EM data

      (4) The relevance and parametrization of the modeling data

      (1) We have clarified aspects of the iGluSnFR data and analysis in the point-by-point response, as well as in the manuscript.

      (2) We have toned down our statements about the role of syt7 and Doc2 throughout, and emphasized that the DKO data are conclusive and reveal that there must be additional Ca2+ sensors for AR. We have also added to the discussion, noting syt3 as a strong candidate to perform a function analogous to syt7 (to regulate docking), along with another protein (or proteins) performing a role similar to Doc2 (directly in fusion) that has not been identified as a candidate in the field yet.

      (3) We feel the EM data are consistent with the model as much as they could be, and while a sequence of events can only be inferred from time-resolved EM, we believe our work falls in the scope of reasonable interpretation. However, upon reexamining the terminology of ‘feeding’ and related discussion, we realized this could be misleading, so these sections have been revised.

      (4) We have improved the description and interpretation of the model in the manuscript and provide a detailed rationale of our approach in the point-by-point response.

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) It is surprising the optical GluSnFR approach reports so much asynchronous release in control hippocampal neurons after single stimuli (36% of release). This seems much higher than what is observed at most synapses, where asynchronous release is usually less than 5% of the initial response to the first evoked stimuli. Any thoughts on why the GluSnFR approach reports such a high level of asynchronous release? Could the optical approach be slower in activation kinetics in some cases, which artificially elevates the asynchronous aspect of fusion? This seems to be the case, given electrophysiology recordings in Figure 3 show the asynchronous release component as ~10% in controls at the 1st stimuli (panel C).

      The reported proportion of asynchronous release from cultured hippocampal neurons varies, contingent upon a range of factors (calcium concentration, how asynchronous release is quantified, etc). However, we would argue that there is considerable evidence for a higher percentage of asynchronous release (more than the <5% indicated by the referee) at synapses in the hippocampus. In our previous work on Doc2 using electrophysiology in cultured hippocampal neurons (Yao et al., 2011, Cell), it was noted that there is an approximate 25% incidence of asynchronous release after a single action potential. Furthermore, Hagler and Goda also reported a 26% ratio of asynchronous neurotransmitter release, also from cultured hippocampal neurons (Hagler and Goda, 2001, J Neurophysiol.).

      We also point out that another study using iGluSnFR to measure synchronous/asynchronous release ratios, with more sophisticated stimulation, imaging, and analysis procedures than ours, found an average ratio of synchronous to asynchronous release that is in line with our values, with considerable variability among individual boutons (Mendonça et al., 2022; 25% asynchronous release after a single action potential). We feel that iGluSnFR is actually the superior approach (barring specialized e-phys preparations that can measure quantal events at individual small synapses; please see Miki et al., 2018), as it directly measures the timing of individual release events at individual boutons. By comparison, in most electrophysiology experiments there is a large peak of synchronous release from many synapses. iGluSnFR also bypasses postsynaptic considerations such as receptor kinetics and desensitization, or asynchronous release being poorly aligned to AMPA receptors, per a recent study of ours (Li et al., 2021), and a study showing 25% of asynchronous release occurs outside the active zone (Malagon et al., 2023). All these factors could obscure asynchronous release or otherwise make it difficult to measure by electrophysiology. To our knowledge, the approach in Miki et al., 2018 best bypasses these limitations, though the data in that study are from exceptionally fast and synchronous cerebellar synapses, and so cannot be directly compared to our findings. Thus, it is possible that iGluSnFR can report more asynchronous release than electrophysiological recordings, but this may actually reflect real biology.

      This being said, after considering the reviewer’s points we realized that our analysis method likely underestimates the total amount of synchronous release when using the high-affinity sensor (Figure 1). We quantify release by ‘events’ (that is, peaks), which does not take into account multiquantal peaks resulting from near-simultaneous multivesicular release. We have previously determined by quantal analysis that most synchronous peaks after a single action potential are multiquantal, whereas asynchronous peaks include some multiquantal events but these are in the minority (Vevea et al., 2021; Mendonça et al., 2022). So, in our data sets, synchronous release is underestimated more than asynchronous release. Thus, 37% asynchronous release is probably an overestimate, which explains the 12-percentage-point difference from Mendonça et al., 2022, who used sophisticated quantal analysis (though that study was also performed at room temperature, which could also contribute to the difference). We have now pointed this out in the text:

      “This ratio of synchronous to asynchronous release is likely an underestimate, since our analysis only counts the number of peaks (‘events’) and does not take into account multiquantal peaks resulting from near-simultaneous multivesicular release. We have previously determined by quantal analysis that most synchronous peaks are multiquantal after a single action potential, while for AR there are still multiquantal events but they are in the minority (Vevea et al., 2021). So, in our measurements, the total amount of synchronous release is underestimated; sophisticated quantal analysis using the A184V iGluSnFR recently found the percentage of total release that is AR to be ~25%, with otherwise similar results to ours (Mendonça et al., 2022). Nonetheless, this approach faithfully distinguishes synchronous from asynchronous release…”
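      The direction of this counting bias can be illustrated with a short sketch. The peak counts and mean quanta per peak below are hypothetical values chosen only to show how counting peaks rather than quanta shifts the apparent AR fraction; they are not measurements from this study or from Mendonça et al., 2022.

```python
def ar_fraction(n_sync_peaks, n_async_peaks, quanta_per_sync_peak=1.0, quanta_per_async_peak=1.0):
    """Fraction of total release that is asynchronous (AR).

    With the default of one quantum per peak this reduces to counting peaks
    ('events'); passing the mean number of quanta per peak gives a quantal estimate.
    """
    sync_release = n_sync_peaks * quanta_per_sync_peak
    async_release = n_async_peaks * quanta_per_async_peak
    return async_release / (sync_release + async_release)


# Hypothetical counts and quantal contents, chosen only to show the direction of the bias.
n_sync, n_async = 63, 37
event_based = ar_fraction(n_sync, n_async)
quantal = ar_fraction(n_sync, n_async, quanta_per_sync_peak=1.8, quanta_per_async_peak=1.1)

print(f"AR fraction, counting peaks:  {event_based:.2f}")
print(f"AR fraction, counting quanta: {quantal:.2f}")
```

      Because synchronous peaks are assumed to carry more quanta on average, weighting by quantal content lowers the apparent AR fraction; how large the shift is depends entirely on the assumed quanta per peak.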

      However, while this method underestimates total synchronous release, it does not misclassify synchronous events as asynchronous because of kinetics. Even the slower iGluSnFR variant does not have a rise time that would misrepresent a synchronous event as asynchronous (Marvin et al., 2018). Mendonça et al. (2022) note that averaged iGluSnFR traces for the A184V variant are biphasic, with the transition from fast to slow component occurring around 10 ms. These authors also determined that the temporal resolution of glutamate imaging is actually limited by the frame rate, not the biosensor, and based on simulations found that detection time was biased in their data to be about 1 ms earlier than the actual timing of release events.

      The reviewer’s final point about Figure 3 is a misunderstanding, as these are data from iGluSnFR, not electrophysiology. The asynchronous proportion in these experiments is ~10% because, as noted in the manuscript, we used a faster, lower-affinity variant of iGluSnFR in train stimulation experiments (Figure 2). In contrast to the high-affinity sensor, as explained above, this variant would be expected in our analysis to underestimate the amount of asynchronous release, because it fails to detect many uniquantal release events (presumably those further from the focal plane, with too little fluorescence to reach our detection threshold). This is evidenced by the much lower apparent mini rate measured with this sensor compared to higher-affinity variants. Since synchronous peaks are mostly multiquantal after a single action potential, while asynchronous peaks are mostly uniquantal, missing a fraction of quanta mostly makes synchronous peaks smaller; these smaller peaks are still counted as single events in our analysis, whereas many asynchronous peaks are missed entirely. We have added clarification in the text to avoid confusion on this point:

      “This sensor underestimates the fraction of AR (~10% of total release for a single action potential) as compared to the A184V variant used above that overestimates the fraction of AR (~35% of total release for a single action potential). This is because it is less sensitive and misses many uniquantal events; as discussed above, our analysis quantifies release by number of peaks, and most synchronous peaks are multiquantal after a single action potential, while most AR peaks are uniquantal (Vevea et al., 2021). Still, the S72A variant reported the same phenotypes as the A184V variant after the first action potential (Fig. 3B, C).”
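      The mechanism described above can be sketched with a deliberately simple detection model. It assumes, hypothetically, that a peak is detected if at least one of its quanta would independently have crossed the detection threshold with probability p_quantum, and that every synchronous peak contains 3 quanta while every asynchronous peak is uniquantal. None of these numbers come from the data; the sketch only shows why a less sensitive sensor lowers the apparent AR fraction when release is counted by peaks.

```python
def detection_probability(quanta_in_peak, p_quantum):
    """Probability that a peak is detected, under the simplifying (hypothetical)
    assumption that it is seen if at least one of its quanta, independently,
    would have crossed the detection threshold on its own."""
    return 1.0 - (1.0 - p_quantum) ** quanta_in_peak


def apparent_ar_fraction(n_sync, n_async, q_sync, q_async, p_quantum):
    """AR fraction obtained by counting detected peaks ('events')."""
    detected_sync = n_sync * detection_probability(q_sync, p_quantum)
    detected_async = n_async * detection_probability(q_async, p_quantum)
    return detected_async / (detected_sync + detected_async)


# Hypothetical: 63 synchronous peaks of 3 quanta each, 37 uniquantal asynchronous peaks.
for p in (1.0, 0.5, 0.2):
    print(f"p_quantum={p:.1f}: apparent AR fraction = {apparent_ar_fraction(63, 37, 3, 1, p):.2f}")
```

      As p_quantum falls, multiquantal synchronous peaks usually remain detectable while uniquantal asynchronous peaks drop out, so the apparent AR fraction decreases.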

      As discussed above, we think the synchronous-to-asynchronous ratio is actually harder to determine with electrophysiology, and the preparations are different (acute slice vs dissociated culture); still, our electrophysiological measurements are in line with the iGluSnFR data: 29% for Figure 2 and 26% from the first action potential of Figure 4. These values also agree with the findings from Yao et al. (2011) and Hagler and Goda (2001), discussed above.

      Finally, the ultimate goal of our study was to measure the effects of deleting Doc2 and syt7 on synchronous and asynchronous release, not to measure the exact ratio between the two. If iGluSnFR greatly misreported synchronous events as asynchronous, we would expect the results from the knockouts to diverge between our imaging and electrophysiology data, which they do not. We have also previously applied this approach to syt1 knockouts, showing the characteristic desynchronization of release (Vevea et al., 2020). Furthermore, the high-affinity and low-affinity iGluSnFR variants both reported the same phenotypes, even though, as discussed above, our analysis overestimates the asynchronous fraction of release with the former and underestimates it with the latter.

      (2) In the acute hippocampal physiology traces, it looks like the effect on cumulative release in Doc2A mutants only appears around ~40 msec after stimulation. This is a relatively late phase of asynchronous release. Any reason this effect does not show up sooner, where most asynchronous fusion events occur, or is this due to some technical aspects of the physiology clamp that masks earlier components?

      The reviewer is correct, although the curves actually diverge at around 30 ms (see Author response image 1). This can be attributed to the fact that the EPSCs in our recordings are broad, probably because of the large number of different synaptic inputs captured in our stimulation and recording paradigm (note that the currents are also quite large), resulting in a broad spread in the timing of release. That is to say, synchronous release is likely still occurring fairly late into the trace, obscuring any changes in asynchronous release earlier than 30 ms. This is not related to Doc2 specifically, as the EGTA charge transfer curve also diverges from the control curve at the same time. This EGTA control gives us confidence that our broad EPSCs still faithfully report synchronous and asynchronous release, even if the exact timing is spread out to some extent.

      Author response image 1.

      (3) How do the authors treat multi-vesicular release in their synchronous/asynchronous quantification? It was not clear from the methods section. Many of the optical traces show dual peaks - are those that occur in the 10 ms bin assigned to synchronous and those outside to asynchronous? Are the authors measuring the area of the response or just the peak amplitude for the measurements? The methods seem to indicate peak amplitude, but asynchronous is better quantified with area measurements for electrophysiology.

      This is an excellent point by the reviewer, and in the Methods we now explicitly state how we treat multivesicular release/multiple peaks in our analysis. Release timing is assigned based on peak timing, including when there are multiple peaks at the same bouton.

      “Timing of release was determined based on the frame in which the signal peaked, including for dual peaks in the case of synchronous and asynchronous release at the same bouton.”

      Regarding the comparison to area measurements for electrophysiology, we agree with the reviewer, which is why we used such an approach for our electrophysiological data. However, a key advantage of iGluSnFR is the ability to resolve individual quantal events (or, as is often the case for synchronous release, simultaneous multiquantal events), so temporal binning of the peaks is the appropriate analysis approach regarding these data. This is comparable to the analysis used for electrophysiology recordings of responses from single small synapses, which also detects individual quantal events, where release timing is calculated as the latency between the stimulus and the beginning of each EPSC (Miki et al., 2018).
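      The peak-timing assignment described in the Methods can be sketched as follows. The 100 Hz frame rate is from the text; the 10 ms synchronous window, the convention that the stimulus arrives at the start of its frame, and the function name are illustrative assumptions rather than the exact analysis code. Each peak is classified independently, so dual peaks at one bouton can contribute one synchronous and one asynchronous event.

```python
FRAME_RATE_HZ = 100.0   # imaging rate stated in the text
SYNC_WINDOW_MS = 10.0   # assumed synchronous/asynchronous cutoff (illustrative)


def classify_peaks(peak_frames, stim_frame, frame_rate_hz=FRAME_RATE_HZ, sync_window_ms=SYNC_WINDOW_MS):
    """Split detected peaks at one bouton into synchronous and asynchronous release.

    Each peak is timed by the frame in which it occurs, so dual peaks at the same
    bouton are classified independently. Latencies are in ms, and the stimulus is
    assumed (hypothetically) to arrive at the start of stim_frame.
    """
    frame_ms = 1000.0 / frame_rate_hz
    synchronous, asynchronous = [], []
    for frame in peak_frames:
        latency_ms = (frame - stim_frame) * frame_ms
        if latency_ms < 0:
            continue  # peaks before the stimulus are not evoked release
        if latency_ms <= sync_window_ms:
            synchronous.append(latency_ms)
        else:
            asynchronous.append(latency_ms)
    return synchronous, asynchronous


# A spontaneous peak before the stimulus, then a dual evoked peak at 10 ms and 40 ms.
sync, asyn = classify_peaks([3, 52, 55], stim_frame=51)
print(sync, asyn)
```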

      This leaves the general concern that multiple vesicle fusions at the same bouton that occur milliseconds apart could blur together and make it more difficult to accurately determine release timing, particularly with the slower sensor used in the single-stimulus experiments in Figure 1. We believe this is not a major concern, since we also performed experiments with the much faster sensor, S72A, which can resolve peaks from 100 Hz stimulation (Marvin et al., 2018). Furthermore, while the peak-calling method we used is crude by comparison, the synchronous/asynchronous ratio we report is similar to that of Mendonça et al. (2022), who used a higher frame rate and deconvolution to produce more easily distinguishable quanta when synchronous and asynchronous release occur at the same bouton after the same action potential.

      (4) It would be relevant to show that calcium binding mutations in Syt7 do not support SV docking/capture in the current assays, given some evidence for Syt7 calcium-independent activities has been reported in the field.

      To our knowledge, when the correct mutations are used to block calcium binding, none of the reported syt7 knockout phenotypes (including those reported by our laboratory in Liu et al., 2014) have ever been rescued. However, this does not formally rule out a calcium-independent role in transient docking. For the EM data, we originally considered including rescue experiments with normal and non-calcium-binding mutants of both syt7 and Doc2 in our study. However, our EM approach is spectacularly expensive and labor-intensive, and such experiments would as much as triple the amount of EM work in the study. We plan to do such experiments, and there is a great deal of additional structure-function work to be done on both these proteins. We feel that reassessing the calcium-binding mutants with iGluSnFR and zap-and-freeze falls within the scope of this future work. For now, we note this as a limitation of the current study.

      (5) The authors are not consistent in how they describe the role of the two proteins in asynchronous release, with the reader often drawing the impression that these two proteins solely mediate this aspect of SV fusion. As the authors note, some synapses do not require Syt7 or Doc2 for SV release, indicating different asynchronous sensors or molecular components at distinct brain synapses. Indeed, asynchronous release is only reduced, not eliminated, in the double mutants the authors report, so other components are at play even in these hippocampal synapses. The authors should be more consistent in noting this in their text, as the wording can be confusing as noted below:

      "Together, these data further indicated that AR after single action potentials is driven by Doc2α, but not syt7, in excitatory mouse hippocampal synapses."

      "after a single action potential, Doc2α accounts for 54-67% of AR at hippocampal excitatory synapses, whereas deleting syt7 has no effect."

      "This, along with our finding that syt7/Doc2a DKOs still had remaining AR, raises the possibility that there are other unidentified calcium sensors for AR."

      We have made adjustments throughout to not overstate the role of syt7 and Doc2, including at the locations the reviewer points out. This is an important point from the reviewer, and not just to avoid misleading readers. It is itself interesting; in the original manuscript we should have emphasized, far more than we did, that the DKO experiments strongly point to as-yet-unidentified proteins being involved in asynchronous release. This has been rectified in the revised text: we now emphasize at all relevant points in the manuscript that another calcium sensor for asynchronous release is likely present.

      (6) Given the authors' data, I don't think it's fair to say "raises the possibility" of other AR sensors, as almost 50% of AR remained in the Doc2A mutant in some of the experimental approaches. Clearly, other AR calcium sensors or molecular components are required, so better to just state that in the 1st paragraph of the discussion with something like: "Given syt7/Doc2a DKOs still had remaining AR, further work should explore the diversity of synaptic Ca2+ sensors and how they contribute to heterogeneity in synaptic transmission throughout the brain."

      We agree; this was poor phrasing on our part. We meant to imply that there may be proteins that have not even been considered, because it is also technically possible that the remaining asynchronous release is supported by the known machinery (i.e., syt1). We have changed “raises the possibility” to “indicates”.

      Minor points:

      (1) Remove "on" from the abstract sentence "Consequently, both synchronous and asynchronous release depress from the second pulse on during repetitive activity".

      We have changed “on” to “onward” to reduce ambiguity.

      (2) Shouldn't syt7 be Syt7 and syt1 be Syt1 when referring to the proteins?

      To our knowledge there is not a hard-and-fast convention for non-acronym mouse protein abbreviations. The technically correct full name is lowercase, so we find it reasonable to use lowercase for the abbreviation.

      (3) Both calcium and Ca2+ are used in the manuscript - better to stick to one term throughout.

      We thank the referee for catching this error; we now use only “Ca2+” throughout our study.

      Reviewer #2 (Recommendations For The Authors):

      (1) While the GluSnFR experiments appear to be well done, what is striking is the relatively small and "jagged" fluorescent responses. Are the authors concerned that they are missing many fast (with peaks occurring within 10 ms) synchronous events and incorrectly identifying them asynchronous? If this is not a concern, why not?

      With respect to the small raw responses, this is the nature of measuring individual quanta from individual boutons while imaging at 100 Hz, even with the excellent signal-to-noise ratio of the iGluSnFR variants we used.

      Regarding kinetics, as noted in the response to Reviewer 1 point #1, even the slower iGluSnFR variant has a rise time fast enough that it cannot misrepresent a synchronous event as asynchronous (Marvin et al., 2018). A ~10 ms threshold for iGluSnFR has also been used by others: Mendonça et al., 2022 note that averaged iGluSnFR traces are biphasic, with the transition from fast to slow component occurring around 10 ms. The ‘jaggedness’ is in large part due to the frame rate (100 Hz); Mendonça et al., 2022 imaged at 250 Hz and used deconvolution to produce smoother, cleaner traces, but still obtained results similar to ours.

      Finally, we reiterate what we wrote in response to Reviewer 1 point #1: “the ultimate goal of our study was to measure the effects of deleting Doc2 and syt7 on synchronous and asynchronous release, not to measure the exact ratio between the two. If iGluSnFR misreported synchronous events as asynchronous, we would expect the results from the knockouts to diverge between those data and our electrophysiology data, which they do not. We have also previously applied this approach to syt1 knockouts, showing the characteristic desynchronization of release (Vevea et al., 2020). Also, the phenotypes reported by the faster and slower iGluSnFR variants were identical.”

      (2) On page 6, I'm not sure I would agree that short-term plasticity is "so catastrophically disrupted". It is probably enough to say that plasticity is disrupted in the ko.

      We argue that syt7 knockout causes the most severe phenotype specific to short-term plasticity so far described (that is, without affecting initial release probability), but we have changed “catastrophically” to “strongly”.

      (3) Differences in the post-stim number of "docked" vesicles between conditions are, in absolute numbers, very small. For example, it seems that the number of docked vesicles goes from ~ 2.2 prior to stimulation, to ~ 1.5 in the first 5 ms window following stimulation. While this number may be statistically significant, I worry about bias and sampling errors. It is comforting that images are randomized prior to analysis. Nevertheless, the differences are very small and this should be explicitly acknowledged.

      This ~40% decrease in the number of docked vesicles in dissociated cultured hippocampal neurons has been consistent throughout all our studies using flash-and-freeze and zap-and-freeze electron microscopy (Watanabe et al., 2013; Kusick et al., 2020; Li et al., 2021), as well as those of other labs (Chang et al., 2018). Statistically, a 40% decrease is far larger than the smallest difference detectable with 200-300 synapses quantified per condition and an average of ~2 docked vesicles per image. The low absolute number of docked vesicles per synaptic profile (the 40 nm section captures only a portion of the active zone, which contains an average of 12 docked vesicles in total; Kusick et al., 2020) is not relevant except that it reduces the statistical power to detect differences, and this is compensated for by the large number of images we capture and annotate per sample. We are able to detect differences in fusion and endocytic pits (albeit with much less precision and sensitivity), such as the Doc2 phenotype in this study, even though these events are an order of magnitude rarer than docked vesicles. Biologically, in our view, a 40% reduction in all docked vesicles across all synapses after only a single action potential is substantial, considering that the majority of synapses do not have even one vesicle fusion. We have been puzzled ourselves about why the decrease is so large, but as stated above this result has been consistent over a decade of using this approach. For comparison to the magnitude of baseline docking changes in mutants, this 40% is similar to the effect of deleting synaptotagmin 1 (Imig et al., 2014; Chang et al., 2018). Note that in Imig et al., considered a gold standard in the field, the average number of docked vesicles per tomogram is ~10, but there are fewer than 25 tomograms per sample, so the actual amount of sampling in our data set is slightly greater.
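      A rough power calculation illustrates the statistical argument. It uses the approximate values cited by the reviewer (~2.2 docked vesicles per profile at baseline, ~1.5 at 5 ms) and 200 profiles per condition, the low end of the stated range. It assumes that counts per profile are Poisson-distributed and independent; real counts may be overdispersed or clustered by culture, which would widen the standard error, so this is a sketch rather than the statistical test used in the study.

```python
from math import sqrt
from statistics import NormalDist


def z_for_difference(mean_a, mean_b, n_a, n_b):
    """Two-sample z statistic for a difference in mean counts, assuming
    Poisson-distributed counts (variance equal to the mean) per profile."""
    se = sqrt(mean_a / n_a + mean_b / n_b)
    return (mean_a - mean_b) / se


def minimal_detectable_difference(mean, n_a, n_b, alpha=0.05, power=0.8):
    """Smallest difference in mean counts detectable with the given two-sided
    alpha and power, using the baseline mean as the variance in both groups."""
    std_normal = NormalDist()
    se = sqrt(mean / n_a + mean / n_b)
    return (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)) * se


# Approximate values: ~2.2 docked vesicles at baseline, ~1.5 at 5 ms, 200 profiles each.
z = z_for_difference(2.2, 1.5, 200, 200)
mdd = minimal_detectable_difference(2.2, 200, 200)
print(f"z = {z:.1f}")
print(f"minimal detectable difference = {mdd:.2f} vesicles ({100 * mdd / 2.2:.0f}% of baseline)")
```

      Under these assumptions a decrease of roughly a fifth of baseline would already be detectable with 80% power, so a ~40% decrease is well within reach of the sample sizes used.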

      (4) The related point is that how can one know about the "transient" nature of vesicle docking when the analysis is performed on completely different sections from different cells? Moreover, what does it mean that the docked granules have recovered or not recovered (abstract)? This should be explained in more detail.

      This is a fundamental difficulty of interpreting time-resolved electron microscopy data. We cannot observe a sequence of events at any given synapse, but only try to measure each time point as accurately as we can and interpret the data.

      By ‘recovery’ we simply mean that the number of docked vesicles at a given time point after stimulation is similar to the no-stimulation baseline. We have replaced ‘recovery’ in the abstract with ‘replenishment’ to avoid confusion.

      We now realize that in the context of this study the term ‘transient docking’ is confusing, since we only measured out to 14 ms here. In experiments with samples frozen at 5 ms, 14 ms, 100 ms, 1 s, and 10 s, the return to baseline at 14 ms appears temporary, since samples frozen at 100 ms have a similar reduction of docked vesicles as those at 5 ms (Kusick et al., 2020). The number of docked vesicles returns to baseline again at 10 s, so we used the term ‘transient docking’ to distinguish the recovery at 14 ms from the slower and presumably permanent return to baseline that takes 10 s. The apparently temporary nature of this process is why we believe it contributes to facilitation, which likewise peaks soon after stimulation and decays over the course of ~100 ms.

      To make the transient docking terminology less confusing, we have removed the word ‘transiently’ from the title and added a clarification of what transient docking is when it is first mentioned:

      “vesicles can dock within 15 ms of an action potential to replenish vacated release sites and undock over the next 100 ms”

      As noted by the reviewer, such a sequence of events, where vesicles dock within 14 ms, then undock over the course of 100 ms, then dock again over the course of 10 s, is an inference, but is based on predictions from electrophysiological data and modeling (see Silva, Tran, and Marty, 2021 for review; those authors use the term ‘calcium-dependent docking’ but this refers to the same process), and as yet there is no way to directly observe vesicle dynamics at synapses down to nanometer resolution in live cells.

      On the reviewer’s recommendation we have removed references to syt7 ‘feeding’ vesicles from the abstract and the beginning of the “physiological relevance” section of the discussion. This phrasing could imply a direct molecular pipeline between syt7 and syt1/Doc2, which misrepresents our actual model, in which syt7 simply helps recruit docked vesicles.

      “These findings result in a new model whereby syt7 drives activity-dependent docking, thus providing synaptic vesicles for synchronous (syt1) and asynchronous (Doc2 and other unidentified sensors) release during ongoing transmission.”

      “In the case of paired-pulse facilitation it can supply docked vesicles for syt1-mediated synchronous release to enhance signaling; it likely functions in the same manner to reduce synaptic depression during train stimulation. In the case of AR, syt7-mediated docked vesicles can be used by Doc2α, which then directly triggers this slow mode of transmission.”

      (5) In this study, docking is phenomenologically defined and, therefore, arbitrary; vesicles are defined as docked if there is no space between them and the plasma membrane. What happens if the definition is broadened to include some small distance between the respective membranes? Does the timecourse of "recovery" change?

      We always quantify at least all vesicles within 100 nm of the active zone; these data are shown in Figure S6D. We show only docking in the main figures because, consistent with our previous work and as stated in the text, we found no change in the number of vesicles at any distance from the plasma membrane at the active zone after stimulation, nor did we find any difference in the mutants. In our previous work on syt7 (Vevea et al., 2021) we quantified all the vesicles within the synapse and also found no differences after stimulation or in the KO further from the active zone.

      The reviewer is correct that the term ‘docking’ at synapses is often used quite arbitrarily; even among morphological studies the definition is inconsistent. We consider our strict docking definition that we explain in the manuscript (in high-pressure-frozen and freeze-substituted samples) of no visible distance between membranes to be less arbitrary, since only the number of these attached vesicles decreases after stimulation (Watanabe et al., 2013, Kusick et al., 2020, Li et al., 2021, this study) and in SNARE knockouts (Imig et al., 2014). Broadening the definition, as is done in some other studies (for example Chang et al., 2018), retains the effect, since the majority of vesicles within 10 nm are at ~0 nm, but again all that is actually changing is the number of vesicles at ~0 nm.

      (6) My overall impression is that this model is not adding much to the story. Specifically, the model was not fit to any data and has a huge number of states and free parameters given the dynamics that it is trying to capture (ie I think this is overkill). Many of the free parameters were arbitrarily constrained with little to no justification and there was minimal parameter space exploration, in part because the model wasn't being quantitatively constrained to any data. While advertised to be a 3-state model, there is a combinatorial explosion of substates by distinguishing between levels of calcium occupancy simultaneously in three separate calcium sensors so that one ends up with 9 empty states, 9 tethered states, and 45 docked states for a total of 63 distinguishable states. At 63 states and 21 free parameters, one could of course model just about any dynamics imaginable. But the relatively simple dynamics of AR and its perturbation by removal of Doc2 and Syt7 can likely be captured with far fewer states and parameters (such as Neher's recent proposal). Specifically, starting with the Neher ES-LS-TS model along with adding a transient labile docked state affected by Syt7 and Doc2 (TSL in Neher nomenclature), I wonder if the authors could more or less capture what they are observing during stimulus trains. The advantage of a minimal model is that readers don't have to struggle with fairly elaborate systems of differential equations and parameter plots to get a feel for what's going on. Especially since the point of this model is to develop intuition rather than to capture with physical accuracy exactly what is transpiring at a docked vesicle (which would require many more details excluded from the current model).

      We would like to thank the reviewer for pointing out unclear passages and mistakes in the description of the model. We have worked on improving on these points. We now explain in more detail why we made certain assumptions and how we constrained the parameter values in the model. As the reviewer points out, other models might also explain the dynamics of the experimental data presented in this paper. Thus, we agree that it is unlikely that this theory and model implementation is the only one that can account for the observations. With this model we aimed to investigate whether the theory proposed based on the experimental data could indeed reproduce the dynamics that are observed experimentally. Below we briefly explain the decisions we made in constructing the model, in response to the reviewer’s concerns. We also describe more precisely what adjustments we have made to the model’s description to improve its readability and be open about its limitations.

      One of the main concerns of the reviewer is that the model has many states and free parameters, some of which are poorly constrained. We agree that the model indeed contains many states. However, in essence, the model corresponds to a two-step docking model, in which SVs get tethered to an empty release site and subsequently dock/prime in a fusion-competent state. This structure corresponds to the ES-LS-TS model (Neher and Brose 2018, Neuron) mentioned by the reviewer or the replacement-docking model (Miki et al., 2016, Neuron). As the reviewer points out, by making the transition rates calcium-dependent in those models, we would indeed be able to capture similar dynamics with these models as with ours. However, instead of directly implementing calcium-dependent rates, we let the rates depend on the number of calcium ions bound to syt7, Doc2 and syt1. We did so because some information on the calcium-binding dynamics of these proteins is available, and by simulating calcium binding to the proteins explicitly we could integrate this knowledge into our model. Moreover, by explicitly simulating calcium binding to these proteins, we included the time it takes before a new steady-state binding occupancy is reached after a change in calcium levels. This is crucial especially for Ca2+ sensors with slow kinetics, such as syt7 and Doc2, and these properties are highly relevant for asynchronous release (which we quantified as the release >5 ms after onset of the AP). The consequence of this combinatorics (e.g., if we assume 5 calcium ions to bind to syt1 and 2 to Doc2 this leads to 24 different states) is that explicitly simulating all relevant states increases the number of different states a vesicle can be in. In the main text of the manuscript, we added this explanation of why we chose the structure of the model as presented and discussed it in the context of previous models.
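      The point about slow sensors can be illustrated with a minimal sketch of explicit sequential Ca2+ binding. It assumes identical, independent binding sites, a step change to a constant Ca2+ concentration, forward-Euler integration, and placeholder rate constants; this is not the reaction scheme or the parameter set of the model in the manuscript. It shows only that a sensor with slow binding and unbinding kinetics takes far longer to approach its new steady-state occupancy than a fast one, a delay that instantaneous Ca2+-dependent rates would not capture.

```python
def simulate_binding(n_sites, kon, koff, ca, t_end, dt):
    """Forward-Euler integration of sequential Ca2+ binding to a sensor with
    n_sites identical, independent sites after a step to a constant [Ca2+].

    State i = number of bound ions. Units (hypothetical): kon in 1/(uM*ms),
    koff in 1/ms, ca in uM, times in ms. Returns times and mean occupancy.
    """
    p = [1.0] + [0.0] * n_sites  # start with every sensor unbound
    times, occupancy = [0.0], [0.0]
    for step in range(1, int(round(t_end / dt)) + 1):
        dp = [0.0] * (n_sites + 1)
        for i, p_i in enumerate(p):
            bind = (n_sites - i) * kon * ca * p_i  # flux i -> i + 1
            unbind = i * koff * p_i                # flux i -> i - 1
            dp[i] -= bind + unbind
            if i < n_sites:
                dp[i + 1] += bind
            if i > 0:
                dp[i - 1] += unbind
        p = [p_i + dt * d for p_i, d in zip(p, dp)]
        times.append(step * dt)
        occupancy.append(sum(i * p_i for i, p_i in enumerate(p)))
    return times, occupancy


def time_to_fraction(n_sites, kon, koff, ca, t_end, dt, fraction=0.9):
    """Time for the mean occupancy to reach a fraction of its steady-state value."""
    steady = n_sites * kon * ca / (kon * ca + koff)
    times, occupancy = simulate_binding(n_sites, kon, koff, ca, t_end, dt)
    for t, occ in zip(times, occupancy):
        if occ >= fraction * steady:
            return t
    return None


# Placeholder rates: a fast 5-site sensor and a slow 2-site sensor at 1 uM Ca2+.
t_fast = time_to_fraction(5, kon=5.0, koff=5.0, ca=1.0, t_end=5.0, dt=0.01)
t_slow = time_to_fraction(2, kon=0.05, koff=0.05, ca=1.0, t_end=60.0, dt=0.01)
print(f"fast sensor reaches 90% of steady state after ~{t_fast:.2f} ms")
print(f"slow sensor reaches 90% of steady state after ~{t_slow:.1f} ms")
```

      Each additional sensor with its own occupancy multiplies the number of states a vesicle can occupy, which is the source of the combinatorics discussed above.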

      Our decision to simulate calcium binding to syt1, syt7 and Doc2 also increased the number of parameters in our model. As the reviewer points out, the large number of parameters in our model, compared to the relatively low number of features in the experimental data the model is compared to, is a limitation. However, after thorough exploration of the model, we are confident that the model cannot produce arbitrary dynamics. The large number of parameters does make it possible that different combinations of parameter values lead to similar responses, as can be seen in the parameter space exploration in Figure S9. This means that our modelling effort does not provide estimates of parameter values. We now mention this explicitly in the discussion section of the model. Some of the parameter values we were able to constrain based on previous literature (10 parameters), others were set more arbitrarily (8 parameters), and some were adjusted to match the experimental data closely (7 parameters). We now indicate more clearly in Supplementary Table 3 to which category each parameter value belongs. We determined the values of the model parameters through a manual exploration of the parameter space. One of the main reasons we decided not to fit the model to the data obtained in this work is that the fitted parameters would not be informative (e.g., multiple combinations of parameters lead to similar results). We agree with the reviewer that a direct quantitative comparison between model predictions and experimental data obtained by fitting would be desirable. However, fitting the model to experimental data would be close to computationally impossible. This is in part because of the large number of states, but mainly due to the large number of APs that need to be simulated. Because the transients in our model have both slow and fast components (the decay of the residual Ca2+ transient and the peak of the local Ca2+ transient), the model is challenging to solve with the ODE solvers available in MATLAB, even when using a high-performance computer system optimized for parallel computation (32 cores). Moreover, fitting the model to experimental data would require additional assumptions and parameters. As the experiments are performed using different samples, different parameter settings are probably required (e.g., the number of release sites or the fusion probability likely differs between cultured hippocampal neurons and hippocampal slices). Additionally, fitting the model would require defining a cost function (i.e., a quantitative measure of how well the model fits the experimental data), which in turn requires us to decide how much weight to give each of the experiments we compare our model predictions to. This decision is very difficult, if not arbitrary.
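      The weighting problem can be illustrated with a toy example. The data-set names, summary values, and the choice of squared relative error are all hypothetical; the sketch shows only that, with two candidate parameter sets that each fit a different experiment well, which set counts as the better fit depends on the weights chosen in the cost function.

```python
def weighted_cost(predictions, observations, weights):
    """Weighted sum of squared relative errors, summed over data sets."""
    cost = 0.0
    for name, weight in weights.items():
        errors = [((p - o) / o) ** 2 for p, o in zip(predictions[name], observations[name])]
        cost += weight * sum(errors)
    return cost


# Hypothetical summary statistics for two data types and two candidate parameter sets.
observed = {"docking_increase": [1.0], "ar_fraction": [0.5]}
set_a = {"docking_increase": [1.0], "ar_fraction": [0.3]}   # fits docking, misses AR
set_b = {"docking_increase": [0.7], "ar_fraction": [0.5]}   # fits AR, misses docking

for weights in ({"docking_increase": 1.0, "ar_fraction": 1.0},
                {"docking_increase": 3.0, "ar_fraction": 1.0}):
    cost_a = weighted_cost(set_a, observed, weights)
    cost_b = weighted_cost(set_b, observed, weights)
    best = "A" if cost_a < cost_b else "B"
    print(f"weights={weights}: cost A={cost_a:.2f}, cost B={cost_b:.2f}, preferred={best}")
```

      With equal weights set B is preferred; tripling the weight on the docking data flips the preference to set A, even though neither the data nor the model changed.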

      Therefore, we constrained the parameter values in our model based on a manual (but systematic) exploration of the parameter space. The simulations of the model were evaluated based on the increase in the number of docked vesicles between 5 and 15 ms after AP stimulation (this should be as large as possible for the control and Doc2- model simulations, and close to 0 for the syt7- model simulations), the peak release rates in response to the first AP (to be equal between all conditions), the ratio between the peak release rates of the 1st and 10th responses (the depression phenotype should be most prominent in the syt7- model simulation and least prominent in the Doc2- simulation), and the amount of asynchronous release (syt7- and Doc2- simulations should have approximately half of the total amount of asynchronously released vesicles compared to the control simulations). Moreover, the parameter values for the calcium transient had to be realistic. We do not know the exact parameter values of the calcium transient in the samples used in the experiments performed here, but previous studies have provided a range of realistic parameter values (Brenowitz and Regehr 2007, PMID: 17652580; Helmchen et al., 1998, PMID: 9138591; Sabatini and Regehr 1998, PMID: 9512051; Wang et al., 2008, PMID: 19118179). Furthermore, we decided to set the parameters describing calcium binding to syt7 and Doc2 to the same values, as the scope of the model was to investigate the role of syt7 and Doc2 in asynchronous release when they act on different steps in the reaction scheme. By using the same parameter values, both proteins are identical except for their mechanism of action. We added this section to the methods of the manuscript.

      In the parameter space evaluation, we varied parameters one at a time or in pairs. We decided not to extend the evaluation further, as the results would be challenging to interpret and visualize, and the simulations would be computationally expensive.

      (7) The graphics, equations, and nomenclature all need some work. The equations aren't numbered or indexed, so I can't really refer to any of them in particular, but the symbols being used generally were not defined well enough for a naïve reader to follow. The 15 diffEQs compressed into a single expression at the bottom of page 19 are basically impenetrable. The 'equation' near the bottom of p. 20 is not an equation - it is a set of four symbols lacking a definition. The fusion rate equation (with f1 and f2 factors) isn't spelled out clearly enough (top of p. 20). Can fusion occur from any of the 45 docked states but just with a different probability? Or does fusion only occur from the 3 states where Doc2+Syt1 Ca occupancy = 5? The graphical representation of Syt7 occupancy and its effects in Fig S7 doesn't work well. Tons of color and detail but very hard to decipher and intuit what Syt7 is doing to the SV buried in the arrow lengths. And this is a crucial point of the paper - it really needs to shine through in this figure.

      We thank the reviewer for pointing out the unclear points in the description of the model. We have worked on improving this section. Specifically, we have improved the equations and now explain the symbols used in them more clearly. We have also altered the graphical representation of the effect of calcium binding to syt7 on the docking and undocking rates.

      (8) I would strongly recommend abandoning this large-scale soft modeling effort altogether, but if the authors feel that all the states and parameters are absolutely required, they need to justify this point, define all symbols systematically, number all equations, and provide some evidence of actual data fitting, systematic parameter space exploration, and more exposition of why they are making the various assumptions and constraints that were used to lower the number of free parameters. For instance, why are the tethering and untethering (or docking and undocking) rate constants set to equal each other? And why is it assumed that Syt7 enhances both the docking and undocking rates? Why is fusion set to occur as long as the sum of Syt1 and Doc2 calcium occupancy is exactly 5 regardless of the specific occupancy of either Syt1 or Doc2? Again probably quite important but unjustified physically. Given the efforts of this model to capture some sort of realistic calcium liganding by Syt1, Syt7, and Doc2, the model doesn't seem to take into account the copy number of each protein at a release site. Shouldn't it matter if there are 2 Syt7s vs 20 Syt7s? Or the stoichiometry between Doc2 and Syt1? Either this model assumes that there is exactly one copy of each protein at a release site or that all copies are always identically liganded and strictly act as a unit. Neither of these possibilities seems plausible.

      Although this model (as all models) is a simplified version of reality and has its limitations, we decided to keep it in our work to illustrate that the well-defined hypothesis put forth in this paper is consistent with the experimental data. Again, we do not claim that this model is the only one that may explain the data, nor that we have uniquely identified its parameters. As indicated above, we have improved the description of the model in the Methods and of how its parameter values are constrained. For the reasons mentioned above (first and foremost, infeasibility due to excessive computation time), we did not perform data fitting or change the parameter space exploration. We thank the reviewer for pointing out that some of the assumptions of the model were not explained well enough, and we have added further explanation of these assumptions to the main text.

      One of the assumptions we made, as the reviewer points out, is that the tethering and untethering rate constants, as well as the docking and undocking rate constants, are set equal to each other. This is indeed an arbitrary assumption, made mainly to reduce the number of free parameters in our model, given that there is currently no experimental constraint on the relation between the two rate constants. We agree that this assumption is as good as any other, and we now point this out more clearly in the main text.

      In the model, syt7 enhances both docking and undocking rates because we assumed it to function as a catalyst of the docking reaction. A catalyst lowers the energy barrier of a reaction and thereby promotes both the forward and the backward rate. One of the main reasons for this choice is that syt1 and Doc2 are also assumed in the model to function by lowering an energy barrier, in their case that of the fusion reaction. However, since fusion is irreversible, this only affects the forward reaction rate. We cannot exclude that syt7 acts on the forward rate only, which we now mention in the results section describing the model.
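      As a minimal illustration of the catalyst argument, assuming an Arrhenius-type dependence of each rate constant on its energy barrier, lowering the barrier by a fixed amount multiplies the docking and the undocking rate constants by the same factor. Both rates therefore increase while their ratio, and hence the equilibrium, stays unchanged. The function name, rate values and barrier reduction below are hypothetical and are not taken from the model:

```python
import math

def catalysed_rates(k_forward, k_backward, delta_g_kt):
    """Return (forward, backward) rate constants after the barrier is lowered
    by delta_g_kt (in units of kT); both are scaled by exp(delta_g_kt)."""
    factor = math.exp(delta_g_kt)
    return k_forward * factor, k_backward * factor

# Illustrative docking/undocking rate constants (hypothetical values).
k_dock, k_undock = 2.0, 0.5
fwd, bwd = catalysed_rates(k_dock, k_undock, delta_g_kt=1.5)

# Both rates are faster, but the forward/backward ratio is preserved.
print(fwd / bwd)  # 4.0, the same ratio as k_dock / k_undock
```

      For an irreversible step such as fusion there is no backward rate, so the same barrier reduction acts on the forward rate only.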

      In our model, fusion can occur from any docked SV state. However, the probability of fusion increases with the number of calcium ions bound to Doc2 or syt1, with calcium binding to syt1 being more effective in promoting fusion. This structure matches the dual-sensor model proposed by Sun et al., 2007, Science (PMID: 18046404) and Kobbersmed et al. 2020, eLife (PMID: 32077852), and is based on the assumption that each protein bound to calcium lowers the energy barrier by a certain amount. We now explain this in more detail in the results section describing the model.
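      One way to write such a dual-sensor rule is sketched below, assuming (hypothetically) that each Ca2+ ion bound to syt1 multiplies a basal fusion rate by a factor F1 and each ion bound to Doc2 by a smaller factor F2. The constant names and numerical values are illustrative only and are not the parameter values used in the manuscript:

```python
# Hypothetical dual-sensor fusion rate: each bound Ca2+ ion lowers the fusion
# barrier by a fixed amount, i.e. multiplies the rate by a constant factor.
L_PLUS = 4e-4  # basal fusion rate of a docked SV (1/ms), assumed
F1 = 30.0      # per-ion factor for Ca2+ bound to syt1, assumed
F2 = 5.0       # per-ion factor for Ca2+ bound to Doc2, assumed (F2 < F1)

def fusion_rate(n_syt1, n_doc2):
    """Fusion rate of a docked SV with n_syt1 and n_doc2 Ca2+ ions bound."""
    return L_PLUS * F1 ** n_syt1 * F2 ** n_doc2

# Fusion is possible from every docked state, but the rate rises steeply with
# Ca2+ occupancy, and an ion on syt1 counts more than an ion on Doc2.
print(fusion_rate(0, 0) < fusion_rate(3, 2) < fusion_rate(5, 0))  # True
```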

      We decided that syt1 and Doc2 together can have no more than five calcium ions bound to them. This is based on the idea that syt1 and Doc2 compete for the same type of resource, which could for instance be a limited number of SNARE complexes available to execute the reaction. An indication of competition between the two proteins can be found in the synchronous release amplitudes after stimulus 2, which are larger in the Doc2KO.
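      The shared limit of five Ca2+ ions can be made concrete by enumerating the occupancy states of a docked SV. The sketch below assumes that a docked state is fully described by the number of ions bound to syt1 (0 to 5), Doc2 (0 to 2) and syt7 (0 to 2). Under that assumption, the limit leaves 15 syt1/Doc2 combinations (3 of which reach the maximum of five ions) and 45 docked states in total:

```python
from itertools import product

MAX_SYT1, MAX_DOC2, MAX_SYT7 = 5, 2, 2  # Ca2+ ions per SV/release site
MAX_SYT1_PLUS_DOC2 = 5                  # shared limit reflecting competition

def docked_states():
    """Enumerate (n_syt1, n_doc2, n_syt7) occupancies allowed for a docked SV."""
    return [
        (n1, n2, n7)
        for n1, n2, n7 in product(
            range(MAX_SYT1 + 1), range(MAX_DOC2 + 1), range(MAX_SYT7 + 1)
        )
        if n1 + n2 <= MAX_SYT1_PLUS_DOC2
    ]

states = docked_states()
syt1_doc2 = {(n1, n2) for n1, n2, _ in states}
print(len(syt1_doc2), len(states))  # 15 45
```

      Tracking the copy number of each protein separately would multiply this state count, which is the computational cost referred to below.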

      The reviewer rightfully points out that, for realistic simulations of the roles of syt1, syt7 and Doc2, the stoichiometry of these proteins at the release site is relevant. Ideally, we would have included this in our model. However, this would massively increase the number of possible states (which this reviewer already criticizes in our simpler model), making the model even more computationally expensive to run. Additionally, we currently have no reliable estimates of the number of syt7 and Doc2 molecules per release site. In our model, all syt1s expressed on an SV can together bind up to five calcium ions. We have recently shown that this simplified model can capture the features of all syt1 proteins per vesicle that compete for the binding of three substrates on the plasma membrane to exert their function in speeding up fusion (Kobbersmed et al., 2022 eLife PMID: 35929728). This means that the copy number is indirectly covered in our model. The number of five calcium ions (and two for Doc2 and syt7), however, is not based on the estimated number of syt1s on an SV (which would be around 15; Takamori et al., 2006), but rather on the calcium dependence of the fusion reaction. Similarly, the number of two calcium ions binding to Doc2 is based on the calcium dependence of asynchronous fusion rates (Sun et al., 2007). Based on the reviewer’s comment, we now mention more explicitly in the text that the numbers of calcium ions binding to syt1, Doc2 and syt7 correspond to the total number of calcium ions that can bind to each of these molecules per release site/SV.

      We again would like to thank the reviewer for asking us to improve the explanation of the assumptions made to construct our model and of how we constrained its parameter values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are pleased to send you a revised version of our manuscript entitled “voyAGEr: free web interface for the analysis of age-related gene expression alterations in human tissues” and the associated shiny web app, in which we incorporate the referees’ feedback. We would like to express our gratitude for their time and valuable insights, which have contributed to the improvement of our work. We appreciate the rigorous evaluation process that eLife maintains.

      In this letter, we address each of the reviewers' comments and concerns, point-by-point, offering detailed responses and clarifications. We have made several revisions to our manuscript following their recommendations.

      We must note that the revised version of the manuscript has two new joint first authors, Rita Martins-Silva and Alexandre Kaizeler, who performed all the requested reanalyses, given that the initial first author, Arthur Schneider, has since left our lab. We also took the opportunity to make the following minor, unsolicited improvements:

      • Added a comprehensive tutorial to the GitHub repository on how to navigate through voyAGEr’s features.

      • Implemented sample randomisation in the scatter plots depicting gene expression across the age axis to ensure data privacy.

      • Implemented minor adjustments within the web app to enhance user comprehension and clarity when visualizing the data.

      • Improved clarity of the methodological sections.

      Reviewer 1

      (1.1) While this may be obvious to others for some reason that escaped me, I was unsure what was the basis for the authors' choice of 16 years as the very specific sliding window size. If I'm not alone in this, it might add clarity for other readers and users if this parameter choice were explained and justified more explicitly.

      We apologise for not providing the rationale behind this choice in the previous version. We chose 16 years as our sliding window size because this was the minimum needed to guarantee the presence of more than one sample per window across all the tissues considered in the study (Figure R1 below).

      We added the following sentence to the manuscript (v. Methods, ShARP-LM):

      “This was the minimum age span needed to guarantee the presence of more than one sample per window, across all considered tissues.”
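      The selection criterion can be expressed as a small search: widen the window until every sliding window, in every tissue, contains more than one sample. The sketch below assumes integer ages, windows starting at every integer age between a hypothetical minimum and maximum age, and toy sample ages. It is not the ShARP-LM implementation, and the toy data give a width of 10 rather than 16:

```python
def window_ok(ages, width, age_min, age_max):
    """True if every window [start, start + width) holds more than one sample."""
    return all(
        sum(start <= a < start + width for a in ages) > 1
        for start in range(age_min, age_max - width + 2)
    )

def min_window_width(ages_by_tissue, age_min=20, age_max=70, max_width=30):
    """Smallest window width satisfying window_ok in all tissues (None if none)."""
    for width in range(1, max_width + 1):
        if all(window_ok(ages, width, age_min, age_max)
               for ages in ages_by_tissue.values()):
            return width
    return None

# Toy ages: one sample per year in tissue_a, one every five years in tissue_b.
toy_ages = {
    "tissue_a": list(range(20, 71)),
    "tissue_b": list(range(20, 71, 5)),
}
print(min_window_width(toy_ages))  # 10, dictated by the sparser tissue_b
```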

      (1.2) "In particular, tissue-specific periods of major transcriptional changes in the fifth and eighth decades of human lifespan have been revealed, reflecting the so-called digital aging and consistently with what is observed in mice" here I think that "consistently" should be "consistent".

      We thank the reviewer for the comment and following the suggestion, we have revised 'Consistently' to 'consistent' as it is the correct usage in our sentence.

      (1.3) "On a different note, sex biases have been reported in for the expression of SALL1 and KAL1 in adipose tissue and lung, respectively." Here I think that "in for" should be "in".

      As recommended by the reviewer, we have replaced ‘in for’ with ‘in’. As we also replaced KAL1, the sentence now reads: “On a different note, sex biases have been reported in the expression of SALL1 and DDX43 in adipose tissue and lung, respectively”.

      (1.4) "We downloaded the matrix with the RNA-seq read counts for each gene in each GTEx v7 sample from the project's data portal (https://www.gtexportal.org/)." In my pdf manuscript this hyperlink appears to be broken.

      We appreciate the reviewer's attention to the broken link, and we have rectified the issue. The link should now be fully operational, effectively directing users to the GTEx Portal.

      (1.5) Under methods, I might suggest "Development platform" or "Development platforms" over "Development's platform" as a heading.

      We have modified the heading of this section in the methods to 'Development Platforms', as we believe it better reflects the information conveyed.

      Reviewer 2

      (2.1) In this tool/resource paper, it is crucial that the data used is up-to-date to provide the most comprehensive and relevant information to users. However, the authors utilized GTEx v7, which is an outdated (2016) version of the dataset. It is worth noting that GTEx v8 includes over 940 individuals, representing a 35% increase in individuals, and a 50% increase in the total number of samples. The authors should check the newer versions of GTEx and update the data.

      When development of the voyAGEr web application began, GTEx version 7 was the most up-to-date release. Nevertheless, we agree that version 8 offers a notably more extensive dataset, encompassing more individuals and samples and introducing new tissues. Consequently, we have updated our application to incorporate the data from GTEx version 8.

      (2.2) The authors did not address any correction for batch effects or RNA integrity numbers, which are known to affect transcriptome profiles. For instance, our analysis of GTEx v8 Cortex tissue revealed that after filtering out lowly expressed genes, in the same way authors did, PC1 (which accounts for 24% of the variation) had a Spearman's correlation value of 0.48 (p<6.1e-16) with RNA integrity number.

      We acknowledge the validity of the reviewer’s comment and the importance of such corrections for data interpretation. In response, we conducted a thorough, unbiased investigation into potential batch effects, with the COHORT variable emerging as the primary driver of those observed across most tissues. Furthermore, SMRIN (as the reviewer pointed out), DTHHRDY, MHSMKYRS and the number of detected genes in each sample were consistently associated with the primary sources of variation. As a result, we implemented batch effect correction for these five covariates, in a tissue-specific manner.

      We provide a detailed explanation of the batch effect correction methodology and its importance in the biological interpretation of results in the Methods section, specifically under "Read count data pre-processing". Additionally, we have included two new supplementary figures, Sup. Figures 7 and 8, to illustrate a batch effect example in lung tissue and emphasise the critical role of this correction in data interpretation.
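      For readers unfamiliar with covariate-based correction, the general idea can be sketched as a least-squares fit in which the variables of interest (here an intercept and age) are protected and only the part of expression explained by nuisance covariates is subtracted. This is a generic sketch, comparable in spirit to limma's removeBatchEffect, and not necessarily the exact procedure used for voyAGEr; the function name, the simulated data and the single two-level covariate are hypothetical:

```python
import numpy as np

def remove_covariate_effects(expr, covariates, protected):
    """Regress nuisance covariates out of log-expression.

    expr:       genes x samples matrix of log-expression values
    covariates: samples x k matrix of nuisance covariates (numeric or one-hot)
    protected:  samples x p matrix of variables to keep (e.g. intercept, age, sex)
    """
    design = np.hstack([protected, covariates])
    # Joint least-squares fit of every gene on protected + nuisance variables.
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    nuisance_coef = coef[protected.shape[1]:, :]
    # Subtract only the part of expression explained by the nuisance covariates.
    return expr - (covariates @ nuisance_coef).T

# Simulated example: 3 genes whose expression depends on age and on a batch label.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 70, n)
batch = rng.integers(0, 2, n).astype(float)  # hypothetical two-level covariate
expr = np.vstack([0.05 * age + 2.0 * batch + rng.normal(0, 0.1, n) for _ in range(3)])
protected = np.column_stack([np.ones(n), age])
corrected = remove_covariate_effects(expr, batch[:, None], protected)
```

      After correction, refitting the same model gives a batch coefficient of essentially zero while the age slope is unchanged, which is the property such a correction is meant to have.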

      (2.3) The data analyzed in the GTEx dataset is not filtered or corrected for the cause of death, which can range from violent and sudden deaths to slow deaths or cases requiring a ventilator. As a result, the data may not accurately represent healthy aging profiles but rather reflect changes in the transcriptome specific to certain diseases due to the age-related increase in disease risk. While the authors do acknowledge this limitation in the discussion, stating that it is not a healthy cohort and disease-specific analysis is not feasible due to the limited number of samples, it would be useful for users to have the option to analyze only cases of fast death, excluding ventilator cases and deaths due to disease. This is typically how GTEx data is utilized in aging studies. Alternatively, the authors should consider including the "cause of death" variable in the model.

      This comment is closely related to the prior discussion (point 2.2). Notably, two of the covariates selected for batch effect correction, namely, DTHHRDY (Death classification based on the 4-point Hardy Scale¹) and COHORT (indicating whether the participant was a postmortem, organ, or surgical donor¹), have a direct relevance to this issue, i.e., both relate to the cause of death of the individual.

      ¹ According to the nomenclature of variables described in https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetListOfAllObjects.cgi?study_id=phs000424.v9.p2&object_type=variable

      By correcting for these covariates, we therefore account for their influence on gene expression and mitigate their impact.

      This approach represents a compromise, as it is practically infeasible to ascertain the absence of underlying health conditions in the remaining samples, even when considering only cases of “fast death”. Hence, we opted to keep all samples, regardless of the donor's cause of death, to dilute potential effects associated with individual causes of death.

      (2.4) The age distribution varies across tissues which may impact the results of the study. The authors' claim that age distribution does not affect the outcomes is inconclusive. Since the study aims to provide cross-tissue analysis, it is important to note that differing age distributions across tissues can influence the overall results. To address this, the authors should conduct downsampling to different age distributions across tissues and evaluate the level of tissue-specific or common changes that remain after the distributions are made similar.

      We acknowledge that variations in age distributions are evident across different tissues, with brain tissues displaying a notably pronounced disparity (green density lines in Figure R2 below).

      To address this issue comprehensively, we conducted tissue-specific downsampling, by reducing the number of samples in a given age window to the minimum available sample size within all age windows for a given tissue. The histograms (density plots) of the number of samples per age window of 16 years considered in the ShARP-LM model, as well as the minimum number of samples in each age window, per tissue are illustrated in Figure R1. After performing downsampling, we computed the logFC and p-value of differential expression for each gene, per age window, and compared them (for all genes in a given age window) with those involving all samples.

      Despite changes in logFC with downsampling, a considerable positive correlation is maintained (Figure R3, top panel), suggesting that the overall trends in gene expression changes persist. However, as expected, downsampling decreases statistical power within each age window owing to the smaller sample size, as evident from the shift of genes from the third to the first quadrant in Figure R3, bottom panel. Consequently, we have opted to keep the results encompassing all samples and to remove the paragraph in the Discussion that asserted that the age distribution had no impact on the overall outcomes (“Indeed, we found no confounding between the distribution of samples’ ages and the trend of gene expression progression over age in any tissue.”), as we deem it inaccurate and potentially misleading. We have added a supplementary figure (Supplementary Figure 8, identical to Figure R3) illustrating the effect of downsampling, and the following paragraph to the manuscript’s Discussion section:

      “When downsampling to ensure a balanced age distribution, a loss of statistical power is apparent but a considerable positive correlation with the original results is maintained and a substantial number of significant alterations remain so (Supplementary Figure 8).”
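      The per-window downsampling described above can be sketched as follows: within one tissue, every age window is randomly reduced to the size of the smallest window. This is a minimal sketch assuming that window membership is already known (a sample may belong to several overlapping windows); the function name, window labels and sample IDs are hypothetical:

```python
import random

def downsample_per_window(sample_windows, seed=0):
    """Randomly keep, in every age window, as many samples as the smallest window has.

    sample_windows: dict mapping an age-window label to a list of sample IDs
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_min = min(len(ids) for ids in sample_windows.values())
    return {w: rng.sample(ids, n_min) for w, ids in sample_windows.items()}

# Hypothetical overlapping windows for one tissue.
windows = {
    "20-35": ["s1", "s2", "s3"],
    "21-36": ["s2", "s3"],
    "22-37": ["s2", "s3", "s4", "s5"],
}
down = downsample_per_window(windows)
print({w: len(ids) for w, ids in down.items()})  # every window now holds 2 samples
```

      Differential expression would then be recomputed per window on the reduced sets and its logFC correlated with the full-data results.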

      We acknowledge that this limitation can be addressed with the growing accumulation of human tissue transcriptomes in publicly available databases, a trend we anticipate in the near future. We are committed to promptly updating voyAGEr with any new data releases that may offer a solution to this concern.

      Nonetheless, we want to underscore, as the reviewer has astutely pointed out, that while voyAGEr can facilitate cross-tissue comparisons, these must be made with caution. In this regard, we inserted the following paragraph into the Discussion:

      “Due to the tissue-specific nature of the pre-processing steps (v. Read count data preprocessing in the Methods section), and given that most of the plotted gene expression distributions are centred and scaled by tissue, it is important to note that voyAGEr may not always be suited for direct comparisons between different tissues. For instance, it does not allow users to directly ascertain whether a gene exhibits different expression levels in different tissues or whether the expression of a particular gene in one tissue changes more drastically with age than in another tissue.”

      (2.5) The GTEx resource is extremely valuable, however, it comes with challenges. GTEx contains tissue samples from the same individuals across different tissues, resulting in varying degrees of overlap in sample origin across tissues as not all tissues are collected for all individuals. This could affect the similar/different patterns observed across tissues. As this tool is meant for broader use by the community, it is crucial for the authors to either rule out this possibility by conducting a cross-tissue comparison using a non-parametric model that accounts for the dependency between samples from the same individual, or to provide information on the degree of similarity between samples so that the users can keep this possibility in mind when using the tool for hypothesis generation.

      We agree that the variable degrees of donor overlap between tissues (Figure R4) could lead to confounding between trends driven by a shared population of individuals and those associated with age. We therefore examined the contributions of the variables 'donor', 'tissue' and 'age' to the overall variance in the data (Figure R5, panel A), having normalised the data collectively across all tissues. Tissue and donor account for approximately 90% and 10% of the variance, respectively. Age has minimal impact (around 1%), which may be attributed to the relative subtlety of its effects on gene expression and to the tissue specificity of ageing-associated changes. Notably, removing the 'donor' variable does not transfer this variance to 'age', suggesting limited confounding between these variables (see Figure R5, panel B).

      We also specifically examined the pairs of tissues exhibiting the lowest (Brain Amygdala / Small Intestine), median (Pancreas / Heart Left Ventricle), and highest (Kidney Cortex / Muscle Skeletal) percentages of shared donors. We identified and selectively removed samples from shared donors while maintaining the original sample size imbalance between tissues. Subsequently, we calculated each gene’s mean expression within each age window from the ShARP-LM pipeline, followed by each gene’s Pearson’s correlation of expression between tissue pairs. The resulting coefficients, both with and without the removal of common donors, were compared in scatter plots (Figure R6, left plots). As this process inherently involves downsampling, which may impact results (v. comment 2.4), we performed additional downsampling by randomly removing samples from both tissues according to the proportions defined for the removal of common donors (Figure R6, right plots).

      In the chosen scenarios, we note a similar impact of the targeted removal of common donors and of random downsampling. Nevertheless, the effects of removing samples may vary according to the absolute number of remaining samples, so singling out individual cases may not provide conclusive insights. To address this systematically, we represented all tissue pairs in a heatmap, colour-coded according to whether the removal of common donors is more impactful (red) or less impactful (blue) than random downsampling (Figure R7). The values depicted in the heatmap, denoted as the Impact of Common Donors (ICD), are computed for each tissue pair in several steps. First, we determined the absolute difference in Pearson’s correlation for each gene’s mean expression within each age window from the ShARP-LM pipeline, between the original data and the subset of data without common donors (DiffWoCD) or with random downsampling (DiffRD). Subsequently, the medians of DiffWoCD and DiffRD are computed, and the difference between these median values provides the ICD for each tissue pair. Because correlation is symmetric (i.e., the results for tissue 1 vs tissue 2 mirror those for tissue 2 vs tissue 1), the resulting matrix is triangular in form.
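      The ICD computation for a single tissue pair can be sketched as follows. The sketch assumes that each gene's mean expression per age window is available as a genes x windows matrix for each tissue, and that a positive ICD corresponds to the red (more impactful) end of the colour scale; the function names and toy values are hypothetical:

```python
import numpy as np

def per_gene_correlation(means_t1, means_t2):
    """Pearson r, per gene, between two genes x windows matrices of mean expression."""
    a = means_t1 - means_t1.mean(axis=1, keepdims=True)
    b = means_t2 - means_t2.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

def impact_of_common_donors(r_original, r_without_common, r_random):
    """ICD = median(DiffWoCD) - median(DiffRD) for one tissue pair.

    Each argument holds one correlation per gene, computed on all samples,
    without common donors, and after size-matched random downsampling.
    """
    r_original = np.asarray(r_original)
    diff_wocd = np.abs(r_original - np.asarray(r_without_common))
    diff_rd = np.abs(r_original - np.asarray(r_random))
    return float(np.median(diff_wocd) - np.median(diff_rd))

# Toy per-gene correlations for three genes.
icd = impact_of_common_donors([0.9, 0.5, 0.2], [0.7, 0.5, 0.1], [0.85, 0.45, 0.2])
print(round(icd, 3))  # 0.05: removing common donors matters slightly more here
```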

      We have added a supplementary figure (Supplementary Figure 4, a composition of Figures R4-R7, together with a scatterplot relating the values of heatmaps R4 and R7) that aims to provide guidance to users when interpreting specific tissue pairs, acknowledging inherent limitations (refer to comment 2.4). We have also inserted the following paragraph into the manuscript’s Discussion section:

      “Furthermore, we must emphasise that the majority of GTEx donors contributed samples to multiple tissues (Supplementary Figure 4A), potentially introducing biases and confounders when comparing gene expression patterns between tissues. Our analyses of variance (Supplementary Figure 4B) and downsampling to control for common donors (Supplementary Figures 4C-E) suggest very limited global confounding between the impacts of donor and age on gene expression, and that any potential cross-tissue bias does not depend much on the proportion of common donors (Supplementary Figure 4E). However, this effect must be taken into account when comparing specific pairs of tissues (e.g., Colon – Transverse and Whole Blood, Supplementary Figure 4D).”

      (2.6) The authors aimed to create an open-source and ever-evolving resource that could be adapted and improved with new functionality. However, this goal was only partially achieved. Although the code for the web app is open source, crucial components such as the statistical tests or the linear model are not included in the repository, limiting the tool's customizability and adaptability.

      We greatly appreciate the reviewer’s concern and share their commitment to maintaining the principles of openness, reproducibility, and adaptability for voyAGEr. voyAGEr was primarily designed as a visualisation tool, displaying pre-processed results, and indeed only the code for the Shiny app itself was accessible through the project's GitHub repository.

      To address this shortcoming, we have made the entire data pre-processing script publicly available in the GitHub repository of voyAGEr. This script encompasses, among other steps, filtering, normalisation, batch effect correction, the ShARP-LM pipeline and the statistical tests employed, and module definition. Moreover, the web app itself offers functionality to export relevant plots and tables.

      (2.7) Furthermore, the authors' choice of visualization platform (R shiny) may not be the best fit for extensibility and open-source collaboration, as it lacks modularity. A more suitable alternative could be production-oriented platforms such as Flask or FastAPI.

      We appreciate this thoughtful concern. The decision to use Shiny was primarily driven by our data having already been prepared in the R environment during pre-processing steps. Consequently, and as the web app serves the purpose of visualisation only (and not data processing), Shiny is a natural and convenient extension of our scripts, enabling seamless data visualisation.

      We acknowledge that Shiny may lack the modularity required for optimal open-source collaboration. While we recognise the merits of alternative platforms such as Flask or FastAPI, we decided to keep Shiny because the current iteration of voyAGEr already offers significant value to the community. Transitioning to a different platform would be a time-consuming endeavour that would postpone the release of this resource.

      However, the reviewer’s feedback regarding modularity and open-source collaboration is duly noted and highly valuable. We will certainly take it into account when developing new web applications within our laboratory.

      (2.8) To facilitate collaboration and improve the tool's adaptability, data resulting from the preprocessing pipeline should be made publicly available. This would make it easier for others to contribute and extend the tool's functionality, ultimately enhancing its value for the scientific community.

      As outlined in point 2.6 of this rebuttal letter, certain metadata used in our analysis are subject to restricted access. To address this, we have taken several measures to foster transparency and reproducibility of our analyses. First, we have made the scripts for data pre-processing publicly available, along with a comprehensive explanation of our methodology within the main manuscript. This empowers users to replicate our analyses and provides a foundation for those interested in contributing to the tool's development. Furthermore, we have created new issues on voyAGEr’s GitHub repository, outlining novel features and improvements we envision for the application in the future. We actively encourage users to engage with this section.

      (2.9) It is unfortunate that the manuscript has no line numbers, which makes pointing out language issues or typos cumbersome. Below are some minor typos present in the current version mostly due to inconsistent usage of British vs US English, and the authors would be advised to do a thorough proofreading for the final submission.

      • Page 12: Inconsistent spelling of "analyzed" and "analysed". Should be "analyzed", since US English is used throughout the rest of the paper.

      • Page 14: "randomised"

      • Page 15: "emphasise"

      We apologise for this and have included line numbers in the revised version. We have opted for British English and corrected the manuscript accordingly.

      (2.10) Some figures in the supplemental material have a low resolution (e.g. S. Fig 5). Especially figures that are not based on screenshots would ideally be of a higher resolution.

      As voyAGEr is designed as a web application for visualisation, it is inherent that some screenshots of the final resource may have lower resolutions. In response to this concern, we re-generated the figures in this manuscript with a resolution that maintains clarity and readability. We also recreated figures not derived from screenshots, further improving their resolution.

      We saved all figures in PDF format and are sending them together with this letter and the revised manuscript, to address any potential issues related to low-resolution figures that may occur during the export of the Word document.

      (2.11) In Fig. 1 in the bottom row the sex labels are hard to see.

      We have adapted the figure to address this concern.

      (2.12) Math symbols and equations are not well formatted. For example, the GE equation on p. 13, or Oiij equation should be properly typeset. Also, the Oiij notation might be confusing, I believe the authors meant to use a capital "I", i.e. OI_ij.

      We have incorporated these recommendations into the revised manuscript.

      (2.13) The Readme file in the git repo is very short. It would be helpful to have build and run instructions.

      We have updated the README file in the GitHub repository, which now contains, among other features, instructions for launching the Shiny app and building the associated Docker image. Additionally, a simple tutorial has also been included to assist users in navigating through voyAGEr's functionalities.

      (2.14) "Module" tab's UI inconsistent to other tabs (i.e. "Gene" and "Tissue"), since it contains an "About" page. Adding the "About" page in the actual "Module" page might make the UI clearer.

      We believed that the Modules section, due to its distinct methodology, would benefit from an additional tab explaining its underlying rationale. We understand the reviewer’s concern regarding the use of tabs throughout the application and have made changes to the app to ensure consistency.

      (2.15) I would suggest changing the type of the article to "Tools and Resources".

      We agree and followed the reviewer’s suggestion.

      Reviewer 3

      (3.1) In the gene-centric analyses section of the results, to improve this manuscript and database, linear regression tests accounting for the entire age range should be added. The authors' algorithm, ShARP-LM, tests locally within a 16-year window, which gives it lower power than a linear regression test over all ages. I suspect that the power reduction is most pronounced in the younger age range, since GTEx donors are enriched in older ages. By adding the results from the lm tests, readers would gain more insight and evidence into how significantly their genes of interest change with age.

      We are grateful for the reviewer's thoughtful and pertinent recommendation and have thus conducted linear regression tests covering the entire age range. The outcomes of these tests have been integrated into the web application, denoted by a dotted orange line on the 'Gene Expression Alterations Over Age' plots. Additionally, summary statistics of the overall changes, encompassing p-values, t-statistics, and logFC per year, have been included below the plot title. We have also updated the manuscript to include these changes (v. Methods, Gene-centric visualisation of tissue-specific expression changes across age):

      “We also applied a linear model across the entire age range, thereby providing users with more insight and supporting evidence into how a specific gene changes with age. For visualisation purposes, we incorporated a dashed orange line, with the logFC per year for the Age effect as slope, in the respective scatter plots (Figure 3B c). We depict the Sex effect therein by prominent dots on the average samples, with pink and blue denoting females and males, respectively.”
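      The whole-range Age fit described in the quoted Methods text can be illustrated with an ordinary least-squares slope of log2 expression on donor age; that slope is what a "logFC per year" expresses. This is a minimal sketch under stated assumptions, not voyAGEr's code: it assumes log2-scale expression values for one gene in one tissue, and it leaves out the Sex term, any Age-Sex interaction and other covariates that the authors' model may include. The helper name `age_slope` is hypothetical.

```python
def age_slope(ages, log2_expr):
    """Least-squares fit of log2 expression on age for one gene in one tissue.

    Returns (slope, intercept); the slope is read as logFC per year.
    Hypothetical helper: Sex and all other covariates are deliberately ignored.
    """
    if len(ages) != len(log2_expr) or len(ages) < 2:
        raise ValueError("need at least two paired observations")
    n = len(ages)
    mean_age = sum(ages) / n
    mean_expr = sum(log2_expr) / n
    # Covariance-over-variance form of the OLS slope.
    sxy = sum((a - mean_age) * (e - mean_expr) for a, e in zip(ages, log2_expr))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be equal")
    slope = sxy / sxx
    intercept = mean_expr - slope * mean_age
    return slope, intercept


# Toy data (invented): expression rises by 1 log2 unit per decade.
slope, intercept = age_slope([20, 30, 40, 50], [1.0, 2.0, 3.0, 4.0])
print(f"logFC per year: {slope:.3f}")  # prints "logFC per year: 0.100"
```

      Reporting the t-statistic and p-value shown in the app would additionally require the residual variance of the fit, which a standard regression package would normally provide.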

      Concerning the observation about the potential reduction in statistical power due to the limited number of samples in younger ages, we acknowledge its validity. Indeed, we have addressed this issue in the manuscript's Discussion (v. Supplementary Figure 6).

      (3.2) In line with the ShARP-LM test results, it is not clear which criterion was used to define the significant genes used in the subsequent enrichment analyses. I assume that the criterion is P < 0.05, but it should be clearly stated. Additionally, the authors should apply adjusted p-values for multiple-testing correction. The ideal criterion is an adjusted P < 0.05. However, if none or only a handful of genes are found to be significant, the authors could relax the criteria, such as using a nominal P < 0.01 or 0.05.

      We apologise for any confusion regarding the terminology "significant genes." Our choice to use non-adjusted p-values for determining the significance of gene expression changes with Age, Sex, and their interaction was deliberate, and we would like to clarify our reasoning:

      (1) In the "Gene" tab of the application, individual genes are examined. When users inquire about a specific gene, multiple-testing correction of the p-value does not apply.

      (2) In the "Tissue" tab, using adjusted p-values and a threshold of 0.05 yielded very few differentially expressed genes, limiting the utility of Peaks. Our objective therein is not to assess the significance of alterations in individual genes but to provide a metric for global alterations within a tissue. We then determine significance based on the False Discovery Rate (FDR), using the p-values as a nominal metric of gene expression alterations.

      To avoid using the concept of “differential expression”, commonly linked to significance, we now refer to 'altered genes' in both the manuscript and the app. For clarity and to align with voyAGEr's role as a hypothesis-generation tool, we define 'altered genes' as those with non-adjusted p-values < 0.01 or < 0.05, as specified in the Methods section.

      (3.3) In the gene-centric analyses section, authors should provide a full list of donor conditions and a summary table of conditions as supplementary.

      We appreciate the suggestion and have now included a reference that directs readers to those data, rather than including this information as an additional supplementary table. We would like to emphasise that the web app includes information on the donor conditions we hypothesise to affect gene expression.

      (3.4) The tissue-specific assessment section has poor sub-titles. Every title has to contain information.

      We agree and revised the sub-titles to more accurately reflect the information conveyed in each corresponding section.

      (3.5) I have trouble understanding the meaning of NES from GSEA in the tissue-specific assessment section. The authors performed GSEA for the DEGs against the background genes ordered by t-statistics (from positive to negative) calculated from the linear model. I understand the p-value was two-tailed, which means that both positive and negative NES are meaningful, as they represent the up-regulated (positive coefficient) and down-regulated (negative coefficient) expression directions with age, respectively, within a window. However, in the GSEA section of Methods, the authors did not fully elaborate on this directionality but stated, "The NES for each pathway was used in subsequent analyses as a metric of its over- or down-representation in the Peak". The authors should clearly explain how to interpret the NES from their results.

      We added the following paragraph to the manuscript’s Methods section, in order to clarify the NES’ directionality:

      “We extracted the GSEA normalised enrichment score (NES), which represents the degree to which a certain gene set is overrepresented at the extreme ends of the ranked list of genes. A positive NES corresponds to the gene set’s overrepresentation amongst up-regulated genes within the age window, whereas a negative NES signifies its overrepresentation amongst down-regulated genes. The NES for each pathway was used in subsequent analyses as a metric of its up- or down-regulation in the Peak.”

      (3.6) In the Modules of co-expressed genes section, the authors did not explain how or why they selected the four tissues: brain, skeletal muscle, heart (left ventricle), and whole blood. This should be elaborated on.

      We apologise for not providing a detailed explanation for this selection. As the ‘Modules of co-expressed genes’ section was primarily intended as a proof of concept, we opted to include tissues with a substantial number of available samples and comprehensive cell type signatures; these four were the tissues that met both criteria. As the diversity of cell type signatures increases (e.g., through the increasing availability of scRNA-seq datasets), we plan to encompass a wider range of tissues in the near future. However, as this task is time-demanding, and in order to avoid a substantial delay in the release of voyAGEr, we opted to address it in the next version of the app and opened a dedicated issue in the project’s GitHub repository so that users can share their preferences for the next tissues to include.

      We also added a brief sentence in this regard to the Methods section of the manuscript:

      “The four tissues (Brain - Cortex, Muscle - Skeletal, Heart - Left Ventricle, and Whole Blood) covered by the Module section of voyAGEr were selected due to their relatively high sample sizes and availability of comprehensive cell type signatures. The increasing availability of human tissue scRNA-seq datasets (e.g., through the Human Cell Atlas) will allow future updates of voyAGEr to encompass a wider range of tissues.”

      (3.7) In the modules of the co-expressed genes section, the authors did not provide an explanation of the "diseases-manual" sub-tab of the "Pathway" tab of the voyAGEr tool. It would be helpful for readers to understand how the candidate disease list was prepared and what the results represent.

      We greatly appreciate the reviewer's feedback, and in response, we have restructured the 'Modules of co-expressed genes' method section to provide a more comprehensive explanation of the 'diseases' sub-section. To clarify, we obtained a curated set of diseases and their associated genes from DisGeNET v.7.0. We assessed the enrichment of modules for these diseases in two ways: a manual approach using Fisher’s tests (i.e. comparing the genes of a given module with the genes associated with a given disease), and the disease_enrichment function of the disgenet2r package. The significance of these enrichments was determined after adjusting p-values with the Benjamini-Hochberg correction.
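      The manual enrichment route described above can be sketched as a one-sided Fisher's exact test on the overlap between a module and a disease gene set, followed by Benjamini-Hochberg adjustment across diseases. This is an illustrative sketch only: it assumes a one-sided (over-representation) test and a user-supplied background gene universe, neither of which is specified in the response, and the helper names `fisher_enrichment_p` and `benjamini_hochberg` are hypothetical. It does not reproduce the disgenet2r route.

```python
from math import comb


def fisher_enrichment_p(module_genes, disease_genes, universe):
    """One-sided Fisher's exact test for over-representation of disease genes in a module.

    Computed as the hypergeometric upper tail P(X >= k), with N background genes,
    K disease genes, n module genes and k genes in the overlap.
    """
    universe = set(universe)
    module = set(module_genes) & universe
    disease = set(disease_genes) & universe
    N, K, n = len(universe), len(disease), len(module)
    k = len(module & disease)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example with invented gene sets (not DisGeNET data).
universe = {f"g{i}" for i in range(20)}
module = {"g0", "g1", "g2", "g3", "g4"}
diseases = {
    "disease_A": {"g0", "g1", "g2", "g3"},
    "disease_B": {"g10", "g11", "g12"},
}
raw = {d: fisher_enrichment_p(module, genes, universe) for d, genes in diseases.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
print(adjusted)
```

      A real analysis would use the DisGeNET gene sets and the actual module memberships in place of the invented toy sets; the choice of background universe can change the p-values noticeably.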

      (3.8) Most figures have low resolutions, and their fonts are too small to read.

      As already mentioned in issue 2.10, we have recreated all of the images with better resolution to enhance legibility. We also exported such figures in PDF, which we attach to this revision.

      (3.9) The authors used GTEx V7, which is not the latest version. Although researchers have developed a huge number of pipelines and tools for their research, most of them were neglected without a single update. I am sure many users, including myself, would appreciate it if the authors kept the database updated, using GTEx V8 in a future version.

      We express our gratitude to the reviewer for their valuable suggestion, and, as already explained in issue 2.1, we have incorporated GTEx V8 into voyAGEr.

      (3.10) I would like to have an option for downloading the results as a whole for gene, tissue, and coexpressed genes. This would be a great option for secondary analysis by users.

      Implementing such a feature would be a time-demanding endeavour that would delay the release of voyAGEr, so we chose not to do so for this version. However, we agree that it would be a good resource for secondary analyses and acknowledge the possibility of adding this feature in the future. For now, voyAGEr allows the user to download all plots and their corresponding data.

      (3.11) How was the order of tissues in the heatmaps (both gene and tissue sections) determined? Did the authors apply hierarchical clustering? If not, I would recommend that the authors perform hierarchical clustering and add the resulting dendrogram to the heatmap display.

      We apologise for the oversight in explaining how the order of tissues was determined. To clarify, we employed hierarchical clustering to establish the tissue order for visualisation within the app. Although the reviewer suggested adding a dendrogram to illustrate this clustering, we decided against it, because a dendrogram, while informative, is not essential for the app's primary purpose.

      (3.12) I understand that this is a vast amount of work, but I hope that the authors can expand the co-expressed module analysis to include other tissues in a future version of the database. Knowing which genes are co-expressed in line with aging, and their pathway and disease enrichments across tissues, would be highly informative, and I'm sure many users, including myself, would greatly appreciate it.

      We express our gratitude to the reviewer for the valuable suggestion and for acknowledging the extensive effort required to incorporate new tissues into the module section. We completely agree that understanding co-expressed genes across the aging process is of significant value, and we are committed to the ongoing inclusion of additional tissues. As already stated in issue 3.6, a comprehensive list of tissues slated for integration in future voyAGEr versions is readily available on voyAGEr’s GitHub repository.

      Author response image 1.

      Density plots (“smoothed” histograms) of the distribution of numbers of samples per moving age window for the ShARP-LM pipeline, categorised by tissue. The numerical value within each rectangle represents the minimum number of samples observed across all age windows for that particular tissue.

      Author response image 2.

      Density lines (“smoothed” histograms) of the distribution of the age of donors per tissue. As depicted in the chart, there are more samples for older ages, particularly of brain tissues.

      Author response image 3.

      Effect of downsampling on ShARP-LM results. A – Per-tissue violin plots of the gene-wide distributions of Pearson’s correlation coefficients between original and downsampled logFC values for the Age variable across age windows, with tissues coloured and ordered by increasing percentage of downsampling-associated reduction in the number of samples. B – Density scatter plots comparing the original and downsampled p-values for each tissue, coloured by the downsampling percentage in each age window, highlighting the low range of p-values (from 0 to 0.1). Despite changes in logFC with downsampling, a considerable correlation in significance is maintained, although downsampling naturally results in a loss of statistical power, evident from the shift of points towards the first quadrant (dashed lines: p-value = 0.05).

      Author response image 4.

      Heatmap depicting the percentage of common donors between pairs of tissues. A given square shows the percentage of all samples of the tissue on the x axis (Tissue 1) that are shared with the tissue on the y axis (Tissue 2).

      Author response image 5.

      Assessment of the relative contributions of different sources to the dataset’s variance. A - tissue accounts for approximately 90% of the total variance, while donor contributes around 10%; age has a minimal impact (1%), likely due to the relative subtlety of its effects on gene expression and to the tissue specificity of ageing dynamics. B - Removal of the donor variable does not transfer variance to age, suggesting limited confounding between the two variables.

      Author response image 6.

      Impact of the relative proportion of common donors on gene expression correlation between tissue pairs. Panels A, B, and C showcase the tissue pairs with the highest (Muscle Skeletal / Kidney Cortex), median (Pancreas / Heart Left Ventricle), and lowest (Small Intestine / Brain Amygdala) percentages of common donors, respectively. The left panels illustrate gene-by-gene Pearson’s correlations of gene expression between the two tissues, comparing the scenarios with (x-axis) and without (y-axis) the removal of common donors. The right panels depict the same comparisons, but with random downsampling (y-axis) in both tissues based on the proportions defined for common donor removal. The depicted examples show that the outcomes are comparable when removing common donors or employing random downsampling.

      Author response image 7.

      Comparison of the impacts of removing common donor samples and random downsampling across tissue pairs. The heatmap is coloured based on whether the removal of common donors has a greater (red) or lesser impact (blue) than random downsampling. The values depicted in the heatmap, denoted as the Impact of Common Donors (ICD), are computed for each tissue pair. This calculation involves several steps: first, determining the absolute difference in Pearson’s correlation for each gene’s mean expression within each age window from the ShARP-LM pipeline, between the original data and the subset of data without common donors (DiffWoCD) or with random downsampling (DiffRD). Subsequently, the medians of DiffWoCD and DiffRD are computed, and the difference between these median values provides the ICD for each tissue pair. Due to the symmetric nature of correlation (i.e., the results for tissue 1 vs tissue 2 mirror those for tissue 2 vs tissue 1), the resulting matrix is triangular in form. Grey tiles denote NA values, i.e., where the tissue-tissue comparison is not meaningful, namely self-self comparisons and comparisons between sex-specific tissues. Top right inset: density line (“smoothed” histogram) of all ICD values.
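      The ICD calculation in this legend can be sketched as follows. It is a hedged reading of the legend, not the authors' code: it assumes each input maps a gene to its mean expression per ShARP-LM age window in each of the two tissues, that the age windows are aligned between tissues, and that every gene varies across windows (otherwise the Pearson correlation is undefined). The function names are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def gene_correlations(tissue1, tissue2):
    """Per-gene correlation across age windows between two tissues.

    tissue1, tissue2: dict gene -> list of mean expression values, one per age window.
    """
    shared = set(tissue1) & set(tissue2)
    return {g: pearson(tissue1[g], tissue2[g]) for g in shared}


def impact_of_common_donors(original, without_common, downsampled):
    """ICD = median(DiffWoCD) - median(DiffRD) for one tissue pair.

    Each argument is a (tissue1, tissue2) pair of dicts as in gene_correlations.
    """
    r_orig = gene_correlations(*original)
    r_wocd = gene_correlations(*without_common)
    r_rd = gene_correlations(*downsampled)
    genes = set(r_orig) & set(r_wocd) & set(r_rd)
    diff_wocd = [abs(r_orig[g] - r_wocd[g]) for g in genes]
    diff_rd = [abs(r_orig[g] - r_rd[g]) for g in genes]
    return median(diff_wocd) - median(diff_rd)


# Invented toy values: removing common donors flips the correlation of GENE1,
# while random downsampling leaves it unchanged.
original = ({"GENE1": [1, 2, 3]}, {"GENE1": [1, 2, 3]})
without_common = ({"GENE1": [1, 2, 3]}, {"GENE1": [3, 2, 1]})
print(impact_of_common_donors(original, without_common, original))  # prints 2.0
```

      Under this reading, a positive ICD corresponds to the red tiles (removing common donors perturbs the correlations more than random downsampling) and a negative ICD to the blue tiles.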

    1. Limits of Reconciliation
      When we think about repair and reconciliation, many of us might wonder where there are limits. Are there wounds too big to be repaired? Are there evils too great to be forgiven? Is anyone ever totally beyond the pale of possible reconciliation? Is there a point of no return? One way to approach questions of this kind is to start from limit cases. That is, go to the farthest limit and see what we find there by way of a template, then work our way back toward the everyday. Let’s look at two contrasting limit cases: one where philosophers and cultural leaders declared that repairs were possible even after extreme wrongdoing, and one where the wrongdoers were declared unforgivable.[1]
      Nuremberg Trials
      After the defeat of Nazi Germany, prominent Nazi figures were put on trial in the Nuremberg Trials. These trials were a way of gathering and presenting evidence of the great evils done by the Nazis, and a way of publicly punishing them. We could consider this as, in part, a large-scale public shaming of these specific Nazis and the larger Nazi movement. Some argued that there was no type of reconciliation or forgiveness possible given the crimes committed by the Nazis. Hannah Arendt argued that no possible punishment could ever be sufficient: “The Nazi crimes, it seems to me, explode the limits of the law; and that is precisely what constitutes their monstrousness. For these crimes, no punishment is severe enough. It may well be essential to hang Göring, but it is totally inadequate.”

      I think the Nuremberg Trials illustrate a critical boundary in the concept of reconciliation. In my view, they show that while legal justice is vital, it may not always provide complete closure or moral resolution, especially for vast atrocities. This challenges us to think deeply about the limits of forgiveness and justice.

    1. Harassment in social media contexts can be difficult to define, especially when the harassment pattern is created by a collective of seemingly unconnected people. Maybe each individual action can be read as unpleasant but technically okay. But taken together, all the instances of the pattern lead up to a level of harm done to the victim which can do real damage. Because social media spaces are to some extent private spaces, the moderators of those spaces can ask someone to leave if they wish. A Facebook group may have a ‘policy’ listed in the group info, which spells out the conditions under which a person might be blocked from the group. As a Facebook user, I could decide that I don’t like the way someone is posting on my wall; I could block them, with or without warning, much as if I were asking a guest to leave my house. In the next section, we will look in more detail at when harassment tactics get used, how they get justified, and what all this means in the context of social media.

      In my opinion, this detailed exploration of the nuanced nature of violence and harassment underscores the complexity of defining and addressing these issues within the frameworks of law and social norms. It leads me to think about the fine line between permissible actions and those that cause harm, which has really broadened my views.

    2. You might remember from Chapter 14 that social contracts, whether literal or metaphorical, involve groups of people all accepting limits to their freedoms. Because of this, some philosophers say that a state or nation is, fundamentally, violent. Violence in this case refers to the way that individual Natural Rights and freedoms are violated by external social constraints. This kind of violence is considered to be legitimated by the agreement to the social contract. This might be easier to understand if you imagine a medical scenario. Say you have broken a bone and you are in pain. A doctor might say that the bone needs to be set; this will be painful, and kind of a forceful, “violent” action in which someone is interfering with your body in a painful way. So the doctor asks if you agree to let her set the bone. You agree, and so the doctor’s action is construed as being a legitimate interference with your body and your freedom. If someone randomly just walked up to you and started pulling at the injured limb, this unagreed violence would not be considered legitimate. Likewise, when medical practitioners interfere with a patient’s body in a way that is non-consensual or not what the patient agreed to, then the violence is considered illegitimate, or morally bad. We tend to think of violence as being another “normatively loaded” word, like authenticity. But where authenticity is usually loaded with a positive connotation (on the whole, people often value authenticity as a good thing), violence is loaded with a negative connotation. Yes, the doctor setting the bone is violent and invasive, but we don’t usually call this “violence” because it is considered to be a legitimate exercise of violence. Instead, we reserve the term “violence” mostly for describing forms of interference that we consider to be morally bad.
17.4.2. A Bit of History
In much of mainstream Western thought, the individual’s right to freedom is taken as a supreme moral good, and so anything that is viewed as an illegitimate interference with that individual freedom is considered violence or violation. In the founding of the United States, one thing on people’s minds was the way that in a Britain riddled with factions and disagreement, people of one subgroup could not speak freely when another subgroup was in power. This case was unusual because instead of one group being consistently dominant, the Catholic and Protestant communities alternated between being dominant and being oppressed, based on who was king or queen. So the United States wanted to reinforce what they saw as the value of individual freedoms by writing it into the formal, explicit part of our social contract. Thus, we got the famous First Amendment to the Constitution, saying that individuals’ right to freely express themselves in speech, in their religion, in their gatherings, and so on could not legally be interfered with. As a principle, the concept is pretty clear: let people do their thing. But we do still live in a society which does not permit total freedom to do whatever one wants, with no consequences. Some actions do too much damage and would undermine the society of freedom, so those actions are written into the law (that is, proscribed) as a basis for reprisals. This happens in a few ways: Some are proscribed as crimes that lead to arrest, trial, and possibly incarceration. Some are proscribed as concepts or categories of things, which a person could use to take someone else to court. For example, copyright infringement doesn’t usually result in someone showing up to arrest and imprison the infringer, at least in the States. But if someone believes their copyrights have been violated, they can sue the offending party for damages, etc. Copyright infringement is proscribed in law, so it forms the basis for such lawsuits.
Beyond what is proscribed by law, there are plenty of other actions and behaviors we don’t want people to be doing in our society, but they are not the kind of thing that should be written into law. I don’t want my friends to lie to me, generally speaking, but this is not against the law. It would be weird if it was! Plain old lying isn’t proscribed, but perjury is (lying under oath in a court of law). The protections of freedom in the First Amendment were designed to help articulate a separation between what we might not like (e.g., someone having a different faith, or someone lying) and what is actually damaging enough to warrant formal legal mechanisms for reprisal (e.g. perjury). The Catholics and the Protestants don’t need to like each other, but they have the right to coexist in this society regardless of which group currently has a monarch on the throne.
17.4.3. So what is harassment?
One useful way to think about harassment is that it is often a pattern of behavior that exploits the distinction between things that are legally proscribed and things that are hurtful, but not so harmful as to be explicitly prohibited by law given the protection of freedoms. Let’s use an example to clarify. Suppose it’s been raining all day, and as I walk down the sidewalk, a car drives by, spraying me with water from the road. This does not make me happy. It makes me uncomfortable, since my clothes are wet, and it could hurt me if wet clothes mean I get so cold I become ill. Or it could hurt me if I were on my way to an important interview, for which I will now show up looking sloppy. But the car has done nothing wrong, from a legal standpoint. There is no legal basis for reprisals, and indeed it would seem quite ridiculous if I tried to prosecute someone for having splashed me by driving near me. In a shared world, we sometimes wind up in each other’s splash zones. Now, suppose it was more dramatic than that.
Suppose the car had to really veer to spray me with the puddle, such that they could be described as driving recklessly, if anyone happened to be describing it. This is not the splash zone of regular living; it’s malice. But it’s still not illegal, nor the basis for legal action. Finally, suppose it’s not just one car. There is a whole caravan of cars. I recognize the drivers as classmates whom I don’t get along with. They have planned a coordinated strike, each driving through the puddles so fast I can hardly catch a breath between splashes. My bag is soaked; my laptop and phone permanently damaged. Since damaging someone else’s private property is proscribed, I could try to prosecute the drivers. I have no idea if this hypothetical case would get anywhere in a real court, but if I could get a judge onside, they might issue a fine, to be paid by the drivers, to answer for my damages (that is, to pay for the replacement of my private property which was destroyed, specifically my laptop and phone). At a guess, I would suspect that it would be very difficult to get anywhere with such a suit in court. Puddle-based harassment isn’t something that is recognized by law. This is what harassment does: it uses a pattern of minorly hurtful actions, so that the harasser can maintain plausible deniability about intent to harm, or at least, failing that, can avoid formal consequences. When harassment concepts get proscribed, this situation shifts. Think about employment law in the States. Depending on what State you’re in and what sector, employment law does not permit racial harassment in the workplace. This means that if you can show a pattern of repeating behavior which is hurtful and based on racially coded comments, then you might have a viable case for a racial harassment suit. (Practically, this probably doesn’t mean suing. It means notifying HR that you have evidence of the pattern and requesting that they take disciplinary action.
What the law does is say that if the harassing party subsequently sues for something like wrongful termination, the company has a legal basis for construing your evidence as showing a pattern of harassment.) If there were a rise in, or a new recognition of, widespread and harmful puddle-based harassment, we might gather with activists and fight to get puddle-based harassment recognized by law, in order to reduce its occurrence. Not that this would be easy, but it would give us the legal basis for pressing charges when coordinated puddle-attacks occur. Getting the action proscribed by the law doesn’t stop people from taking that action. They are still free to puddle-splash at will. But there would be a possibility of consequences, should their pedestrian victims seek reprisal. Harassment is behavior which uses a pattern of actions which are permissible by law, but still hurtful. Variations: Where a relevant harassment definition exists in law, there can be legal consequences. Other institutions can also make their own harassment policies. The consequences would not arise at the legal level, but at the social level. Many universities have policies about sexual harassment which are much richer and more detailed than statutory law. If behavior is reported which is defined by the university policy as harassment, then they can issue consequences such as suspension of the student. Implicit policies can be implemented as well. I don’t have a formal harassment policy that I require my houseguests to sign before entering my home; but it is my home, and if they start behaving in ways that I consider problematic, I do have the right to kick them out of my house. Harassment in social media contexts can be difficult to define, especially when the harassment pattern is created by a collective of seemingly unconnected people. Maybe each individual action can be read as unpleasant but technically okay. 
But taken together, all the instances of the pattern lead up to a level of harm done to the victim which can do real damage. Because social media spaces are to some extent private spaces, the moderators of those spaces can ask someone to leave if they wish. A Facebook group may have a ‘policy’ listed in the group info, which spells out the conditions under which a person might be blocked from the group. As a Facebook user, I could decide that I don’t like the way someone is posting on my wall; I could block them, with or without warning, much as if I were asking a guest to leave my house. In the next section, we will look in more detail at when harassment tactics get used, how they get justified, and what all this means in the context of social media.

      The comparison of harassment to being sprayed by oncoming cars effectively emphasizes how difficult it is to identify and deal with harassment, particularly in social media settings where people's individual actions may appear harmless but can have detrimental effects when combined. In the same way that a homeowner may ask someone to leave their home if their behavior becomes undesirable, platforms must have clear policies and procedures in place to deal with such behavior.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2023-02270

      Corresponding author(s): Usha Vijayraghavan

      General Statements

      We thank all three Reviewers for their thorough assessment of our manuscript and their constructive feedback and comments.

      Point-by-point description of the revisions

      Reviewer #1

      We are encouraged by the very positive comments on the significance of our study, namely that it provides convincing insights into alternative modes of nuclear positioning and division, which is an important question in cell biology. We have also taken up all feasible suggestions to improve the interpretation of our results and have added new data to address the constructive points raised by the reviewer.

      Major comments:

      1. A) I am concerned about the lethal phenotype caused by slu7 deprivation. Slu7 deficiency causes defective nuclear positioning at the bud in late G2. This phenotype per se should not cause defective mitosis, so slu7 deficiency may also be interfering with other aspects of mitosis which might indeed impinge on cell viability.

      Response: Our data indeed show that Slu7 knockdown cells have a severe growth defect when grown on non-permissive medium (YPD), with a two-fold difference in O.D. seen by 12 hours (Supplementary figure 2.B).

      We agree with the reviewer that the lethal phenotype likely arises from defects in several aspects of the cell cycle, including those in mitosis. The data we present show G2 arrest, small-budded cells with unsegregated nuclei and large-budded cells with segregated nuclei, all of which fail to progress through the cell cycle and contribute to the severe growth defect. Further, GO enrichment analysis of pathways deregulated on Slu7 knockdown supports these findings, as various cell-cycle-related pathways show abnormal expression levels. In this study, we have focused on an in-depth analysis of the role of Slu7 in a particular window and uncovered how it controls nuclear position for G2-M progression. The likely targets and mechanisms by which Slu7 regulates other phases of the cell cycle need similarly detailed investigation in future. Our analysis of nuclear movement in Slu7 knockdown cells grown in YPD for 12 hours showed no nuclear movement (Supplementary figure 3B), which is the terminal phenotype. To examine the events that lead to the nuclear mispositioning phenotype, we investigated dividing slu7kd cells grown in non-permissive medium for only 6 hours; under these conditions, Slu7 protein is still detected, albeit at lower levels (Supplementary figure 1D). From studies of nuclear position, mitotic spindle position and dynein distribution in mother and daughter cells, we propose that in dividing cells the nucleus does not experience enough force to move into the daughter bud during mitosis. Further, we delineate the role of Slu7 in the splicing of transcripts for PAC1, which encodes a protein whose homolog in S. cerevisiae has a proven role in nuclear migration. In live imaging of slu7kd cells that showed nuclear segregation at the start of imaging, no new bud formed by the end of 60 minutes, implying that these cells are arrested after the transition to mitosis. We speculate that Slu7 may also act through the regulation of genes involved in mitotic exit or cytokinesis.

      1. B) Supp. Fig4 shows defective mitosis in TBZ, so TBZ may be exacerbating defective mitosis of slu7-deficient cells.

      __Response: __Studies with yeast and mammalian model systems have revealed that the mobility and repair of damaged DNA are compromised upon disruption of microtubules (Wu et al, 2008; Chung et al, 2015; Lottersberger et al, 2015; Lawrimore et al, 2017; Oshidari et al, 2018; Laflamme et al, 2019). These data may explain why mutants in DNA damage checkpoint genes are sensitive to TBZ. In this context, we observed that the CnSlu7 knockdown is also sensitive to MMS stress (shown below). In addition, recent work on human Slu7 in HeLa cell lines has elucidated its role in the maintenance of genome integrity by preventing the formation of R-loops (Jiménez et al, 2019). We suggest that TBZ may exacerbate the defective mitosis of Slu7-depleted cells; however, whether this is particular to mitosis or extends to other cellular processes involving microtubules needs further investigation.

      Throughout the figures, uneven chromosome/nuclear segregation can be observed in cells deprived of slu7; however, these mitotic defects have not been mentioned or explored in depth. From Supp Figure 3C it can be inferred that CENP-A segregation is uneven. Is this correct? Is CENP-A-GFP segregation normal?

      __Response: __ It should be noted that in Cryptococcus, kinetochores remain unclustered during the early phase of the cell cycle, cluster into a single punctum at the end of G2 phase and then de-cluster at the end of mitosis. Since this is a highly dynamic process, it is technically challenging to measure CENP-A intensity in mother and daughter cells. In the fixed-cell or live imaging data, there are no appreciable differences in the intensity of the GFP signal of the tagged proteins (H4 and CENP-A). The uneven chromosome/nuclear segregation observed in certain panels of the images presented is due to technical issues in that particular stack while generating the montage. This has been re-examined, and we infer that there are no major differences in the signals from GFP-H4 and GFP-CENP-A through mitosis.

      Additionally, taking the cue from the reviewer’s comment, we examined the likelihood of improper chromosome segregation by evaluating whether there are any appreciable aneuploid cell populations. Revisiting our flow cytometry data, we found no significant difference in the population of aneuploid cells between the knockdown strain and the wildtype strain grown in non-permissive conditions for 12 hours. These data were re-assessed in new experiments in which we also analyzed by flow cytometry the ipl1 mutant, in which aneuploidy has been reported (Varshney et al, 2019). It has been reported in Cryptococcus neoformans that aneuploid cells are resistant to the antifungal drug fluconazole. Preliminary experiments showed that slu7kd cells were sensitive to fluconazole and in this assay were similar to wildtype cells. Hence, we speculate that chromosome segregation is normal in Slu7-depleted cells.

      If chromosome segregation is altered upon slu7 deprivation, this might also explain the drop in cell viability and slow growth rates of this condition.

      __Response: __ From live microscopy imaging and flow cytometry data, we believe that chromosome segregation is normal in Slu7-depleted cells. Dilution spotting on permissive media after growth in non-permissive media revealed that slu7kd cells resumed growth without losing viability, indicating that the arrest phenotype associated with the depletion of Slu7 is largely reversible and does not cause chromosome mis-segregation (this figure is now added to the manuscript as supplementary figure 2D). Prolonged arrest at various cell cycle phases might lead to cell death and hence a drop in cell viability.

      The manuscript will improve if authors analyse chromosome segregation for example, by showing time-lapse images of chromosome dynamics during mitosis.

      __Response: __Chromosome dynamics during the mitotic phase are shown below. We observe that chromosome segregation is equal between mother and daughter bud. The uneven chromosome/nuclear segregation observed in certain panels of the images presented in the original manuscript was due to technical issues while generating the montage.

      The authors perform an RNA seq comparing wild-type cells with slu7 deficiency and detect changes in gene expression; however, they do not explore from these data the percentage of un-spliced introns genome-wide, which might be very informative, even more so than changes in gene expression, many of which might be an indirect consequence of Slu7 deficiency. Authors should re-analyze the RNA seq data looking for unprocessed mRNAs and provide information about the overall impact of slu7 on intron processing.

      __Response: __ A very detailed bioinformatic analysis of the impact of Slu7 on the global transcriptome and splicing patterns is an ongoing study in the laboratory. The findings are indeed giving good leads, which are being validated by further experiments using mini-gene exon-intron constructs. These studies are extensive and will form a future manuscript identifying and characterizing intronic features that predispose an intron towards Slu7 dependency. Therefore, this falls outside the scope of this study on the cell biological role of Slu7 in mitosis, specifically in nuclear positioning to ensure faithful mitotic segregation.

      Minor comments:

      1. "Previous studies of slu7 mutants in S. cerevisiae and the conditional knockdown of its S. pombe homolog". Consider replacing homolog with Ortholog.

      Response: The suggestion is well taken, and the word “homolog” has been replaced with word “ortholog”.

      1. A) Taking these results together, we conclude that the inability of the conditional mutant to grow in the non-permissive media is due to impaired progression through the G2-M phase of the cell cycle. Is the G2/M delay the cause of the slow growth phenotype of the Slu7 deficiency?

      Response: From the live microscopy, we note that even when the budding index for mitosis has been reached, the nucleus in slu7kd cells is still in the mother cell and spends more time there rather than reaching the bud or bud neck. We present G2/M delay as ONE of the reasons for the slow growth of Slu7-depleted cells. Although we have shown that Slu7 depletion does not activate the MAD2-dependent Spindle Assembly Checkpoint, we have not investigated the activation of other cell cycle checkpoints such as the G2 DNA damage checkpoint. These are potential new leads, as we infer from our RNA seq datasets that CHK1, TEL1, BDR1 and RAD51 show increased expression in the Slu7 knockdown condition when compared to wildtype. It is therefore reasonable to propose that Slu7 might play a role at various cell cycle phases through direct or indirect effects on genes involved in these phases. Delayed positioning of the nucleus during G2/M is one of the major effects, and it is investigated in depth in this study.

      1. B) If so, growth defects of slu7 deficiency could be suppressed by ectopic expression of G2/M activators.

      Response: We have not tested this possibility, but we predict that expression of G2/M activators would at best offer only partial rescue of the growth defect of Slu7-depleted cells, since multiple pathways are adversely affected in cells depleted of Slu7.

      In this line of investigation, we have tested the consequences of PAC1 overexpression, as PAC1 expression levels and splicing are affected by loss of Slu7. We report a partial rescue of the nuclear position defect during mitosis, yet these cells were arrested at cytokinesis. Further, the lack of an array of suitable auxotrophic (or other) markers in this model system makes it technically challenging to perform rescue experiments by overexpressing multiple candidate downstream genes.

      Supp Figure 3C, remove the drawing on the right. Adjust times relative to panels.

      Response: The drawing has been removed and the time points have been adjusted.

      1. Tracking the nucleus in wild-type cells with a small bud showed that the nucleus moved into the daughter bud, divided into two, and one-half migrated to the mother bud (Supplementary Figure 3B, top row).

      Please replace the sentence: "one-half" with "one of the daughter nuclei". Additionally, as this nuclear positioning occurring during late mitosis is due to spindle elongation, I would not use the term migrated but "positioned" or "moved". Nuclear movement into the bud, which is referred to as "moved", can indeed be named "migrated".

      Response: The word “migrated” in the above sentence has been replaced with the word “moved”.

      1. Indicate in Figure 2B the marker used (GFP-H4), as in Fig Supp 3B.

      Response: The marker has been indicated in the figure.

      1. Nuclear division initiates in the bud, and one of the divided nuclei with segregated chromosomes migrates back to the mother cell (Figure 2B, top panel, wildtype, quantified in Figure 2C grey bar).

      As mentioned before, I would not name this, nuclear migration as it is the result of spindle elongation, and it can be confusing or misleading for non-expert readers.

      Response: The word “migrate” in the above sentence has been replaced with the word “move”.

      1. These two conclusions should be revised and described in temporal/sequential order.
      2. Thus, we identify that the depletion of CnSlu7 severely affects the temporal and spatial sequence of events during mitosis, particularly nuclear migration and division.
      3. Together, these results confirmed that without affecting the kinetochore clustering, depletion of Slu7 affects nuclear migration during the G2 to mitotic transition in Cryptococcus neoformans.

      Response: We thank the reviewer for improving the clarity of the concluding statements. These have now been revised to read as follows:

      “Together, these results confirm that without affecting kinetochore clustering, depletion of Slu7 affects nuclear movement during the G2 to mitotic transition in Cryptococcus neoformans. Thus, we identify that the depletion of CnSlu7 severely affects the temporal and spatial sequence of events during mitosis, particularly nuclear migration and division.”

      1. In slu7kd cells with small buds, numerous cMTs were nucleated from the MTOCs, and as the cell cycle progressed, they organized to form the unipolar mitotic spindle (Figure 3A, slu7kd GFP-TUB1 panel, time point 55 mins).

      Please, revise whether the term unipolar mitotic spindle is correct here.

      Response: The word unipolar has been removed.

      1. I suggest including page and line numbers in the manuscript to facilitate revision.

      Response: We regret missing this formatting guideline. Page and line numbers have now been provided.

      Reviewer #2

      We are thankful for the very positive comments on the significance of our work, its novelty and its findings being of broad interest to the microbiology, splicing, cell cycle and cell division communities. We respond to all comments raised below.

      1. The authors test the Mad2-dependent spindle assembly checkpoint and show that it is not relevant for slu7-depletion. This is as expected if the defect is in nuclear positioning. They could test other checkpoint pathways that would monitor nuclear positioning in budding yeasts. Perhaps they have considered this: Bub2, Bfa1, Tem1, Lte1 mutants? I don't think this experiment is essential for publication, but it could strongly support their model.

      Response: We appreciate the comment on other checkpoints operating during mitosis. However, we have not done these experiments to examine the role of components that arrest mitosis (Bub2, Tem1, etc.) in response to spindle or kinetochore damage. We hope the reviewer appreciates that this line of work would require the generation of a bub2Δ strain and extensive characterization of its role in checkpoint control in Cryptococcus before it can be introduced into strains compromised for Slu7.

      Minor comments:

      1. In Figure 3, Dyn1-GFP is imaged, and in many of the cells in which Slu7 is depleted, nothing (or very little) can be seen. It is later argued that this is an indirect effect, due to defects in Pac1 and associated functions. Have the authors attempted a Dynein western blot (the 3xGFP tag should be quite sensitive)? It would be good to demonstrate that the Dynein motor complex hasn't simply fallen apart and Dynein been degraded in the slu7-depletion.

      Response: A study in S. cerevisiae has reported that dynein expression does not change in pac1Δ cells (Lee et al., 2003). Since the molecular weight of CnnDYN1 along with the tag is 630 kDa, we did attempt the very challenging western blot experiment to check the expression levels of this very large protein in wildtype and slu7kd cells. Based on the reviewer’s suggestion, we performed a dot blot of protein lysates from wildtype and slu7kd cells probed with anti-GFP antibody to estimate DYN-GFP levels. The untagged WT H99 strain was used as a negative control. The same blot was stripped and re-probed for PSTAIRE, which served as a loading control. This experiment revealed that dynein levels are the same in wildtype and slu7kd cells.

      in Figure 7: have any intronless genes been tested for rescue of the post-mitotic delay/arrest? This is not necessary for publication, but if any have been tested already, they could be listed here.

      Response: We have not tested intronless genes for their role in the rescue of the post-mitotic delay/arrest. From the RNA seq data, we observed that most of the genes involved in the mitotic exit network (MEN) and cytokinesis were highly expressed in slu7kd cells compared to the wildtype, indicating an indirect role for Slu7 in their expression levels. We validated three candidates, MOB2, CDC12 and DBF2, by qRT-PCR (Supplementary 7.D), found that they were upregulated in slu7kd cells, and hence speculate that deregulation of these transcripts could contribute to the post-mitotic arrest in slu7kd.

      In SFig2C legend make it clear that these cells are HU arrested at time zero. Are the cells in glucose or galactose during HU treatment?

      Response: We regret the lack of clarity in the legend, and the required details have been added. The cells were initially grown in non-permissive media for 2 hours to deplete Slu7, and then HU was added to the non-permissive media and the cells were allowed to grow for 4 hours.

      in SFig4, the TBZ sensitivity isn't very convincing as the slu7kd strain is struggling to grow at all on YPD.

      Response: We agree with the reviewer's comment on the growth of slu7kd cells on YPD medium containing TBZ. TBZ may exacerbate the defective mitosis of Slu7-depleted cells; however, whether this pertains only to mitosis or to any cellular process involving microtubules requires further investigation.

      In SFig5 legend the volcano plot needs to be better explained. What are the dashed lines etc. ?

      Response: We regret missing these details on the volcano plot, which have now been added to the legend.

      __Reviewer #3 __

      We appreciate the view that our work provides strong evidence to support our conclusions that Cryptococcus neoformans Slu7 controls mitotic progression by efficient splicing of cell cycle regulators and cytoskeletal elements. We have taken all comments of the reviewer into account to revise our manuscript with additional data and by improving the presentation. The key additional data are summarized below.

      Major comments:

      1) The authors claimed that CnSlu7 is the most divergent among the fungal homologs and closer to its human counterpart (Fig. 1A, Supplementary Fig 1A). - Based only on a phylogenetic tree including limited members, as in Supplementary Fig. 1, it cannot be concluded that CnSlu7 is closer to its human counterpart, since a basidiomycete yeast such as C. neoformans is itself positioned closer to humans than the ascomycete yeasts S. cerevisiae and Sch. pombe in phylogenetic tree analysis. It is strongly recommended to include other fungal species from the Basidiomycota, such as Ustilago maydis, in the phylogenetic analysis in Supplementary Fig. 1. - Conservation analysis among diverse eukaryotes is more meaningful than conservation within the fungal group, so it is recommended that the data of Fig. 1A be replaced with the revised Supplementary Fig 1. - The analysis data on amino acid identities among Slu7 homologues should be presented to support the claim.

      Response: We agree with the reviewer that our data would be better served by an improved analysis of the phylogenetic relationship between various Slu7 homologs. We have therefore reconstructed the phylogenetic tree by including other fungal groups. This is presented here and also in the revised manuscript as Supplementary Figure 1A. These data, too, show that Cryptococcus (deneoformans and neoformans) Slu7 is the most diverged among its homologs from various fungal species, with its closest homologs being those of the other pathogens Puccinia graminis and Ustilago maydis.

      2) Despite that CnSlu7 is the main key subject, the comparative analysis of CnSlu7 to the previously reported Slu7 homologues, in the aspect of functional domain organization, is not provided in the present manuscript. - It was reported that Slu7 contains the four motifs that control its cellular localization and canonical function as a splicing factor, such as a nuclear location signal, a zinc knuckle motif, four stretches of leucine repeats and a lysine-rich domain. Notably, human Slu7 protein is 204 amino acids longer than S. cerevisiae homolog with only 24% identity in the zinc knuckle motif (Molecular Biology of the Cell Vol. 15, 3782-3795). Thus, it is strongly recommended to provide additional information on the conserved and diverged features of CnSlu7 compared to other Slu7 homologs as a part of revised Figure.

      Response: The multiple sequence alignment of Cryptococcus neoformans Slu7 with its fungal and higher eukaryote homologs, such as human and plant Slu7 proteins, revealed that only the CCHC zinc finger motif is highly conserved. We do not detect conservation in the nuclear localization signal, the stretches of leucine repeats or the lysine-rich domain, except for leucine stretch 3 near the C terminus. This additional information is presented in revised Figure 1A.

      3) The manuscript clearly demonstrated that one of key targets of Slu7-mediated splicing is PAC1 in C. neoformans. Considering, Pac1 is also conserved from S. cerevisiae to human, it could be speculated that the defect of Slu7 can affect nuclear migration in other fungal species and human cells by inefficient splicing of PAC1, despite striking differences in their nuclear position during cell division. Please discuss this possibility or provide the qRT-PCR analysis data of PAC1 homologs in the available fungal Slu7 mutant strains.

      Response: Cell cycle arrest phenotypes of splicing factor mutants (studied largely in budding and fission yeast) result from inefficient pre-mRNA splicing of cell cycle-related genes. Slu7 is a well-characterized second-step splicing factor in S. cerevisiae, where in vitro splicing assays with ACT1 minigene transcripts carrying a modified single intron showed that ScSlu7 is dispensable for splicing when the branchpoint to 3'SS distance is less than seven nucleotides in the mini transcript (Brys and Schwer, 1996). In fission yeast, we reported the effects of metabolic depletion of Slu7, which is an essential gene (Banerjee et al., 2013), and showed, unexpectedly, that in addition to the branchpoint to 3'SS distance, other intronic features contribute to the dependency of fission yeast intron-containing transcripts on Slu7 function. That work also showed that in multi-intronic transcripts its role is intron-specific, and thus a candidate gene/transcript is likely to be dependent on Slu7 by virtue of its intronic features and not its biological function. In this study, a splicing-dependent role of CnSlu7 in cell cycle progression is investigated, where, based on a strong nuclear mis-positioning phenotype, we narrowed in on PAC1 transcripts as one of the targets. We show that PAC1, encoding a cytoskeletal factor, has introns dependent on CnSlu7 for efficient splicing, and we show partial rescue of nuclear position in a strain complemented with an intronless PAC1 gene. In this scenario, a role for Slu7 may be predicted in other species where the PAC1 exon-intron nucleotide sequences are similar to those in Cryptococcus, which awaits validation by other experimentalists.

      Interestingly, PAC1 in S. cerevisiae is an intronless gene, and its homolog is not annotated in S. pombe. According to recent research in human cell lines, knockdown of Slu7 by siRNA resulted in metaphase arrest through inefficient splicing of sororin, which is crucial for sister chromatid cohesion and correct spindle assembly (Jiménez et al., 2019).

      Hence, splicing factors act in the cell cycle through splicing of targets involved in the cell cycle, and the targets regulated by a splicing factor may or may not be conserved in other species.

      Minor comments:

      General points 1) Provide information on the marker sizes in the data of qRT-PCR analysis presented in Figures 5 and 6, and Supplementary Fig 2A.

      Response: We regret the omission of this technical data and have corrected the same by providing the marker sizes in all the figures.

      2) Please unify the format of gene names. Some genes were written with superscript of "+", such as CLN1+ and PAC1+ in Fig. 4. What does "+" mean in the gene names?

      Response: We have taken the suggestion to carefully review the nomenclature of genes and their expressed transcripts, as is typical for Cryptococcus neoformans. We had used + to depict the wildtype form of a transcript. Thus, CLN1+ was used to denote the Cyclin 1 cellular transcript expressed from its own locus without any modification of the promoter or the intronic features.

      3) Supplementary Figure 1 C: Please correct "Slu7KD" 6 hrs YPD to "slu7kd" 6 hrs YPD.

      Response: This error has been corrected.

      4) Supplementary Figure 2A: What do "mRNA" and "No RT29X/", respectively, indicate?

      Response: "mRNA" indicates the spliced form after the intron is spliced out, and so denotes exon-exon sequences in the mRNA. The reactions marked “No RT 29 X” denote semi-quantitative PCR performed on DNase-treated RNA samples without reverse transcription to generate cDNA. These reactions were done to confirm that there is no genomic DNA present in the RNA sample used for reverse transcription of the cellular transcripts. Some of these details are now included in the Supp Fig 2A legend.

      5) Supplementary Figure 4C: Please provide brief explanation in the text on why the authors employed mad2Δ slu7kd cells.

      Response: On page 8, line 6, we had provided the rationale for generating and studying the mad2Δ slu7kd strain. This is recapitulated below:

      “To investigate whether Slu7 knockdown triggers the activation of spindle assembly checkpoint (SAC), we generated a strain with conditional slu7kd in cells with mad2Δ allele and the GFP-H4 nuclear marker.”

      6) Supplementary Figure 6D legend: Please correct the description of "slu7kd SH:Slu7 FL" from "expressing intronless PAC1" to "expressing full length of SLU7".

      Response: The error in the legend is regretted and this has been corrected.

      7) Supplementary Figure 7D: The authors confirmed that MOB2, CDC12, and DFB1 were expressed at higher levels in slu7kd when compared to wildtype. Please briefly explain in the text why the expression level of these genes in slu7kd was mentioned.

      Response: slu7kd cells expressing intronless Pac1 arrest after nuclear division. Revisiting our transcriptomic data, we found that genes involved in the mitotic exit network and cytokinesis, such as DFB1, MOB2, CDC12, BUD4 and CHS2, were deregulated in slu7kd when compared to wildtype. We confirmed this by qRT-PCR for three candidates, MOB2, DBF1 and CDC12, which were expressed at higher levels in the knockdown than in wildtype.

      8) The species name should be written as abbreviation after the first mention. For example, please correct Cryptococcus neoformans to C. neoformans throughout manuscript.

      Response: The suggestion is well taken, and the required edits have been made throughout the text.

      9) Please unify the format of paper titles listed in References.

      Response: This formatting error is regretted and has been corrected so that all references follow a single format.

      10) No page information for Hoffmann et al (2010) in References.

      Response: This omission is corrected.

      11) Update the information on the published journal of Chatterjee et al. (2021) in References.

      Response: This omission is regretted and is now corrected.

      12) Information on the authors, title, published journal and pages should be provided for the papers (Yadav and Sanyal, 2018; Sridhar et al., 2021) in Supplementary Table 1, which were not included in the main Reference list.

      Response: The references are now added to the main list.

      References used for addressing the reviewer’s comments:

      1. Chung DKC, Chan JNY, Strecker J, Zhang W, Ebrahimi-Ardebili S, Lu T, Abraham KJ, Durocher D, Mekhail K (2015) Perinuclear tethers license telomeric DSBs for a broad kinesin- and NPC-dependent DNA repair process. Nat Commun doi:10.1038/NCOMMS8742.
      2. Jiménez M, Urtasun R, Elizalde M, Azkona M, Latasa MU, Uriarte I, Arechederra M, Alignani D, Bárcena-Varela M, Alvarez-Sola G et al (2019) Splicing events in the control of genome integrity: Role of SLU7 and truncated SRSF3 proteins. Nucleic Acids Res 47: 3450–3466. doi:10.1093/nar/gkz014.
      3. Laflamme G, Sim S, Leary A, Pascariu M, Vogel J, D’Amours D (2019) Interphase Microtubules Safeguard Mitotic Progression by Suppressing an Aurora B-Dependent Arrest Induced by DNA Replication Stress. Cell Rep 26: 2875-2889.e3. doi:10.1016/J.CELREP.2019.02.051.
      4. Lawrimore J, Barry TM, Barry RM, York AC, Friedman B, Cook DM, Akialis K, Tyler J, Vasquez P, Yeh E et al (2017) Microtubule dynamics drive enhanced chromatin motion and mobilize telomeres in response to DNA damage. Mol Biol Cell 28: 1701–1711. doi:10.1091/MBC.E16-12-0846.
      5. Lee WL, Oberle JR, Cooper JA (2003) The role of the lissencephaly protein Pac1 during nuclear migration in budding yeast. J Cell Biol. doi:10.1083/jcb.200209022.
      6. Lottersberger F, Karssemeijer RA, Dimitrova N, De Lange T (2015) 53BP1 and the LINC Complex Promote Microtubule-Dependent DSB Mobility and DNA Repair. Cell 163: 880–893. doi:10.1016/J.CELL.2015.09.057.
      7. Oshidari R, Strecker J, Chung DKC, Abraham KJ, Chan JNY, Damaren CJ, Mekhail K (2018) Nuclear microtubule filaments mediate non-linear directional motion of chromatin and promote DNA repair. Nat Commun doi:10.1038/S41467-018-05009-7.
      8. Varshney N, Som S, Chatterjee S, Sridhar S, Bhattacharyya D, Paul R, Sanyal K (2019) Spatio-temporal regulation of nuclear division by Aurora B kinase Ipl1 in Cryptococcus neoformans. PLoS Genet doi:10.1371/journal.pgen.1007959.
      9. Wu G, Zhou L, Khidr L, Guo XE, Kim W, Lee YM, Krasieva T, Chen PL (2008) A novel role of the chromokinesin Kif4A in DNA damage response. Cell Cycle 7: 2013–2020. doi:10.4161/CC.7.13.6130.
    1. "I don't remember. Are you measuring yourself by that?" "You waited six months, and you do too remember. And this is five months. And we're not measuring anything. William and I have known each other longer than five months, but we've been together - you know, as a couple - five months. And I'm almost twenty-three, which is two years older than Mom was. And don't tell me it was different when you guys did it." "No," he heard himself say. "It's pretty much the same, I imagine?"

      In this section, I noticed that the dynamic between Ballinger and Melanie is very odd, especially the way she acts as if she has already told her father about her significant other when in fact she has not. From the beginning of the conversation, we can already see that Melanie and Ballinger have a close relationship, but Melanie seems to have grown distant from her family and made some choices she knew her parents would not be proud of once she told them the truth. However, she has an idea: she can start the conversation by slowly introducing Coombs to her father, talking a little more about him and quickly assuming they had already had this conversation, which gives her father no chance to think the situation through and leaves him little choice but to accept it. That strategy did not work for long once her father started to figure out how serious her relationship with Coombs was, as well as how impossible it was for him to share with Melanie his problems with his wife, Mary. Ballinger and Melanie seemed to have a close relationship, but this situation may make them drift apart for a long time.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      • A summary of what the authors were trying to achieve.

      The authors cultured pre- and post-vaccine PBMCs with overlapping peptides encoding S protein in the presence of IL-2, IL-7, and IL-15 for 10 days, and extensively analyzed the T cells that expanded during culture using scRNAseq, scTCRseq, and reporter cell lines expressing the dominant TCRs. They were able to identify 78 S epitopes with HLA restrictions (which by itself represents a major achievement), together with their subsets based on transcriptional profiling. By comparing T cell clonotypes between pre- and post-vaccination samples, they showed that a majority of pre-existing S-reactive CD4+ T cell clones did not expand upon vaccination. Thus, the authors concluded that highly responding S-reactive T cells were established by vaccination from rare clonotypes.

      • An account of the major strengths and weaknesses of the methods and results.

      Strengths

      • Selection of 4 "Ab sustainers" and 4 "Ab decliners" from 43 subjects who received two shots of mRNA vaccinations.

      • Identification of S epitopes of T cells together with their transcriptional profiling. This allowed the authors to compare the dominant subsets between sustainers and decliners.

      Weaknesses

      • Fig. 3 provides the epitopes, and the type of T cells, yet the composition of subsets per subject was not provided. It is possible that only one subject out of 4 sustainers expressed many Tfh clonotypes and explained the majority of Tfh clonotypes in the sustainer group. To exclude this possibility, the data on the composition of the T cell subset per subject (all 8 subjects) should be provided.

      In accordance with the reviewer’s suggestion, we provided the composition of the T cell subset per subject (all 8 subjects) in the revised manuscript (shown below).

      Author response image 1.

      • S-specific T cells were obtained after a 10-day culture with peptides in the presence of multiple cytokines. This strategy tends to increase a background unrelated to S protein. Another shortcoming of this strategy is the selection of only T cells amenable to cell proliferation. This strategy will miss anergic or less-responsive T cells and thus create a bias in the assessment of S-reactive T cell subsets. This limitation should be described in the Discussion.

      We thank the reviewer for raising this question about our experimental strategy. We chose this method because its background unrelated to S protein was lower than that of the widely used AIM methods, as verified by reconstituting many TCRs and testing their responses in vitro. Another reason is that, as the reviewer mentioned, this method identifies functional (proliferative) S-reactive T cell clonotypes rather than anergic or less-responsive T cells, and identifying such clonotypes is the objective of this study. In accordance with the reviewer’s suggestion, we have carefully described the limitations of and rationale for our experimental strategy in the revised manuscript.

      • Fig. 5 shows the epitopes and the type of T cells present at baseline. Do they react to HCoV-derived peptides? I guess not, as it is not clearly described. If the authors have the data, it should be provided.

      As the reviewer surmised, the pre-existing highly expanded clonotypes that we analyzed did not react to HCoV-derived peptides. After we determined the epitopes of the clonotypes, the S peptide sequences were analyzed for homology in HCoVs. The only two clonotypes whose epitope sequences were relatively conserved among HCoV strains (clonotypes #8-pre_9 and #8-pre_10) were tested for reactivity to the corresponding HCoV epitopes, but no activation was observed (shown below). We have added these data to the revised manuscript.

      Author response image 2.

      • As the authors discussed (L172), pre-existing S-reactive T cells were of low affinity. The raw flow data, as shown in Fig. S3, for pre-existing T cells may help discuss this aspect.

      As the reviewer mentioned, some pre-existing S-reactive T cells might appear to react with S peptides, judging from the NFAT-GFP expression of their reporter cell lines. However, the percentage of GFP-expressing cells is affected by many factors, such as the TCR and HLA molecule expression levels. Thus, the affinity of pre-existing S-reactive T cells cannot be reliably deduced from the activation of reporter cell lines, as shown in Fig. S3 of the present manuscript. We thank the reviewer for this constructive suggestion; however, for this reason we decided not to use these data to evaluate affinity quantitatively in this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      A short-term comparison of durability of S antibody levels after 2-dose vaccination, showing that better or more poorly sustained responses correlate with the presence of Tfh cells.

      Strengths:

      Novelty of approach in expanding, sequencing and expressing TCRs for functional studies from the implicated populations.

      Weaknesses:

      Somewhat outdated question, short timeline, small numbers, over-interpretation of sequence homology data

      Reviewer #2 (Recommendations For The Authors):

      In line with my above comments, it might be useful for the authors to moderate some of the assertions in what is a rather small-scale descriptive account of correlates of some quite nuanced, short-term differences in S antibody responses.

      We have now clearly described that some homologous microbe-derived peptides were indeed recognized by S-reactive T cells. We have also removed overstatements from the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to investigate the relationship between anti-S protein antibody titers and the phenotypes and clonotypes of S protein-specific T cells in people who received SARS-CoV-2 mRNA vaccines. To do this, the authors recruited a cohort of COVID-19-naive individuals who received SARS-CoV-2 mRNA vaccines and collected serum and PBMC samples at different timepoints. They then generated three main sets of data: (1) anti-S protein antibody titers at all timepoints; (2) a single-cell RNAseq/TCRseq dataset of T cells that divided after 10 days of stimulation with S protein; and (3) the corresponding epitopes for each expanded TCR clone. After analyzing these results, the paper reports two major findings and claims: (A) individuals with a sustained anti-S protein antibody response also have more so-called Tfh cells in their single-cell dataset, which suggests that Tfh polarization of S-specific T cells can be a marker to predict the longevity of anti-S antibodies; and (B) S-reactive T cells do exist before vaccination, but they seem unable to respond properly to COVID-19 vaccination.

      The paper's strength is that it uses a very systematic and thorough strategy to dissect the relationships among antibody titers, T cell phenotypes, TCR clonotypes, and the corresponding epitopes, and it indeed reports several interesting findings about the relationship between Tfh cells and sustained antibody levels and about the S-reactive clones that exist before vaccination. However, the main weakness is that these interesting claims are not sufficiently supported by the evidence presented in this paper. I have the following major concerns:

      (1) The biggest claim of the paper, that the acquisition of S-specific Tfh clonotypes is associated with the longevity of anti-S antibodies, should be based on proper statistical analysis rather than just a UMAP, as in Fig. 2C, E, and F. The paper only shows the pooled result, but it looks like most of the so-called Tfh cells come from a single donor, #27. If each of the 4 decliners and 4 sustainers is analyzed separately and their Tfh% of total CD4+ T cells is presented, is there a statistically significant difference between decliners and sustainers? I want to emphasize that solid scientific conclusions need to be drawn from a proper sample size and statistical analysis.

      In accordance with the reviewer’s request, we have also analyzed the T cells of each subject separately (shown below). The average Tfh frequency was much lower in decliners than in sustainers, although the difference did not reach statistical significance, partly because of the large deviation caused by one sustainer (#27) who had a very high Tfh%. We have modified our description in the revised manuscript accordingly.

      Author response image 3.

      (2) The paper does not provide any information to justify its cell annotation as presented in Figs. 2B and 4A. Moreover, in my opinion, it is strange that two clusters of cells sitting on the left and right sides of the UMAP in Fig. 2B are both annotated as CD4 Tcm and Tem. Also, Tfh and Treg cells belong to the same cluster in Fig. 2B, but they should have very distinct transcriptomes and should separate clearly. Therefore, I believe the paper would be more convincing if it presented more information and discussion about the basis for its cell annotation.

      We agree with the reviewer’s concern. Because antigen stimulation induced the proliferation of only antigen-specific T cells, the multiple clusters arose mostly from fluctuations in cell cycle-related genes. We therefore carefully and manually annotated these clusters by selecting cell type-related genes (Kaech et al, Nat. Rev. Immunol., 2002; Sallusto et al, Annu Rev Immunol., 2004) and determined their subsets independently of the automatic clustering based on the whole transcriptome. Indeed, antigen-stimulated Tfh and Treg cells are transcriptionally close, as both express ICOS and PDCD1. We mainly used IL21 and FOXP3 to distinguish the Tfh and Treg populations, respectively. We thank the reviewer for pointing out this important step in our analysis, and we have added a description of the annotation methods to the revised manuscript.

      (3) In Lines 103-104, the paper claims that the Tfh cluster likely comes from cTfh cells. However, considering that the cells have been cultured and stimulated for 10 days, cTfh cells might lose all Tfh features during such culture. To the best of my knowledge, there is no literature supporting the notion that cTfh cells stimulated in vitro for 10 days (also in the presence of IL-2, IL-7, and IL-15) retain a Tfh phenotype. It is possible that, instead of the sustainers having more S-specific cTfh cells before culture, the sustainers' PBMCs create an environment that favors Tfh cell differentiation (e.g., by expressing more pro-Tfh cytokines or co-stimulatory molecules), so that after the 10-day culture more Tfh-like cells are detected in the sustainers. The paper may need to include more evidence that cTfh cells can retain Tfh features after a 10-day culture.

      We thank the reviewer for raising this important issue. As the reviewer pointed out, culturing T cells for 10 days indeed changed the repertoire and features, so the Tfh clonotypes we detected after the expansion may not correspond to the cTfh clonotypes in vivo. Because our observation and analysis were mostly based on the dominant T cell clonotypes expanded in vitro, we modified our description and conclusion accordingly in the revised manuscript.

      (4) In my opinion, it is inaccurate to use the cell numbers in Fig. 4B to determine whether a clone expands, given that cell numbers can be affected by many factors, such as the input number, the stimulation quality, and the PBMC sample quality. A more appropriate analysis would calculate the relative abundance of each TCR clone among total CD4 T cells at each timepoint.

      We thank the reviewer for pointing out our inaccuracy. As the reviewer suggested, we used percentages to demonstrate the relative abundance of each clonotype in Fig. 4B of the revised manuscript.

      (5) Expressing each TCR in a cell line to determine the epitopes is much appreciated. However, the authors need to make very sure that this analysis is performed correctly, because a large body of the paper's conclusions is based on this epitope analysis. I noticed something strange (maybe I am wrong): for example, in Table 4, donor #8 clonotypes post_6 and post_7 have exactly the same TRAV5 and TRAJ5 usage. Because the alpha chain does not have a D region, in theory these clonotypes, having the same VJ usage, should have the same CDR3α sequence; however, in the table they have very different CDR3α amino acid sequences. I would ask the authors to double-check their analysis, and I apologize in advance if this question is based on incorrect knowledge.

      We thank the reviewer for carefully reading our manuscript. Although the two clonotypes, donor #8 clonotypes post_6 and post_7, have exactly the same TRAV5 and TRAJ5 usage, they have different CDR3α amino acid sequences because of random nucleotide addition at the V-J junction during rearrangement. Likewise, donor #27 clonotype post_1 and donor #13 clonotype post_15 have the same TRAV9-2 and TRAJ17 usage but different CDR3α sequences.

      Reviewer #3 (Recommendations For The Authors):

      (1) Related to point 1 of my public review: to reach a solid conclusion, the authors could include more sustainers and decliners if possible, simply stimulate their PBMCs for 10 days, check Tfh features in the proliferated CD4 T cells (e.g., IL-21 secretion, PD-1 expression), and then compare these values between sustainers and decliners.

      We thank the reviewer for the suggestion. Unfortunately, additional PBMCs from more sustainers and decliners are not available to us. Instead, we carefully described the current observation in the revised manuscript.

      (2) Related to point 3 of my public review: the authors could attempt to sort CXCR5+ cTfh and CXCR5- non-cTfh cells, stimulate them in vitro for 10 days, and compare whether the stimulated cTfh cells still show more Tfh-related features, such as increased IL-21 secretion.

      As the reviewer recommended, sorting and culturing cTfh and non-cTfh cells separately would clarify this issue. However, because of limited sample availability, we could not perform these experiments.

      (3) I couldn't find information in the manuscript about the availability of the data and of the code used to analyze the single-cell RNA-seq dataset.

      We have clarified the availability of the data and added the code for the single-cell RNA-seq analysis to the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans. In this manuscript, the authors generated TRIP13 null mice and Flag-tagged TRIP13 knock-in mice to study its role in meiosis. They demonstrate that TRIP13 regulates HORMA domain proteins and is essential for meiotic completion and fertility. The main impact of this manuscript is its clarification of the in vivo function of TRIP13 during mouse meiosis and its previously unrecognized role as a dose-sensitive regulator of meiosis.

      Strengths:

      Two previously reported Trip13 mutations in mice are both hypomorphic alleles with distinct phenotypes, precluding a firm conclusion about its function. This study generated TRIP13 null mice for the first time, definitively revealing the function of TRIP13 in meiosis. The authors also show the novel localization of TRIP13 to the SC and its independence from the axial element components. The finding of dose-sensitive regulation of meiosis by TRIP13 has implications for understanding human meiosis and disease phenotypes.

      Weaknesses:

      This manuscript would be more impactful if more mechanistic advances could be made. For example, the authors could follow up on one of the new interactors identified by MS to offer new insight into the molecular function of TRIP13.

      We agree that it would be interesting to follow up on the new candidate interactors, but we think this would be more feasible in future studies.

      Reviewer #2 (Public Review):

      Summary and Strengths:

      In this manuscript, Chotiner and colleagues demonstrated the localization of TRIP13 and clarified the phenotypes of Trip13-null mice in mouse meiosis. The meiotic phenotypes of Trip13 have been well characterized using hypomorphic alleles in the literature. However, the null phenotypes have not been examined, and the localization of TRIP13 was not clearly demonstrated. The study fills these important knowledge gaps in the field. The demonstration of TRIP13 localization to the SC in mice provides an explanation of how HORMA domain proteins are evicted from the SC in diverse organisms. This conclusion was confirmed both by IF and in TRIP13-tagged Tg mice. Further, the phenotypes of Trip13-null mice are very clear. The manuscript is well crafted, and the discussion section is well organized and comprehensively covers the topic. All in all, the manuscript will provide important knowledge in the field of meiosis.

      Weaknesses:

      The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. However, the authors did not examine meiotic recombination in the Trip13-null mice.

      Meiotic recombination was extensively characterized in Trip13 severe hypomorph mutants in two previous studies: gamma-H2AX, BLM, BRCA1, ATR, RPA, RAD51, DMC1, and MLH1 (Li and Schimenti, 2007; Roig et al., 2010). All the meiotic defects in our Trip13-null mice were also present in Trip13 severe hypomorph mutants: meiotic arrest, defects in chromosomal synapsis, asynapsis at chromosomal ends, and accumulation of HORMAD1/2 on the SC axis. Therefore, the defects in meiotic recombination in Trip13-null mice are expected to be similar to those in Trip13 severe hypomorph mutants, and thus we did not examine the proteins involved in meiotic recombination in the Trip13-null mutant.

      Reviewer #3 (Public Review):

      Summary:

      The authors perform a thorough examination of the phenotypes of a newly generated Trip13 null allele in mice, noting defects in chromosome synapsis and impact on localization of other key proteins (namely HORMADs) on meiotic chromosomes. The vast majority of data confirms observations of several prior studies of Trip13 alleles (moderate and severe hypomorphs). The original or primary aims of the study aren't clear, but it can be assumed that the authors wanted to better study the role of this protein in evicting HORMADs upon synapsis by studying phenotypes of mutants and better characterizing TRIP13 localization data (which they find localizes to the central element of synapsed chromosomes using a new epitope-tagged allele). Their data confirm prior reports and are consistent with localization data of the orthologous Pch2 protein in many other organisms.

      Strengths:

      The quality of data is high. Probably the most important data the authors find is that TRIP13 is localized along the CE of synapsed chromosomes. However, this was not unexpected because PCH2 is also similarly localized. Also, the authors use a clear null (deletion allele), whereas prior studies used hypomorphs.

      Weaknesses:

      There is limited new data; most are confirmatory or expected (i.e., SC localization), and thus the impact of this report is not high. The claim that TRIP13 "functions as a dosage-sensitive regulator of meiosis" is exaggerated in my opinion. Indeed, the authors make the observation that hets have a phenotype, but numerous genes have haploinsufficient phenotypes. In my opinion, it is a leap to extrapolate this to infer that TRIP13 is a "regulator" of meiosis. What is the definition of a meiosis regulator? Is it at the apex of the meiosis process, or is it a crucial cog of any aspect of meiosis?

      TRIP13 is not haploinsufficient, as Trip13 heterozygotes were still viable and fertile (albeit with defects in meiosis). TRIP13 is an ATPase and changes the conformation of meiosis-specific proteins such as HORMAD proteins. TRIP13 is essential for meiosis and its mutations cause defects in both meiotic recombination and chromosomal synapsis. Reviewer 1 stated that “TRIP13/Pch2 is a conserved essential regulator of meiotic recombination from yeast to humans”. Therefore, we feel that TRIP13 can be called a regulator of meiosis.

      Reviewer #1 (Recommendations For The Authors):

      A schematic illustration of SC structure, the components involved, and the main finding, would be helpful for readers to better understand the advancement made by this study.

      We have now added a schematic illustration in a new panel (Figure 7C).

      Fig. 1B, the stage with diplotene cells should be XII.

      The pachytene cells (Pac) were mis-labelled as diplotene cells. Corrected.

      Fig. 1C, color mislabeled.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript will provide important knowledge in the field of meiosis. I support the publication of this study. I have some suggestions to improve and polish the manuscript.

      Major points:

      (1) The heterozygous phenotypes demonstrate that TRIP13 is a dosage-sensitive regulator of meiosis. In relation to this conclusion, as summarized in the discussion section, other mutants defective in meiotic recombination showed dosage-sensitive phenotypes. Given the function of HORMAD1 in meiotic recombination, it would be informative if the authors could examine how major markers of meiotic recombination behave in Trip13-null meiosis.

      Please see our response to Weaknesses from Reviewer #2.

      (2) Relating to the above point, the complete lack of synapsis of the sex chromosomes in Trip13-null meiosis is impressive. This result raises the question of whether the pathway that designates the obligatory XY crossover (which can be detected as large foci of ANKRD31 and MEI4/REC114 at the PAR) is affected. It would be interesting to examine whether ANKRD31 and MEI4/REC114 foci are present on the PAR in Trip13-null meiosis.

      We have performed immunofluorescence analysis of REC114 in spermatocytes. In Trip13-null pachytene-like spermatocytes, the X and Y chromosomes are not synapsed, yet REC114 still formed one focus on each of the unsynapsed X and Y chromosomes. We have added these new data to the Results as a new supplementary figure (Figure 4-supplement 1).

      (3) Figure 4 can be improved if there are quantified data for each phenotype. These phenotypes look nearly complete, but it would be informative to show the penetrance of these phenotypes.

      Because some chromosomes have unsynapsed ends, resulting in two centromere or telomere foci, the total number of centromere or telomere foci is always higher in Trip13-null pachytene-like spermatocytes than wild type pachytene spermatocytes. Therefore, we did not count the foci of centromeres and telomeres. Consistently, the centromere and telomere markers localized as expected in both wild type and Trip13-null spermatocytes.

      (4) I am not fully convinced by these photos: "synapsed sister chromatids (Figure 6B)" and "Sycp2-/- spermatocytes formed short stretches of synapsis (Figure 6C)". The authors may try confocal microscopy with super-resolution deconvolution as they did for other data.

      These have been demonstrated previously. The “synapsed sister chromatids (Figure 6B)” were previously demonstrated by confocal microscopy with super-resolution deconvolution (Guan et al., 2020). The short stretches of synapsis in Sycp2-/- spermatocytes were previously demonstrated by electron microscopy (tripartite SC structure) and SYCP1 immunofluorescence (Yang et al., 2006). We have revised the text to cite this previous evidence and the publications.

      Minor points:

      (1) Lines 19-21: "Loss of TRIP13 leads to meiotic arrest and thus sterility in both sexes. Trip13-null meiocytes exhibit abnormal persistence of HORMAD1 and HORMAD2 on synapsed SC". These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles. This information could be added to the abstract; otherwise, as written, it sounds like these are entirely new findings.

      This information is now added to the abstract: “These findings confirm the previously reported phenotypes of the Trip13 hypomorph alleles.”

      (2) The introduction section seems too long and contains unnecessary information. Some molecular details that are not touched on in the Results section could be deleted (e.g., Lines 65-73).

      We would like to keep the molecular details on the two conformational states, as they provide biochemical background on TRIP13-HORMAD interactions.

      (3) Introduction, Line 92. A rationale can be added as to why the authors characterized the Trip13-null allele.

      A rationale has been added as follows: “To determine the effect of complete loss of TRIP13, we characterized Trip13-null mice.”

      (4) Line 205: Typo "TRRIP13".

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      Just a few recommendations:

      (1) In my opinion, the title is an overreach. "Regulator" invokes other concepts such as transcription factors.

      Please see our explanation in response to weaknesses from Reviewer #3.

      (2) The first sentence of the results deals with TRIP13 expression in only 3 tissues. The authors might look at more comprehensive RNA-seq data from mice and humans.

      We examined TRIP13 protein expression in 8 mouse tissues by WB and found that TRIP13 protein was abundant in testis but present at very low levels in ovary and liver (Figure 1A). We feel that readers can easily look up the relative transcript levels of Trip13 in more mouse and human tissues in the NCBI Gene database.

      (3) The null allele is semi-lethal. Is body size affected? Were the mice abnormal in any other ways, given that TRIP13 has been implicated in other diseases and processes, and is expressed in other tissues (TRIP13 stands for Thyroid receptor interacting protein).

      The body weight of 2-3-month-old males was not significantly different between wild type (24.3±2.8 g, n=5) and Trip13 KO mice (22.8±1.7 g, n=5, p=0.3, Student’s t-test). We have included the body weight information in the revised manuscript. We did not observe any somatic abnormalities in the viable Trip13-null mice, nor did the authors of the two previous studies report any in the Trip13 hypomorph mutants (Li and Schimenti, 2007; Roig et al., 2010).

      (4) Line 276: It would be nice to elaborate on the "spatial explanation."

      We meant that TRIP13 localizes to SC while HORMAD proteins are removed from SC upon chromosomal synapsis, thus providing a spatial explanation. However, we have now deleted “spatial”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      However, several concerns require further explanation in this study. In addition, some results should be revised and updated.

      Thank you for your comments. We have addressed these concerns with additional explanation and experiments, and we have revised and updated some results accordingly.

      Reviewer #2 (Public Review):

      The minor weaknesses of the study are the inconsistent use of terminology throughout the manuscript, occasional logical jumps in the flow, and missing detailed descriptions of the methodologies used, either in the text or in the Materials and Methods section, all of which can be easily rectified.

      Thank you for your review. We have revised the manuscript and corrected errors according to your comments.

      Reviewer #3 (Public Review):

      Importantly, besides the Miwi ubiquitination experiment, which is performed in a heterologous system and therefore may not be ideal for drawing conclusions, the possible involvement of ubiquitination was not shown for any of the other proteins that the authors found to interact with FBXO24. Could histones and transition proteins be targets of the proposed ubiquitin ligase activity of FBXO24, such that histone replacement is abrogated in its absence?

      Thank you for your comments. Histones and transition proteins were not found in the FBXO24 immunoprecipitates (Figure S3G), suggesting that they are not direct targets of FBXO24.

      Miwi should be immunoprecipitated and Miwi ubiquitination should be detected (with WB or mass spec) in WT testis.

      We agree with this suggestion. In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      Therefore, the claim that FBXO24 is essential for piRNA biogenesis/production (lines 308, 314) is not appropriately supported.

      We appreciate the comment. We have revised the description and modified the claim on page 11.

      Reviewing Editor's note for revision

      (1) As noted by all three reviewers, as currently written the rationale to focus on MIWI is not entirely clear. A transitional narrative to focus on MIWI needs to be provided as well as an explanation for how the absence of FBXO24 as an E3 ubiquitin ligase is responsible for the observed mRNA and protein differential expression.

      We appreciate your comments. We have added a transitional narrative explaining our focus on MIWI (page 7) and an explanation of the differential mRNA and protein expression upon FBXO24 deletion (page 13).

      (2) Because the interaction may be indirect, MIWI should be detected by mass spectrometry in the testis co-IP, and MIWI ubiquitination should be detected (by WB or mass spec) in WT testis.

      In the revision, the expression and ubiquitination of MIWI were detected in WT testis by the immunoprecipitation and ubiquitination assay, as shown in Figure 8H.

      (3) Please tone down the claim that FBXO24 is essential for piRNA biogenesis/production as it requires further evidence.

      We have revised the description and modified the claim on page 11.

      (4) Please perform gene ontology analysis of the genes with abnormally spliced mRNAs to provide an explanation for the developmental defects.

      In the revision, we have performed the ontology analysis and provided new data regarding the abnormally spliced genes, as shown in Figure S4D.

      Reviewer #1 (Recommendations For The Authors):

      Major comments

      (1) The authors worked mainly with WT (or knock-in) and Fbxo24-knockout mice. Do heterozygous males and their sperm have any physiological defects like those of FBXO24-deficient mice?

      This is a good question. We did the phenotype analysis and found that heterozygous males are all fertile, and their sperm do not have any physiological defects.

      (2) Fbxo24-KO sperm carries swollen mitochondria. How do the mitochondria affect sperm function?

      Thank you for raising this interesting question. Based on our data and published literature, the defective mitochondria were associated with energetic disturbances and reduced sperm motility, as shown on Page 12.

      (3) TEM images show that Fbxo24-KO spermatids carry swollen mitochondria and enlarged chromatoid bodies. How the swollen mitochondria and enlarged chromatoid bodies cause defects in sperm motility and flagellar development requires more explanation. In addition, it is unclear why the enlarged diameter of the chromatoid body is critical for normal sperm development.

      Thank you for your comments. Chromatoid bodies are considered to be involved in mitochondrial sheath morphogenesis. Analysis of the RNA content of chromatoid bodies reveals an enrichment of PIWI-interacting RNAs (piRNAs), further emphasizing the role of chromatoid bodies in the post-transcriptional regulation of spermatogenic genes. We have added this explanation on pages 12-13.

      (4) The authors only show band images to compare the protein amounts between WT and KO sperm and round spermatids. As the blots for loading controls are not clear, the authors should quantify the protein levels and perform a statistical comparison.

      We quantified the protein levels and performed a statistical comparison, as shown in Figure S3B.

      (5) The authors show the defective sperm head structure from Fbxo24-KO sperm in Figure 5. However, the Fbxo24-KO sperm heads seem quite normal in Figure 3. How many sperm show defective sperm head structure? In addition, the authors observed altered histone-to-protamine conversion in sperm, but it is unclear whether the altered nuclear protein conversion causes morphological defects in the sperm head.

      We appreciate the comments. In our study, over 80% of Fbxo24 KO sperm showed defective head structure. Altered histone-to-protamine conversion resulted in the decondensed nuclei of Fbxo24 KO sperm. Notably, in many knockout mouse studies, impaired chromatin condensation is frequently associated with abnormal sperm head morphology (reference 15, page 8).

      (6) The authors compare the protein levels of RNF8, PHF7, and TSSK6, which participate in nuclear protein replacement in sperm. However, because sperm are the endpoint of nuclear protein conversion, comparing these protein levels in mature sperm may not be informative. The authors might want to compare the protein levels in developing germ cells.

      Thank you for your comment. We actually measured the protein levels of RNF8, PHF7, and TSSK6 in the testes, not in sperm. We have corrected this in Figure 5E and apologize for the error.

      (7) This reviewer suggests describing the rationale for focusing on the MIWI protein in more detail. It would also be useful to know whether MIWI is detected by testis co-IP mass spectrometry.

      We agree with this suggestion. Because MIWI is a core component of the chromatoid body (CB) and was also identified as an FBXO24-interacting partner in our immunoprecipitation-mass spectrometry (IP-MS) analysis (Table S1), we focused on examining MIWI expression in WT and Fbxo24 KO testes. We have added this description to the revision (see lines 191-193 on page 7).

      (8) The authors need to provide a more detailed explanation for how the altered piRNA production affects physiological defects in germ cell development. In addition, it will be good to describe more how the piRNAs affect a broad range of mRNA levels.

      Thank you for your comments. Previously published studies have demonstrated that piRNAs can act like siRNAs to degrade specific mRNAs during male germ cell development and maturation. We have cited these studies in lines 369-372 on page 13.

      (9) The authors observed altered splicing in the absence of FBXO24. However, it is unclear how the altered splicing events lead to developmental defects. The authors should therefore state which mRNAs have undergone abnormal splicing and provide a gene ontology analysis of these genes.

      We have performed the gene ontology analysis and show the new data in Figure S4D.

      Minor comments

      (1) Figure 1A-C - Statistical comparison is missing. The number of biological replicates should be stated in the corresponding legends.

      Thank you for your careful review. We have provided the statistical comparison and the number of biological replicates in the legends of Figure 1A-C.

      (2) Figure 1E, F - The current images cannot clearly resolve the nuclear localization of FBXO24 in testicular germ cells. To clarify the intracellular localization, the authors should provide higher-resolution images.

      The resolution of Figure 1E, F was improved, as suggested. Thank you!

      (3) Figure 1E, F - Scale bar information is missing.

      Scale bars have been added to Figure 1E, F.

      (4) It would be helpful to show the predicted frameshift and early termination of protein translation in Fbxo24-knockout mice.

      The predicted frameshift in Fbxo24-knockout mice has been added to Figure S1B.

      (5) Primer information for qPCR should be provided.

      Primer information for qPCR is now provided in Table S7.

      (6) The authors state that Fbxo24-KO sperm show abrupt bending of the tail. However, the description is unclear, and the sperm shown in Figure 3C seem quite normal. The authors should clarify the abnormal bending pattern of the tail and show quantified results.

      Thank you for pointing out this issue. Abnormal tail bending in Fbxo24 KO sperm mainly consisted of neck bending and midpiece bending, as now shown in Figure S3A.

      (7) The authors mention that Fbxo24-KO sperm have swollen mitochondria at the midpiece, but this is also unclear. How many mitochondria are swollen in Fbxo24-KO sperm?

      This is a good question. However, since it is very difficult to observe all of the mitochondria in each sperm by electron microscopy, we could not quantify the swollen mitochondria in Fbxo24 KO sperm.

      (8) Scale bar information is missing - Fig 3C insets, Fig 3D, Fig 3F insets, Fig 4A insets, Fig 4C insets.

      All the scale bars have been added.

      (9) How many sperm have annulus defects? In Figure 3F, the WT sperm does not have an annulus, which could have been damaged during sample preparation. Are the annulus defects in Fbxo24-KO sperm consistent?

      Thank you for asking these questions. Based on our results, about 30% of Fbxo24 KO sperm showed a defective annulus structure. Since both TEM (Figure 3F) and SEM (Figure 3G) results clearly showed the defective annulus structure of Fbxo24 KO sperm, we believe the annulus defects are consistent and highly unlikely to be caused by sample preparation.

      (10) The cross-section image for the endpiece of Fbxo24-KO sperm is not suitable. It shows the longitudinal column structure of the principal piece.

      Thank you for your comments. It is difficult to observe a completely longitudinal section of the sperm tail under TEM. The cross-sections of the endpiece and principal piece allowed us to examine the structure of the axoneme, outer dense fibers (ODFs), and fibrous sheath (FS).

      (11) The endpiece of Fbxo24-KO sperm seems to have a normal axoneme. Do all endpieces of Fbxo24-KO sperm have a normal axoneme? The authors also need to describe whether the axonemal structure is damaged and disrupted in all Fbxo24-KO sperm.

      Our TEM data showed that the axonemal structure was impaired in the endpiece of Fbxo24 KO sperm (see right panels of Figure 3H). Moreover, based on TEM ultrastructural analysis, we found that over 90% of Fbxo24 KO sperm had a damaged axonemal structure.

      (12) Reference blots in Fig 3I, 3J, 4E (left), 5C and 5E are quite faint. The authors should replace the blot images.

      Thank you for pointing this out. We reran the Western blots multiple times but could not obtain better images because of limited antibody sensitivity. However, we quantified the protein levels and performed statistical comparisons (Figure S3B) to give readers a quantitative readout of these blots.

      (13) Loading controls are required - 7D-H.

      Done as suggested. Thanks!

      (14) How do the authors measure the midpiece length? From where to where? This should be clarified.

      Good question. We measured the midpiece length from the sperm neck to the annulus in MitoTracker-stained sperm. We have clarified this on page 16.

      (15) Why are the FBXO24 bands shifted during IP in Fig 7A?

      The band shift may be caused by protein modification associated with the interaction.

      (16) There are several typos throughout the manuscript. Please check carefully and fix them.

      Thank you for your careful review. We have corrected all the typos we could find.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      (1) Please provide a schematic of the HA-Fbxo24 knock-in construct and strategy together with the knockout (Figure S2), or separately early in Figure S1. The use of the transgenic mouse is mentioned even before the knockout, but no citations or methods are provided in the text other than those listed in Materials and Methods.

      Thank you for your suggestion. The schematic of the HA-Fbxo24 knock-in strategy has been added to Figure S2A. A description of the transgenic mouse has been added to the Results (page 4, lines 102-103).

      Also, it is not clear to what extent the phenotypic and molecular characterization of HA-transgenic mice was performed. For example, lines 134-139: the use of Fbxo24-HA-labeled transgenic mice rescues spermatogenesis and fertility, as shown by litter size in Figure 2F. It is not clear how this observation leads the authors to state that the defects in spermiogenesis are rescued. Please clarify how, and what other measures were taken to support this conclusion. Is the observed infertility due to defects in spermatogenesis or in spermiogenesis?

      Thank you for your question. We crossed FBXO24-HATag males with FBXO24−/− females to obtain FBXO24−/−; FBXO24-HATag males. We examined the testis volume and histological morphology of FBXO24−/−; FBXO24-HATag males and found that they were similar to those of FBXO24+/−; FBXO24-HATag littermates, indicating that spermatogenesis was restored (Figure S2H).

      (2) Line 107 vs Line 114: Please use the terms spermatogenesis and spermiogenesis consistently throughout the text. In the introduction, the authors clearly defined spermatogenesis as involving three phases, the third of which is spermiogenesis. However, the first line of the paragraph concludes that "FBXO24 plays a role during spermatogenesis", while the end of the paragraph summarizes that this protein is "expressed in haploid spermatids specifically during spermiogenesis". It is therefore unclear whether the authors conclude that FBXO24 is important for all of spermatogenesis (line 107) or only for part of spermiogenesis (line 114). Another example is line 219 vs. line 238: at this point in the manuscript, it is again unclear whether the authors want to study molecular changes during spermatogenesis or spermiogenesis upon FBXO24 depletion. There are many such cases throughout the text, and we recommend consistently using the more restrictive term wherever applicable for a clear interpretation.

      We thank you for your careful review. We have double-checked the terminology of spermatogenesis and spermiogenesis and made it consistent throughout the text of the revised manuscript.

      (3) Based on Figures 3C, 3F, and 5A, it is not clear how frequently Fbxo24-knockout sperm show defects in head morphology, since some sperm appear to have relatively normal-looking heads. Please provide quantification.

      We have performed the quantification and found that over 80% of Fbxo24 KO sperm showed defective structures in the sperm head.

      (4) Figure 3B: The authors describe in the figure legend that 3 mice were analyzed in each group. The standard deviation for the WT analysis is missing, or if the author wanted to set the WT value to 100%, the bar and scale shown on the y-axis do not fit. The value for WT looks more like 95%.

      We did normalize sperm motility to the WT value set at 100% and have revised Figure 3B accordingly. We apologize for this oversight.

      (5) Figure 3 B and C: It is not clear how motility was measured. Was CASA used? (It is not described in Methods.) The conclusion about abnormal flagellar bending in KO spermatozoa cannot be drawn from static microscopic images alone. Please provide more details of the motility analysis together with live-cell imaging videos.

      Sperm motility was measured manually using a hemocytometer, following the cited reference. We have provided the details of the sperm motility analysis in the Materials and Methods section on page 16.

      (6) Figure 3 I and J: These are among the few figures that are not supported by statistical analysis. In particular, in 3I the GAPDH controls for WT and KO protein do not show equal loading, which could explain the lower expression of the KO protein. Please show normalized bar graphs with multiple biological replicates, or at least show a representative technical replicate with equal GAPDH loading, to better support the conclusion.

      Thank you for your suggestion. A statistical comparison of relative protein expression has been added in the new Figure S3B.

      (7) Line 184: How do the authors define a swollen mitochondrion? Are there any size criteria (e.g., roundness) that can be measured to distinguish between a swollen and a non-swollen mitochondrion? We recommend using another term, as 'swollen' often implies a difference in osmolarity, and there is no experiment to support this implication.

      Thank you for your comment. We have changed “swollen” to “vacuolar” in the revision (page 7).

      (8) Figure S4, without a bright field image, it is hard to see the purity and morphology of the isolated prep. Please provide the bright field images together or as overlaid images.

      We agree with your comment. We have provided the overlaid images in new Figure S4A.

      (9) There is a big logical jump in what prompted the authors to look at MIWI protein levels and link the observation to the MIWI/piRNA pathway, in both the Introduction and Results, even though this is one of the main findings. It is recommended to provide a better rationale and logical flow in the text.

      Thank you for your suggestion. We have added a sentence explaining why we wanted to focus on studying MIWI expression (see lines 190-193 on page 7).

      Minor comments

      (1) Please keep all the conventions of gene vs. protein nomenclature. For example, write the genes mentioned in the figures in italics with the first letter in Capital, as it is done in the main part. Proteins should be in ALL CAPITAL like FBXO24.

      Gene and protein names have been revised throughout, as suggested.

      (2) In the MM section, the name of the manufacturer and the location of the materials used are missing in several sections. Please go back through the MM section and add this information in the appropriate places.

      Done as suggested. Thank you!

      (3) On page 4, the authors mentioned that "Further qPCR analysis of developmental testes and purified testicular cells showed that FBXO24 mRNA was highly expressed in the round spermatids and elongating spermatids (Fig 1B-C)". Please include statistical analyses for Fig 1B-C as well as for Fig 1A to support the written statements.

      Statistical comparisons have been added to Figure 1; significance is denoted in the figures by *p < 0.05.

      (4) Figure 3E: Please describe in more detail how the length of the midpiece was measured. Was it based on TEM images or based on fluorescent images using MitoTracker?

      As we responded to Reviewer #1, we measured the midpiece length from the sperm neck to the annulus in MitoTracker-stained sperm. We have clarified this in the Materials and Methods section on page 16.

      (5) Line 431: In the "Electron Microscopy" section of the MM part, the author should indicate the ascending ethanol series (%) used.

      Done as suggested. Thank you!

      (6) Line 432: The thickness of the sections prepared is missing, as well as an indication of the microtome used.

      We have added the section thickness and the microtome used to the Materials and Methods section on page 16.

      (7) Line 433: If the generated tiff files have been processed with Adobe Photoshop, this information is missing.

      We have provided information on the usage of Adobe Photoshop for the generation of tiff files on Page 17.

      (8) Lines 445, 452, 467: In some places in the paper, the temperature is written with a space between the number and °C, and sometimes it is not. Please go through the paper and make it consistent. The usual spelling is 4°C.

      We have checked all temperature notations throughout the manuscript and made them consistent. Thank you for your careful review.

      (9) Line 469: The gel documentation system used is not mentioned.

      Done as suggested. Thank you!

      (10) Line 469: The 'TM' should be superscripted.

      Done as suggested.

      (11) Line 489: A space is missing between the changes and the parenthesis.

      Done as suggested.

      (12) Lines 495-496: The authors write that the fractions enriched with round spermatids after sedimentation were collected manually. Was the cell concentration (e.g., 2 × 10⁶ cells/ml) determined after collection of the cells? How were the cells stored until use? Please add the sedimentation time and the temperature used.

      The cells were stored in 1× Krebs buffer on ice and sedimented through a BSA density gradient for 1.5 h at 4°C. The cell concentration was determined after collection, as described on page 18.

      (13) Line 505: spelling error. "Manufactures' instructions" is written instead of "manufacturer's procedure".

      The spelling error was corrected.

      (14) Line 520: Please write a short sentence on how the purification of the 16-40 nt long RNA was performed.

      RNAs of 16–40 nt were enriched by polyacrylamide gel electrophoresis. We have added this information on page 19 (line 531).

      (15) Line 528: The version of the used GraphPad software is missing.

      The version of the GraphPad software has been added on page 19.

      (16) Line 677: For qPCR analyses, the number of mice analyzed (N) and a statistical evaluation are missing.

      The statistical comparison and the number of biological replicates have been added on page 26.

      (17) Figure 3D: Please add a scale bar.

      Done as suggested. Thanks!

      (18) Line 371 and Line 377: Two times "in summary" is written. Please make one summary for the whole paper.

      This sentence has been revised on page 13.

      (19) Line 382: To be consistent in the whole paper, please write Figure 10 in bold letters.

      Done as suggested.

      (20) Please make the size and font of the references consistent with the main text.

      Done as suggested. Thanks again for your careful review.

      Reviewer #3 (Recommendations For The Authors):

      I would like to see a description of the FBXO24 immunoprecipitation experiment performed in HEK293T cells. This somatic cell line does not normally express Miwi, so how was Miwi detected on the FBXO24-mCherry IP beads? It is not mentioned whether Miwi was expressed from a recombinant vector in this experiment. Similarly, I would like to see a better description of the ubiquitin peptide experiment described towards the end of the same paragraph; it is not clear.

      Thank you for your comments. FBXO24-mCherry was expressed in HEK293T cells, and the immunoprecipitates were incubated with testis protein lysate (see lines 268-272 on page 10). A description of the ubiquitin experiment has also been added (lines 283-286 on page 10).

      Line 263: I think the term ectopic here is not appropriate, a correction is needed.

      We have changed “ectopic” to “increased” in the revision (see line 268 on Page 10).

      I would like the authors to provide a tentative explanation or evidence of why FBXO24 KO males are completely sterile, even though they still produce mature sperm with some motility. Since there are defects in nuclear condensation, it would be very relevant to check DNA damage/fragmentation, which could contribute to the sterility phenotype.

      This is a good suggestion. We analyzed sperm DNA damage by TUNEL staining and show the new data in Figure S3E-F.

      Line 213: There have been some conflicting reports about the role of RNF8 in spermiogenesis, but a recent report has shown that RNF8 is not involved in histone PTMs that mediate histone to protamine transition (Abe et al Biol Reprod 2021 https://doi.org/10.1093%2Fbiolre%2Fioab132).

      Thank you for your comment. We have cited this important reference and discussed it in the Discussion section on page 12.

      Figure 7: I would like to see zoomed-out views of the affected exons, so that flanking unaffected exons can be used as a reference for unaffected splicing. Most of the genome browser views in this image only show affected exons and it is impossible to see if these alone are affected or if the reduced RNAseq coverage in those exons is a result of overall reduced mapped reads in these genes. Also, a fixed Y axis with the same max value should be shown for these genome browser snapshots so that the expression level is comparable between the two genotypes.

      Thank you for your comments. An RT-PCR loading control and Y-axis scale ranges have been added in the new Figure 7.

      Minor corrections:

      Line 70: correct "..functions as protein-protein interaction..".

      Thank you for your careful review. We have corrected this sentence (see line 69 on Page 3).

      Line 101: correct "..qPCR analysis of developmental testis..".

      We have corrected this sentence (see line 100 on Page 4). Thanks again.

      Line 116: correct "..results in detective..".

      Corrected.

      Line 186: correct ".. explored..".

      Corrected.

      Line 218: correct ".. gene expressions.

      Corrected.

      Line 221: correct "..genes significantly differentiated expressed".

      Corrected.

      Line 241: FBXO24 was shown earlier to be present in both the cytoplasm and the nucleus.

      We have changed “FBXO24 is mainly confined to the nucleus” to “FBXO24 expressed in the nucleus”, as shown in line 247 on Page 9.

      Line 501-502: correct "..reverse transcriptional".

      “reverse transcriptional” was changed to “reverse transcription” (page 18).

      Line 686: correct ".. deficiency male..".

      Corrected.

      Line 769: correct "..Western blots were adopted..".

      Corrected.

      Line 784: correct "..WT tesis..".

      Corrected.

      I cannot understand exactly what is shown in Figure 9B. Some elements marked on the X-axis are single base positions (-2K, TSS, +2K) while others are stretches of sequence, so they cannot be equivalent. Why is only an intron shown? There should be a measure of normalized expression on the Y-axis.

      Thank you for your questions. On the X-axis, genomic segments were scaled to the same size, and the signal abundance across them was calculated using computeMatrix. To determine the source of the piRNAs, piRNAs were mapped to gene bodies, including introns, CDSs, and UTRs. The Y-axis shows normalized counts.

      Figure 6F is not needed.

      Figure 6F illustrates the number of different types of mRNA splicing events upon FBXO24 deletion in round spermatids. To help readers understand the splicing changes, we have decided to keep it.

      The last two paragraphs of the discussion seem to be redundant.

      Thank you for pointing this out. We have revised the last two paragraphs of the discussion.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one currently used to routinely delimit operational taxonomic units in RNA viruses and to reconstruct their evolutionary history (Edgar et al. 2022; see also the Serratus project, https://serratus.io/); we therefore made the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of the coronavirus origin range from around 10 thousand years ago to 293 million years ago, depending on modeling assumptions). Only future genomic work across the Coronaviridae, characterizing multiple genetic regions with different evolutionary rates, will allow us to precisely elucidate the timescale of coronavirus evolution alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, when combined with cophylogenetic approaches it strongly suggests a recent evolutionary origin in bats.

      R. C. Edgar, et al., Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV-2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree, the authors were only able to use the "undated" model in ALE, as the dated method (which only allows time-consistent transfers) failed on their dataset (possibly due to dataset size?). Further exploration of this question would be valuable.

      Our intuition is that the “dated” version of ALE did not necessarily fail on our dataset because of its size (ALE ran, but provided unrealistic parameter estimates and was unable to output possible reconciliations, as mentioned in our Materials and Methods section). We think it most likely failed because there is no pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between them with time-consistent transfers is very difficult, and ALE fails to estimate an amalgamated likelihood for such an unlikely scenario. Following a suggestion from reviewer #3, we will try running the dated version of ALE independently on the alpha- and betacoronaviruses, which form smaller datasets. This will help us determine whether the dated version of ALE fails because of dataset size or because of the absence of a codiversification pattern.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that this is a fairly prominent issue, which is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We fully agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life, by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still found low support for a scenario of codiversification, and we still recovered the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host transfers involve unsampled intermediate hosts. To address the reviewer's comment, we will better emphasize the importance of sampling biases in our main text and include the suggested references. We will also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.

      We agree that distinguishing between alpha and beta coronaviruses will provide useful additional insights; we are going to run separate cophylogenetic analyses for these two sub-clades. We will report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (2017, PLOS Pathogens); thank you for providing this reference, which we will now discuss.

    2. Reviewer #3 (Public Review):

      Summary:<br /> This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that this is a fairly prominent issue, one that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point: even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in the data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity:

      Interesting results from exposing human brain organoids to FGF8 include suggestions that FGF8 contributes to the anterior to posterior patterning of the neocortex, as previously reported in mouse. Good, varied methods with reproducibility described well in the methods section. It would improve the reader's experience however to cite numbers of organoids used in specific experiments/assays in the main text.

      Response: We thank the Reviewer for the positive assessment of our study, and we agree that citing the number of organoids per experimental approach would better allow the readers to appreciate the intrinsic variability of organoid protocols. We will include the number of organoids per experiment both in figure legends and in Materials and Methods as a summary table.

      … Organoids do not develop individual neocortical areas. To approach this issue of area identity, however, the authors compared control and FGF8-treated organoids against an existing dataset of transcriptomes of human fetal brains that separated pre-frontal, motor, somatosensory, and visual areas. This seems a good idea, but results showed that treated and untreated organoids alike expressed genes characteristic of somatosensory and pre-frontal cortical regions (anterior and midlevel areas), apparently suggesting that exogenous FGF8 had little effect. Because the previous dataset was not the authors' work, however, and because a comparison between organoids and actual human tissue is hard to interpret, including this whole section is probably more confusing than helpful.

      Response: We would like to clarify to the reviewer that the effect of FGF8 on antero-posterior area identity is only partial in our organoid system, suggesting that different doses or temporal windows of FGF8 treatment may be necessary to achieve a stronger modulation of area identity genes. We agree with the Reviewer that, due to this partial effect, the transcriptomic comparison with fetal brain areas might be confusing for readers. Therefore, we plan to move this type of data to the Supplementary Material. We thank the Reviewer for bringing this to our attention.

      The authors further stress a dorsal/ventral effect in FGF8-treated organoids. The population of ventral telencephalic interneurons, produced in the lateral ganglionic eminence in mice, expands in the human organoids at the expense of glutamatergic neurons of the dorsal telencephalon. This may be consistent with the loss of ventral telencephalic structures in FGF8-deficient mice. The authors suggest that the FGF8-driven expansion of interneurons is a novel finding not previously seen in animal research and may point to a human-specific characteristic. Readers may feel that this part of the paper requires more support, simply because multiple studies of FGF8 have not revealed this action. Overall, this paper would benefit from shortening, and from statements that some of the results suggest, but do not guarantee, particular conclusions.

      Response: We agree with the reviewer that, before stating that the FGF8-induced expansion of interneurons in dorsal telencephalic territories is a human-specific characteristic, further studies in mouse would be needed. However, as suggested by reviewer 2 below, there is some evidence that ventral interneuron markers, such as ASCL1 and DLX2, are expressed in the dorsal telencephalon of the early fetal human cerebral cortex, albeit at much lower levels than in the ventral telencephalon, and that individual human cortical progenitors can generate both excitatory neurons and inhibitory interneurons in culture. Thus, FGF8 might promote an intrinsic capacity of dorsal cortical progenitors to generate ventral interneurons, which would indeed be a human (or perhaps primate)-specific trait. We plan to discuss this issue more thoroughly in the revised version of the manuscript.

      Significance

      The paper is for a fairly specialized audience interested in the development of the cerebral cortex, but it is also of interest with regard to developmental defects of the human brain.

      Response: Although on first reading the manuscript may seem aimed at a specialized audience interested in cortical development, we believe that the strength of our human organoid system is the formation of regionalized organoids that include brain regions other than the cortex. Moreover, considering the increasing attention on brain organoids in general, and the lack of information on the action of FGF8 during human cortical development, we are confident that this study will attract a broader audience.

      Interesting results from exposing human brain organoids to FGF8 include suggestions that FGF8 contributes to the anterior to posterior patterning of the neocortex, as previously reported in mouse. Good, varied methods with reproducibility described well in the methods section. It would improve the reader's experience however to cite numbers of organoids used in specific experiments/assays in the main text.

      Response: We again thank the reviewer for acknowledging the potential of our study. As previously mentioned, we agree that providing the number of organoids used will strengthen the statistical analysis. This will be added in the revised version.

      Reviewer #2

      Evidence, reproducibility and clarity

      … However, organoid technology offers a solution to this, and the present study presents an elegant approach to addressing how FGF8 signalling directs both anterior/posterior and dorsal/ventral identity in neural progenitors and their offspring in human development. This has both biological and clinical relevance, as the study demonstrates how FGF8 may be a key regulator of the expression of susceptibility genes for neurodevelopmental conditions. The methods and approach are described clearly and in great detail, and the study serves as an exemplar for how similar work might be pursued in the future. Likewise, the results are presented logically, using excellent figures with clear descriptions of the findings. It is positively entertaining to read and very thought-provoking. We don't have any major issues with the conclusions.

      Response: We sincerely appreciate the reviewer’s enthusiastic and thoughtful feedback. The positive remarks on the clarity and detail of our methods and results are very encouraging, and we are pleased that the reviewer found our study both entertaining and thought-provoking.

      We have some minor issues over presentation and interpretation that we would like the authors to consider.

      1) Developmental staging. It is stated that the organoids have reached a developmental stage equivalent to 16.5 GW based on expression of key genes such as CRYAB. Firstly, we would prefer an unambiguous way of stating age, such as post-conceptional age; it is never clear what gestational weeks exactly means (post-menstrual or post-ovulatory?). Secondly, in several figures, UMAPs generated from the organoids are presented alongside representative mouse brain sections from E13.5, which is equivalent to about 11 post-conceptional weeks in human. Although we find the mouse sections helpful, perhaps the potential discrepancy in developmental stage should be pointed out.

      Response: We agree with the reviewer that the staging of human organoids in vitro can be very tricky. We will clarify this issue by using post-conceptional weeks (PCW) instead of gestational weeks in the revised version of the manuscript. It is true that schematic representations of mouse telencephalon sections at around E13.5 were used in the paper; the idea was to choose an age at which dorsal and ventral territories are clearly separated during embryogenesis, to highlight the expression of the different genes. We will change the schematics so that they can be better compared with the scRNA-seq data, and will highlight that they represent early mid-gestation stages of mouse embryos.

      2) Dorso-ventral patterning. Firstly, we wondered why VGLUT2 was used as a marker for dorsal identity when it is generally regarded as being expressed by subcortical neurons, e.g. thalamus and midbrain, whereas VGLUT1 is the standard marker for cortical neurons (https://doi.org/10.1016/j.tins.2003.11.005). Potentially, VGLUT2 expression may be more an indicator of mid/hindbrain identity than of cortical identity. Is there any evidence for VGLUT2 expression by cortical cells in development? Also, MASH1 (more correctly called ASCL1) is not exclusively ventral, having been shown to be expressed in a subset of intermediate progenitor cells for glutamatergic neurons in rodent (doi:10.1093/cercor/bhj168) and particularly in human (doi: 10.1111/joa.12971). We are surprised that the recent evidence that human cortical progenitors do have the capacity to generate GABAergic neurons (10.1038/s41586-021-04230-7; 10.1101/2023.11.06.565899) is not mentioned in this section, as perhaps FGF8 does not so much ventralise progenitor cells as promote an inherent property. This might explain why MGE-like identity is not observed whereas LGE/CGE-like identity is, as it has already been shown that MGE-like gene expression by dorsal progenitors is much less likely than LGE/CGE-like expression (10.1038/s41586-021-04230-7; DOI 10.1007/s00429-016-1343-5).

      Response: We fully agree and thank the reviewer for bringing to our attention this interesting discussion and pointing to our confusion between VGLUT1 and VGLUT2 expression profiles. After checking our scRNA-seq data, we realized that the Reviewer is absolutely correct about the issue of using VGLUT2 as a dorsal telencephalic marker, as it is expressed in both dorsal and ventral cells. In contrast, VGLUT1 appears to be more specific for neocortical (dorsal) neurons (see UMAP images below). Moreover, it perfectly fits with our results showing a downregulation of VGLUT1 in dorsal glutamatergic neurons.

      We are currently conducting additional staining experiments to support this point. Specifically, our plan includes:

      • Performing immunostaining assays to validate the expression patterns of VGLUT2 in dorsal cortical neurons, notably triple VGLUT2/TBR1/CTIP2 and double VGLUT2/SATB2 stainings, to be added to the Supplementary material. This will allow us to test whether VGLUT2 can be used as a dorsal marker.
      • Performing additional immunostainings involving VGLUT1, either juxtaposed with GAD67 to assess dorso-ventral neuronal balance or in conjunction with dorsal cortical markers to examine co-expression. This new analysis will be quantified using AI and integrated into Figure 4. Notably, these experiments will provide a comprehensive understanding of the expression patterns of VGLUT1 and VGLUT2 in the dorsal or ventral telencephalon and will further elucidate their utility as markers for specific neuronal populations in human brain organoids.

      Furthermore, and importantly, we fully agree with the reviewer that human dorsal cortical progenitors do have the ability to generate GABAergic neurons, even if at lower efficiency than glutamatergic neurons, and that FGF8 might promote this inherent property in human organoids. This new discussion and the new references suggested by the reviewer will significantly contribute to our data interpretation about LGE/MGE development. Therefore, we intend to incorporate them into the revised version of the text. Again, thank you to the reviewer for these insightful suggestions.

      3) MEA recordings. The presentation of electrophysiological data is quite simple. Detection of spikes is claimed, so representative traces of the spikes should be included; these can be easily generated with the Maxwell system software. It isn't clear how many times the experiments were repeated, and there is no statistical analysis. For example, the text states on page 15: 'Notably, WNTi+FGF8 organoids showed lower spike frequency (firing rate) and amplitude'. The amplitude difference is 43 µV vs 41 µV; we doubt this is significantly different. The threshold for detecting burst firing appears to differ between Figure 5C and 5D. Why? Shouldn't it be the same? The axonal tracking analysis in Fig. 5E/F needs more explanation. How many axons were tracked? Is there any statistical analysis beyond means and standard deviations?

      Response: We agree with the Reviewer that the presentation of our electrophysiological data needs further improvement. We are currently repeating key recordings on four additional samples from two different batches, which will allow us to conduct a more robust statistical analysis.

      In detail, we plan to:

      • Extract representative traces of spikes from the Maxwell software, which will be included as Supplementary material. Footprints of action potentials will be extracted using the in-built analysis tool available in the software.
      • Perform axon tracking analysis on three control and three FGF8-treated samples from two distinct batches of organoids. Recordings and analyses will be conducted over two weeks to monitor the growth of axonal tracts, enabling statistical analysis of the temporal evolution of axonal growth.

      Regarding the burst-detection threshold, setting it at different levels in control and treated samples is a routine procedure in this MEA system. The user sets a fixed multiplying factor (which is, of course, the same for control and treated samples), and the software multiplies this factor by the basal average activity of each sample. Bursts are thus detected as synchronized activity emerging above each sample's own basal activity, which naturally varies from sample to sample. We plan to explain this point better in the Materials and Methods section, and we thank the reviewer for pointing out this lack of clarity.
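      The thresholding logic described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that network activity is available as a binned firing-rate trace and that the "basal average activity" is simply the mean of that trace; the function name `detect_bursts` and all numbers are illustrative only and do not reproduce the Maxwell software's actual implementation.

```python
import numpy as np

def detect_bursts(rate, factor):
    """Flag time bins whose network firing rate exceeds factor * basal activity.

    Assumption (hypothetical): basal activity = mean of the binned rate trace.
    """
    rate = np.asarray(rate, dtype=float)
    threshold = factor * rate.mean()      # absolute threshold is sample-specific
    return np.flatnonzero(rate > threshold), threshold

# Hypothetical binned network firing rates (Hz) for two samples
control = [1, 1, 1, 1, 8, 1, 1, 1]
treated = [0.5, 0.5, 4, 0.5, 0.5, 0.5, 0.5, 0.5]

factor = 3.0                              # same user-set factor for both samples
idx_c, thr_c = detect_bursts(control, factor)
idx_t, thr_t = detect_bursts(treated, factor)
print(idx_c, thr_c)   # [4] 5.625
print(idx_t, thr_t)   # [2] 2.8125
```

      With the same factor, the two samples end up with different absolute thresholds because their basal activity differs, which is why the plotted threshold lines need not coincide across conditions.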

      4) Anterior/posterior patterning. Returning to the subject of cortical GABAergic neurons, it has been proposed that the prefrontal cortex contains a relatively higher proportion of GABAergic neurons, although the mechanism for this has not been elucidated (see https://doi.org/10.1111/joa.13055 and references therein). Might higher anterior FGF8 levels, by specifying cortical progenitors to produce GABAergic neurons, play a role in this?

      Response: We thank the reviewer for citing this very interesting review. It is highly possible that FGF8 normally expressed anteriorly might have a role in inducing distinct GABAergic subtypes, such as Calretinin+ interneurons, which have been found to be more abundant in frontal cortices of the developing human fetal brain. Our organoids are too early in terms of developmental age to verify whether interneuron subtypes such as CalR+ are more or less represented, but we will definitely add this very interesting point to our discussion in the revised version.

      5) Nomenclature. As this study principally presents data on mRNA expression levels, it might be preferable to use italicised capitals for all gene names (except where referring to mouse genes). Also, common names are used in places and standard gene names in others, e.g. COUPTF1 is referred to as NR2F1 but VGLUT1 is not referred to as SLC17A7 (also see above re MASH1). It would be good to see everything standardised.

      Response: We appreciate the Reviewer for highlighting these discrepancies. We will standardize gene names both in the text and figures accordingly.

      Significance

      This study involves a very imaginative use of organoids combined with a variety of approaches to test whether fundamental principles of forebrain development, particularly cell specification and regional patterning, that we have learnt from mouse models are relevant to human brain development. It also has clinical relevance, as it explores potential disruptions to development that lead to diseases of higher cognition, such as autism or schizophrenia. It is a very accessible manuscript that should have broad appeal. It makes several incremental additions to the field and points the way to future experiments in this area.

      Response: We sincerely thank the Reviewer for the insightful comments and positive assessment of our study.

      Reviewer #3

      Evidence, reproducibility and clarity:

      In the manuscript "FGF8-mediated gene regulation affects regional identity in human cerebral organoids" the authors used FGF8 to change cellular fate in human brain organoids. The experiments are well performed and the authors used well-established protocols to generate brain organoids. The results clearly show that FGF8 addition induces an increase of diencephalon/midbrain markers (OTX2, EN2), suggesting that long-term FGF8 treatment can also induce posterior regional identities. These data are also reinforced by scRNA-seq, highlighting a possible mix of cellular identities.

      Response: We thank the reviewer for this encouraging report about our study highlighting the significance of our findings.

      Main concern:

      1. The authors should start using FGF8 at later stages than day 19-21, in an attempt to maintain the forebrain identity.

      Response: As the Reviewer correctly pointed out, the temporal window of FGF8 treatment seems of pivotal importance for the final outcome of regional identity acquisition. Indeed, while early treatment with FGF8 at day 5 disrupts FOXG1 expression in organoids, as demonstrated in Supplementary Figure 1, our first attempts at adding FGF8 at day 15 resulted in poor regulation of the major FGF8-target gene NR2F1. However, we noticed that high expression of FOXG1 was still maintained, supporting forebrain identity. We fully agree with the reviewer that it is worth treating organoids with FGF8 at later stages to test whether forebrain identity becomes enriched while midbrain one is reduced, which would highlight an FGF8-dependent dosage of forebrain identity acquisition. To this purpose, we have already started additional experiments to assess the effect of delayed FGF8 treatment on forebrain markers and FGF-target genes, such as ETV1, SPRY4, DUSP6, ETV4 and ETV5, but also on representative midbrain markers. Importantly, we will treat the same batch of organoids with the same amount of FGF8 but at different times to be able to compare the different treatments in parallel. We plan to incorporate these supplementary analyses into the Supplementary material to provide a more comprehensive characterization of the efficiency time windows of FGF8.

      In detail, we plan to structure these additional experiments as follows:

      • We will culture in parallel neural progenitors (cortical induction protocol, with XAV-939 as a WNT inhibitor) that will be treated with 100 ng/mL FGF8 starting at day 5 (early treatment), day 10 (normal treatment) or day 20 (late treatment).
      • Each condition will require at least n=6 organoids.
      • Samples will be cultured until day 30.
      • At day 30, we will fix n=3 organoids per condition to be processed by immunostaining, and harvest n=3 organoids per condition for RNA extraction and Real Time RT-PCR analysis.
      • By immunostaining, we will measure the number of FOXG1+ cells as a read-out of telencephalic identity and the intensity of NR2F1 staining to evaluate FGF8 action.
      • By RT-PCR, we will measure the expression level of the following regional identity markers and FGF8 target genes: FOXG1, EN2, OTX2, NR2F1, ETV1, SPRY4, DUSP6, ETV4 and ETV5. This experimental setup will allow us to further detail the efficiency of distinct temporal windows for FGF8 treatment and their effects on cell identity and FGF target gene modulation. However, based on the first data we already obtained, we expect poor FGF target gene modulation upon late FGF8 treatment. This is why we believe that the temporal window we selected for our study already represents an optimal compromise between maintaining high levels of FOXG1 while effectively modulating FGF8 targets in human organoids.

      To verify the identity of the neurons in the organoids, the authors should check their ability to make projections in immunodeficient mice. Human iPSC-derived cortical neurons establish subcortical projections in the mouse brain after transplantation, and the location of the different neuronal projections could reveal the rostro-caudal identity of the cortical neurons.

      Response: We agree with the reviewer that, in general, in vivo transplantation of human organoids offers an interesting approach to testing the identity of differentiated neurons by tracking their projections. However, we believe that, due to the multi-regional character of FGF8-treated organoids (which also include midbrain-like neurons), their transplantation into the neocortex would be difficult to interpret and would not reveal the precise rostrocaudal identity of transplanted human cortical neurons, as requested by the reviewer. Furthermore, this would almost constitute an entire project on its own, given the technical challenges associated with such experimental approaches. We think that our thorough scRNA-seq analysis is powerful enough for assessing cell identity, as supported by the majority of organoid studies investigating cell identity through scRNA-seq without resorting to transplantation. In our study, the scRNA-seq analysis was subsequently validated by several rounds of immunostaining, a simple but fundamental corroborative control that is sometimes overlooked in similar studies. Finally, we would like to emphasize that reviewers #1 and #2 found our complementary approaches (molecular, cellular, and functional) appropriate, well performed, logical and reproducible.

      Significance:

      The proposed protocol is useful to generate brain organoids with mixed cell populations from different regions of the brain (forebrain, midbrain, hindbrain). However, it has limited applications, since it is not clear whether the proposed structures have any kind of organization.

      Response: We agree with the Reviewer that each protocol comes with its own limitations and that a careful characterization of the proportion of different regional domains could definitely improve the significance and applicability of our protocol. To this aim, we are now using artificial intelligence-mediated detection of cortical versus midbrain-like domains in control and FGF8-treated organoids, to further improve the characterization of distinct cellular populations and quantify the extent of their domains in multi-regional organoids. These data will be added to Figure 3.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to first thank the Editor as well as the two reviewers for their enthusiasm and careful evaluation of our manuscript. We also appreciate their thoughtful and constructive comments and suggestions. They did, however, have concerns regarding experimental design, data analysis, and over-interpretation of our findings. We endeavored to address these concerns by refining our framing, including additional analyses, and rewriting parts of our discussion section. We hope our response better explains the rationale of our experimental design and data interpretation. In addition, we acknowledge the limitations of our present study, so that it will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review)

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      Thanks very much again for the evaluation and comments. Please find our revision plans to each comment below.

      The weak points of this paper are that its findings are not sufficiently supporting their arguments, and there are several reasons for this:

      (1) Does the grid-like activity reflect 'navigation over the social space' or 'navigation in sensory feature space'? The grid-like representation in this study could simply reflect the transition between stimuli (the lengths of bar graphs). Participants in this study associated each face with specific lengths of two bars, and the 'navigation' was guided only by the morphing of a bar graph image. Moreover, no social cognition was required to perform the task used to estimate the grid-like activity. For the social decision-making task, which was conducted separately, we do not know whether participants needed to navigate between faces in a social space. Instead, they could recall the bar graphs associated with the faces and compute decision values by comparing the lengths of the bars. Notably, in the trust game in this study, competence and trustworthiness are not equally important for making a decision (Equation 1): the expected value is more sensitive to one than to the other. This also suggests that the space might reflect perceptual differences rather than social values.

      The Reviewer raises an interesting point. We apologize for not addressing this possibility clearly enough in our original manuscript, and we will improve the clarity in our revision. To address this issue, we would like to break it into two sub-questions and answer them separately: 1) Are participants merely memorizing the values associated with each avatar, or do they place the avatars on a two-dimensional map in their internal representation? 2) If so, are the two dimensions of this internal representation social dimensions relating to competence and trust, or sensory dimensions relating to bar height (i.e., social space or sensory space)?

      For the first question, we hope that our analysis of the distance effect on reaction times in the comparison task addresses this issue. The rationale is that distance measures the similarity between two avatars in the 2D social space: the closer two avatars are, the more similar they are, so distinguishing them is harder and results in longer reaction times. If participants merely memorized the avatars as six isolated instances without integrating them into a low-dimensional map, the avatars should be equidistant (as if they were lying on the vertices of a 5-simplex) and would not show a distance effect. We therefore interpreted a stronger distance effect as a behavioural index of a better internal map-like representation. This approach is adopted from Park et al. (2020), who used the distance effect to demonstrate that the human brain maps abstract relationships among entities from piecemeal learning.
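      The equidistance argument above can be checked numerically in a short Python sketch. The 2D coordinates below are hypothetical placeholders, not the avatars' actual positions in the task; six isolated, unrelated items are modelled as the vertices of a regular 5-simplex, here taken as the standard basis of R^6.

```python
import itertools
import numpy as np

def pairwise_distances(points):
    """Euclidean distance for every unordered pair of items."""
    return np.array([np.linalg.norm(p - q) for p, q in itertools.combinations(points, 2)])

# Hypothetical 2D map: six avatars placed by (competence, trustworthiness); illustrative values only
map_2d = np.array([[1, 2], [2, 4], [3, 1], [4, 3], [5, 5], [2, 5]], dtype=float)

# Six isolated items: vertices of a regular 5-simplex (standard basis vectors of R^6)
simplex = np.eye(6)

d_map = pairwise_distances(map_2d)
d_simplex = pairwise_distances(simplex)

print(len(d_simplex), np.allclose(d_simplex, np.sqrt(2)))   # 15 True
print(d_map.min(), d_map.max())                             # graded distances in the 2D map
```

      Because every simplex distance is identical, a distance regressor would have no variance and could not predict reaction times; only an embedding in a lower-dimensional map yields the graded distances that a distance effect requires.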

      For the second question of ‘social space’ vs. ‘sensory space’, our study adopted the paradigm developed by Constantinescu et al. (2016), who used a similar approach to construct a conceptual space and found that such a space can be represented with a grid-like code in the entorhinal and prefrontal cortex. We stayed close to their original design in the hope that our work could provide, to some extent, a close replication of their result using non-spatial social concepts instead. This, however, introduced a limitation of our study: participants passively traversed the artificial space rather than actively navigating it to make decisions or inferences. Moreover, we did not find evidence as strong as that reported in previous grid-like coding fMRI studies. This may be due to low signal quality in the medial temporal region, although we are not entirely sure. Nevertheless, we do not think our findings contradict or disprove previous findings in any way. Here we would also like to point to the work by Park et al. (2021). Their task involved making novel inferences in a 2D social hierarchy space, and they found that grid-like codes in the entorhinal cortex and medial prefrontal cortex support such inferences. Hence, we argue that results from these studies and partial evidence from our study collectively support the idea that the entorhinal cortex is important for representing abstract knowledge (spatial and non-spatial).

      (2) Does the brain have a common representation of faces in a social space? In this study, participants don't need to have a map-like representation of six faces according to their levels of social traits. Instead, they can remember the values of each trait. The evidence of neural representations of the faces in a 2-dimensional social space is lacking. The authors argued that the relationship between the reaction times and the distances between faces provides evidence of the formation of internal representations. However, this can be found without the internal representation of the relationships between faces. If the authors seek internal representations of the faces in the brain, it would be important to show that this representation is not simply driven by perceptual differences between bar graphs that participants may recall in association with each face.

      Considering these caveats, it is hard for me to agree that the authors provide evidence to support their claims.

      With regard to the common representation of faces, this is a potential limitation of our paradigm, because our current task design did not include a face-presentation stage that would allow us to test this question properly. With regard to the asymmetry between the two dimensions in determining expected value, we think that the prerequisite for identifying six-fold grid-like coding is an abstract space formed by orthogonal dimensions; in our task, competence and trustworthiness are not correlated. In addition, the scanner task does not require computation of expected value. However, we do think it is worth investigating whether the extent to which each dimension contributes to decision-making and inference distorts the grid-like representation of the map. Our prediction is that the entorhinal cortex maintains a representation of the map that is invariant to this aspect, so that it can support inferences in different contexts where different weights may be assigned to different dimensions. This will be an interesting hypothesis for future studies to test. We hope that our revision plans, together with the considerations above, address the Reviewer’s comments.

      Reviewer #2 (Public Review)

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits of warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid.
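
      The three procedures summarized above follow the logic of Constantinescu et al. (2016). A minimal sketch of that logic on simulated data is given below; it is not the authors' pipeline. The function names, the simulated 20° grid orientation, the noise level and the ±15° alignment window are illustrative assumptions. In a real analysis these regressors enter a GLM of the BOLD signal, and orientation estimation and the alignment test use independent data (approximated here by splitting the simulated trials in half).

```python
import numpy as np


def fit_hexagonal(theta, y):
    """Regress y on cos(6*theta) and sin(6*theta); return the two slope betas."""
    X = np.column_stack([np.ones_like(theta), np.cos(6 * theta), np.sin(6 * theta)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[2]


def grid_orientation(b_cos, b_sin):
    """Putative grid orientation phi (radians), wrapped into [0, pi/3)."""
    return (np.arctan2(b_sin, b_cos) / 6) % (np.pi / 3)


def aligned_mask(theta, phi):
    """True where theta lies within +/-15 degrees of one of the six grid axes."""
    offset = (theta - phi) % (np.pi / 3)  # position inside a 60-degree sector
    return (offset < np.pi / 12) | (offset > np.pi / 4)


# Simulated "signal" with a six-fold modulation at a hypothetical 20-degree orientation.
rng = np.random.default_rng(0)
true_phi = np.deg2rad(20.0)
theta = rng.uniform(0, 2 * np.pi, 400)
y = 1.0 * np.cos(6 * (theta - true_phi)) + rng.normal(0, 0.5, theta.size)

# Procedures 1-2: fit hexagonal modulation and estimate orientation on the first half.
half = theta.size // 2
b_cos, b_sin = fit_hexagonal(theta[:half], y[:half])
phi_hat = grid_orientation(b_cos, b_sin)

# Procedure 3: on held-out trials, compare aligned vs misaligned directions.
test_theta, test_y = theta[half:], y[half:]
mask = aligned_mask(test_theta, phi_hat)
print(np.rad2deg(phi_hat), test_y[mask].mean() - test_y[~mask].mean())
```

      The orientation is only defined modulo 60°, because a six-fold pattern repeats every 60°; this is why the estimate is wrapped into [0°, 60°) before the alignment test.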

      From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Thank you very much again for your careful evaluation and thoughtful comments. Please find our response to the comments below.

      Weaknesses:

      In various parts of this manuscript, the authors appear to use a variety of terms to refer to the (ostensibly) same neural regions: prefrontal cortex, frontal pole, ventromedial prefrontal cortex (vmPFC), and orbitofrontal cortex (OFC). It would be useful for the authors to use more consistent terminology to avoid confusing readers.

      Thanks for pointing out the inconsistent use of terms; we will try to make the terminology consistent in the revision of our manuscript.

      Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      On a conceptual level, it is not entirely clear how this work advances our understanding of grid-like encoding of two-dimensional abstract spaces, or of social cognition. The study design borrows heavily from Constantinescu et al. 2016, which is itself not an inherent weakness, but the Constantinescu et al. study already suggests that grid codes are likely to underlie two-dimensional spaces, no matter how abstract or arbitrary. If there were a hypothesis that there is something unique about how grid codes operate in the social domain, that would help motivate the search for social grid codes specifically, but no such theory is provided. The authors do note that warmth and competence likely have ecological importance as social traits, but other past studies have used slightly different social dimensions without any apparent loss of generality (e.g., Park et al. 2021). There are some (seemingly) exploratory analyses examining how individual difference measures like social anxiety and avoidance might affect the brain and behavior in this study, but a strong theoretical basis for examining these particular measures is lacking.

      We acknowledge that we used dimensions very similar to those in the work by Park et al. (2021). While Park and colleagues (2021) took a more innovative and rigorous approach, we tried to stay close to the original design by Constantinescu et al. (2016) in the hope that our work could provide, to some extent, a close replication of their result. Our data were collected before the 2021 paper came out and, as the comment points out, we did not find evidence as complete and convincing as that in these previous grid-like coding fMRI papers. This may be due to low signal quality in the medial temporal region, although we cannot be certain. However, we do not think our current findings contradict or disprove previous findings in any way.

      I found it difficult to understand the analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. It is possible that I have misunderstood the authors' logic and/or methodology, but I do not feel comfortable commenting on the correctness or implications of this approach given the information provided in the current version of this manuscript.

      We apologize for not being clear enough in the manuscript, and we will improve the clarity in our revision. This exploratory analysis aims to examine whether the strength of the grid-like representation of the social value map correlates with behavioral indicators of a map-like representation, and whether it correlates with participants’ social traits. For the behavioral indicator, we used the distance effect in the reaction times of the comparison task performed outside the scanner. The closer a pair of avatars are, the more similar they are; hence, distinguishing them is harder and results in longer reaction times when making comparison judgements. If participants were merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars would be equidistant and there would be no distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and a stronger distance effect as a behavioral index of a better internal map-like representation.
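
      The behavioral index described above can be illustrated with a short sketch: regress each participant's reaction times on the distance between the two avatars in each pair, so that a negative slope means closer pairs take longer. The avatar coordinates, the simulated reaction times and the function names are hypothetical, and the use of Euclidean distance in the 2D space is an assumption; the study's exact distance measure is not specified here.

```python
import itertools

import numpy as np

# Hypothetical (competence, warmth) coordinates for six avatars; not the study's values.
POSITIONS = np.array([[1, 2], [2, 4], [3, 1], [4, 3], [5, 5], [6, 2]], dtype=float)


def pairwise_distances(pos):
    """Euclidean distance for every unordered pair of avatars."""
    pairs = list(itertools.combinations(range(len(pos)), 2))
    return np.array([np.linalg.norm(pos[i] - pos[j]) for i, j in pairs])


def distance_effect(distances, rts):
    """Slope of RT on pair distance; negative means closer pairs are slower."""
    slope, _intercept = np.polyfit(distances, rts, 1)
    return slope


rng = np.random.default_rng(1)
d = pairwise_distances(POSITIONS)
# Simulated map-like participant: RT decreases as pairs get further apart.
rt_map = 900.0 - 40.0 * d + rng.normal(0, 10, d.size)
# Simulated memorizer: RT unrelated to distance (avatars effectively equidistant).
rt_memo = 800.0 + rng.normal(0, 10, d.size)

print(distance_effect(d, rt_map), distance_effect(d, rt_memo))
```

      Under this reading, a per-participant slope (sign-flipped so that larger values mean a stronger distance effect) is the quantity that would be correlated with the strength of grid-like modulation.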

      It was puzzling to see passing references to multivariate analyses using representational similarity analysis (RSA) in the main text, given that RSA is only used in analyses presented in the supplementary material.

      We speculated that RSA in the entorhinal ROI might be more sensitive than the whole-brain univariate analysis in identifying a grid-like code, because a previous study of grid-like coding in olfactory space (Bao et al., 2019) did not identify a grid-like representation with univariate analysis but did identify it with RSA. However, with the RSA approach we found no evidence of a grid-like code in the entorhinal ROI aligned to its own putative grid orientation. We reported this result in the main text to show that we carried out a relatively thorough investigation, testing the hypothesis with various approaches, and decided to add references to the RSA approach in the main text as well.

      Reviewer #3 (Public Review)

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes and is relatively well-powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in the entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably by Park et al., 2021, Nature Neuroscience.

      Thanks very much again for your careful evaluation and comments. Please find our response to the comments below.

      Below, I raise a few issues and questions on the evidence presented here for a grid-like code as the basis of navigating abstract social space or social knowledge.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      These data do not specifically add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that when the space is not a uniform, square/circular, 2-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. Real-world social spaces (as well as spaces for navigation and semantic concepts) must be higher-dimensional, or at least more than two-dimensional. It is unclear whether this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown/argued this to be the case. In experimental work, such as in mazes from the Mosers' labs (e.g., Derdikman et al., 2009) or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, which would not pass as grid cells under the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raising the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much for the references to papers that we had not considered enough in our discussion. We will endeavour to discuss the topic in more depth in our revision. In summary, we raise this discussion point because various research groups have found grid-like representations in 2D artificial conceptual spaces. We think that the next step toward a stronger claim would be to find representations of more spontaneous non-spatial maps.

      Data and analysis

      (2) Concerning the negative correlation of distance with activation in the fusiform gyrus and visual cortex: this is a slightly puzzling but potentially interesting finding. However, could this be related to reaction times? The larger the distance, the longer the reaction times, so the original finding might reflect larger activations with smaller distances.

      Thanks very much for the suggestion. However, we did not find a correlation between response time in the choice stage of the scanner task and the negative distance-related activation in the fusiform gyrus (Figures below). Moreover, because the morph period is the same on every trial, the negative correlation of distance with activation in the fusiform gyrus could also be interpreted as a positive correlation of morphing speed with activation in the fusiform gyrus. Indeed, a stronger negative correlation indicates larger activation for smaller distances, but we are uncertain what this implies about the functional role of the fusiform gyrus in our current task.

      Author response image 1.

      (3) Concerning the correlation of grid-like activity with behavior: is the correlation with reaction time just about how long people took (rather than a task-related neural signal)? The authors have only reported correlations with reaction time. The issue here is that the duration of reaction times also relates to the starting positions of each trial and where participants will navigate to. Considering the speed-accuracy tradeoff, could performance accuracy be negatively correlated with these grid consistency metrics? Or it could be positively correlated, which would suggest the grid signal reflects a good representation of the task.

      We apologize for not being clear enough in the manuscript, and we will improve the clarity in our revision. The reaction times used to calculate the distance effect come from a task performed outside the scanner. The closer a pair of avatars are, the more similar they are; hence, distinguishing them is harder and results in longer reaction times when making comparison judgements. If participants were merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars would be equidistant and there would be no distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and a stronger distance effect as a behavioural index of a better internal map-like representation. This was the motivation behind this analysis.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nature Neuroscience, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. https://doi.org/10.1016/j.neuron.2020.06.030

    1. Author Response

      The following is the authors’ response to the original reviews.

      We wish to thank the reviewers for their helpful and insightful comments. Their concerns mainly related to the interpretation of the data, the clarity of our statements and the improvement of our discussion.

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting study. It involves the use of hippocampal neuronal cultures from syntaxin 1 knock-out mice. These cultures serve as a platform for monitoring changes in synaptic transmission, through electrophysiological recording of postsynaptic currents, upon lentiviral infection with various isoforms, chimeras, and point mutations of syntaxins.

      The authors observe the following:

      (1) Syntaxin2 restores neuronal viability and can partially rescue Ca2+-evoked release in syntaxin1 knock-out neurons, but this release is much slower (as shown by differences in cumulative charge transfer) and is associated with a clearly smaller RRP than when rescued with syntaxin1. In contrast, syntaxin2-mediated rescue leads to a large increase in spontaneous release (Figure 1). The authors convincingly conclude that, in comparison with syntaxin2, syntaxin1 is optimized for fast phasic release and for clamping of spontaneous release.

      (2) The replacement of the SNARE domain (or its C-terminal part) of syntaxin1 by the SNARE domain of syntaxin2 (or its C-terminal part) rescues the fast kinetics, but not the amplitude, of Ca2+-evoked release. This is associated with a decrease in the size of the RRP and an increase in spontaneous release. The probability of vesicular release (PVR) is slightly increased, which is intriguing because a slight decrease would instead be expected from the reduced RRP; as the authors properly point out, this indicates that Ca2+-dependent fusion is simultaneously enhanced by unknown mechanisms. Analogous experiments, in which the SNARE domain of syntaxin1 is introduced into syntaxin2, reveal the existence of differential regulatory elements outside the SNARE domain.

      (3) Different constructs of syntaxin 1 and syntaxin 2 display different expression levels. On the other hand, the expression levels of Munc-18 are associated with the characteristics of the specific syntaxin construct that is transfected. In any case, the electrophysiological phenotypes cannot be consistently explained by changes in Munc-18.

      (4) Mutations in several residues of the outer surface of the C-terminal half of the syntaxin1 SNARE domain lead to alterations in the RRP and the frequency of spontaneous release, but the changes cannot be attributed to a change in the net surface charge, because the alterations occur even in paired mutations in which electrical neutrality is conserved.

      Comments:

      (1) This is a comment regarding the interpretation of the results. In general, the decrease in the RRP size is associated with the increased frequency of spontaneous release due to unclamping. The authors claim that the two phenomena seem to be independent of each other. In any case, how can the authors rule out the possibility that the unclamping of spontaneous release leads to a decrease in the RRP size?

      The main argument against the reduction of the RRP being caused by the observed increase in mEPSC frequency is based on the kinetics of refilling and depletion. On average, a vesicle fuses spontaneously 500–1000 seconds after it becomes primed (spontaneous vesicular release rate of STX1; Figures 1, 2 and 3). The time it takes to refill the RRP after depletion is on the order of 3 seconds (Rosenmund and Stevens, 1996). Refilling of the RRP is therefore more than 100 times faster than spontaneous fusion. Even if spontaneous release increased 5-fold, this would lead to less than 5% steady-state depletion of the RRP.
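
      The arithmetic behind this argument can be checked with a simple first-order model, which is our assumption rather than necessarily the calculation used in the manuscript: primed vesicles leave the RRP by spontaneous fusion at rate 1/τ_spont and empty sites refill at rate 1/τ_refill, so the depleted fraction at steady state is k_spont / (k_spont + k_refill) = τ_refill / (τ_refill + τ_spont). The time constants are the ones quoted in the response; the function name is hypothetical.

```python
def steady_state_depletion(tau_spont_s, tau_refill_s):
    """Fraction of the RRP that is empty at steady state in a two-state model.

    Filled sites empty at rate 1/tau_spont (spontaneous fusion) and empty sites
    refill at rate 1/tau_refill, so the depleted fraction is
    k_spont / (k_spont + k_refill).
    """
    k_spont = 1.0 / tau_spont_s
    k_refill = 1.0 / tau_refill_s
    return k_spont / (k_spont + k_refill)


TAU_REFILL_S = 3.0  # RRP refilling time constant quoted in the response

for tau_spont in (500.0, 1000.0):
    baseline = steady_state_depletion(tau_spont, TAU_REFILL_S)
    # A 5-fold increase in spontaneous release shortens tau_spont 5-fold.
    fivefold = steady_state_depletion(tau_spont / 5.0, TAU_REFILL_S)
    print(f"tau_spont={tau_spont:.0f} s: baseline {baseline:.1%}, 5-fold {fivefold:.1%}")
# tau_spont=500 s: baseline 0.6%, 5-fold 2.9%
# tau_spont=1000 s: baseline 0.3%, 5-fold 1.5%
```

      The model ignores, for example, use-dependent changes in refilling, so it only bounds the order of magnitude of the effect.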

      (2) The authors have analyzed the kinetics of mEPSCs and found differences (Fig2-Supp. Fig1; Fig2-Supp. Fig1). It would be interesting and pertinent to discuss these data in the context of potential phenotypes in the fusion pore kinetics involving syntaxin1 and syntaxin2 and their SNARE domains. The figure would also be improved by including averaged traces of mEPSCs.

      We thank the reviewer for the idea. Upon closer examination of the mEPSC rise and decay times, we noticed a minor slowing of the mEPSC rise time, from 0.443 ms (SEM 0.0067) for STX1A to 0.535 ms (SEM 0.0151) for STX1A-2(SNARE) or 0.507 ms (SEM 0.01251) for STX1A-2(Cter), while the mEPSC half-widths did not change significantly. The measured change may be related to the detection algorithm, as mEPSC detection at elevated frequencies becomes more difficult due to increased overlap of events; we therefore prefer to refrain from making any mechanistic claims.

      Minor comments:

      (1) Fig. 2J; Fig. 3J. It is difficult to distinguish between the different colors, and a legend within the graph would be very helpful.

      (2) Fig3 H. Please change the color of the box plot for Stx1 A to improve the contrast with the individual data points.

      (3) Page 6. Line 225. "Figure 2D and E" should be corrected to "Figure 2C and D"

      (1) Colors were changed for clearer visualization. (2) Unfortunately, changing the color did not improve the contrast with the individual data points. However, the numerical data are all included in the data sheets of the corresponding figure. (3) The mistake was corrected.

      Reviewer #2 (Recommendations For The Authors):

      Line 135-136: Are the numbers cited in the text mean and SEM? Please indicate.

      Line 139 and Figure 1G: The difference between purple and blue was very hard to see on my hard copy.

      Line 152: Reference to Figure 1L should probably be 1K.

      Line 183: Reference to Figure 2C should probably be Figure 2F.

      Line 225: Reference to Figure 2D and 2E should probably be 2C and 2D.

      Line 239: Reference to Figure 3I should probably be 3H.

      All typos were addressed and colors were changed for better visualization.

      Line 210-211: Sentence ("One of the benefits..") is hard to understand.

      Thank you for noticing this mistake; we agree that the sentence did not add any important or new information, and so it was deleted. Additionally, the message of the sentence was already clearly stated in lines 209-211.

      Figure 4E-H lacks data for STX2, which would be needed for the figure to be arranged like Figure 5.

      Given that STX1 is the endogenous syntaxin in hippocampal neurons, we use it as a control for all analyses of the STX2 and STX2-chimera experimental groups; thus, it is included in Figures 3 and 5.

      It appears that the authors do not present or discuss the Western Blot in Fig. 4D. Are the quantitative results of the Western Blot consistent with or different from the quantification of the immunostainings (Fig. 4B-C)? A similar question for Figure 5D, which also seems not to be presented.

      In terms of quantification, we have relied mainly on the ICC experiments because they test also for putative impairments in transport to the presynaptic compartment. Our WB data are overall consistent with the results, but were not used to quantitate expression of our syntaxin chimeras and mutations in the STX1-null hippocampal neuron model.

      Figure 6F-G: The normalization of spontaneous vesicular release rates is not clear, because the vesicular release rates already contain a normalization (mEPSC rate divided by RRP size). Is a further normalization of the STX1A condition informative? The authors should consider presenting the release rates themselves. In any case, the normalization should be presented/explained, at least in the legends.

      The reviewer is in principle correct. Because of the large number of experimental groups, we had to perform recordings from multiple cultures in which not all experimental groups were present, while WT STX1 was present as a consistent control. To reduce culture-to-culture variability, an additional normalization to the WT control group was performed. However, we have also included the raw numerical values in the data source sheets (normalized and absolute), which produce a similar overall outcome.
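
      A per-culture normalization of this kind can be sketched as follows. The response does not state whether each value is divided by the mean or the median of the control recordings from the same culture; this sketch assumes the mean. The group labels follow the manuscript, while the values, culture names and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_to_control(records, control="STX1A"):
    """Divide each value by the mean control value from the same culture.

    records: iterable of (culture_id, group, value) tuples. Every culture must
    contain at least one control recording, otherwise a KeyError is raised.
    """
    records = list(records)
    control_values = defaultdict(list)
    for culture, group, value in records:
        if group == control:
            control_values[culture].append(value)
    control_mean = {c: mean(vals) for c, vals in control_values.items()}
    return [(culture, group, value / control_mean[culture])
            for culture, group, value in records]


# Hypothetical release rates (arbitrary units) from two cultures.
data = [
    ("culture1", "STX1A", 2.0),
    ("culture1", "STX1A", 4.0),
    ("culture1", "STX1A-2(SNARE)", 12.0),
    ("culture2", "STX1A", 6.0),
    ("culture2", "STX1A-2(Cter)", 15.0),
]
normalized = normalize_to_control(data)
for row in normalized:
    print(row)
```

      Because each culture is scaled by its own control, the control group averages 1.0 in every culture, and the other groups are expressed relative to the control recorded alongside them.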

      References to Figure 7 subpanels (A, B, and C) are missing.

      Thank you for the comment. We have integrated all panels into one for better representation and understanding since they are representative of one another.

      Lines 330-339 and Figure 7 in Discussion: the authors discuss that adding the non-cognate STX2 SNARE-domain to syntaxin-1 might destabilize the primed state and decrease the fusion energy barrier (as indicated in Figure 7C). What is the evidence that the decrease in RRP size is not caused solely by the depletion of the pool due to the increased spontaneous fusion?

      Please see our reply to comment 1 of Reviewer 1.

      Statistics: Missing is the number of observations (n) for all data. Even if all data points are displayed, this should be stated.

      N numbers are included in the data sheets attached to each figure.

      The statement (start of Discussion) that the SNARE-domain of STX1 'plays a minimal role in the regulation for Ca2+-evoked release' is somewhat puzzling, since without the SNARE-domain in STX1 there would be no Ca2+-evoked release. I guess these statements (similar statements are found elsewhere) are due to the interesting finding that STX2 leads to a decrease in release kinetics, compared to STX1, and this is not (entirely) due to differences in the SNARE-domain. I would suggest rephrasing the finding in terms of release kinetics. Also, the statement in the last sentence of the Abstract is not clear.

      Thank you for pointing this out; we agree that our experiments showed a strong impact of the syntaxin isoform exchange on release kinetics and overall release output. A similar comment also came from reviewer #3, so we have addressed both comments together.

      Our confusing statement resulted from the order in which the results were presented and from our summarizing remarks for each section. The statement reflected our finding that mutating residues in the C-terminal part of the STX1 SNARE motif affected only spontaneous release and RRP size, but not release efficacy. Regarding the comparison of Ca2+-evoked release between STX1 and STX2, we now state (pg. 6, lines 231-233) that “the results obtained from the Ca2+-evoked release between STX1 and STX2 support major regulatory differences of the domains outside of the SNARE domain between isoforms”.

      We have changed the abstract pg. 2 lines 55-56

      We have changed the introduction pg. 3 lines 102-105 for a better contextualization.

      We have changed the start of the discussion pg. 9 lines 250-252 for better contextualization.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, Salazar-Lázaro et al. presented interesting data that C-terminal half of the Syx1 SNARE domain is responsible for clamping of spontaneous release, stabilizing RRP, and also Ca2+-evoked release. The authors routinely utilized the chimeric approach to replace the SNARE domain of Syx1 with its paralogue Syx2 and analyzed the neuronal activity through electrophysiology. The data are straightforward and fruitful. The conclusions are partly reasonable. One obvious drawback is that they did not explore the underlying mechanism. I think it is easy for the authors to carry out some simple assays to verify their hypothesis for the mechanism, instead of just talking about it in the discussion section. In all, I appreciate the data presented in the manuscript. If the authors could supply more data on the mechanisms, this would be important research in the field. Some critical comments are listed below:

      We thank the reviewer for his/her comments and suggestions.

      Major comments:

      (1) In pg.3, lines 102-104, the authors stated that 'We found that the C-terminal half of the SNARE domain of STX1.. ..while it is minimally involved in the regulation of Ca2+-evoked release.' But in pg.5, lines 174-176, they wrote that 'Replacement of the full-SNARE domain (STX1A-2(SNARE)) or the C-terminal half (STX1A-2(Cter)) of the SNARE domain of STX1A with the same domain from STX2 resulted in a reduction in the EPSC amplitude (Figure 2B).' and in pg.5-6, lines 197-199, they wrote that 'Taken together our results suggest that the C-terminal half of the SNARE domain of STX1A is involved in the regulation of the efficacy of Ca2+-evoked release, the formation of the RRP and in the clamping of spontaneous release.' I am quite puzzled as to what the authors are really trying to express about the relationship between the C-terminal half of the SNARE domain and Ca2+-evoked release (i.e., is it minimally involved, or does it significantly participate in the process?). Please clarify and reorganize the relevant passages.

      Please see our reply to the last comment of reviewer 2.

      (2) Figure 1-figure supplement 1, the authors should analyze Syx1/VGlut1 level additionally. And, if possible, compare the difference between Syx1/VGlut1 and Syx2/VGlut1.

      The levels of STX1/VGlut1 and STX2/VGlut1 were analyzed in detail in Figures 4 and 5.

      A direct comparison between the expression levels of these two proteins is not possible, since the affinities of the antibodies for their target proteins differ and could introduce biases. While this could be overcome by adding a FLAG-tag to the syntaxin proteins, we have not used this approach in this publication. In addition, we inferred sufficient and comparable expression of both syntaxins from their ability to rescue some of the syntaxin1 loss-of-function phenotypes.

      (3) Figure 2D only analyzed the EPSC half-width; could the authors also analyze the rise/decay time? Also, does Figure 3-figure supplement 1 refer to the kinetic parameters of Syx2-1A in Figure 3? It is very confusing.

      We have changed the text accordingly and each parameter is referenced to its corresponding figure for clarity. As for the decay and rise time of STX1 and STX1-chimeras, they are in Figure 2-figure supplement 1A and B.

      (4) On pg.4, lines 151-152, the statement 'Finally, no change was observed in the paired-pulse ratio (PPR) between STX1A and STX2 groups (Figure 1L).' is not accompanied by any explanation or comment on this observation in the text.

      The small EPSC amplitudes and altered kinetics of the STX2 constructs (Figure 1 and Figure 3) made it more difficult to quantitate paired-pulse experiments. Therefore, we preferred not to overinterpret these measurements. The finding that the paired-pulse data were not significantly different fits with the vesicular release probability measurements, which showed no major changes. We have made our statement on this basis.

      (5) On pg.6, lines 235-236, the authors wrote that 'Additionally, we found that only STX2-1A(SNARE) and STX2-1A(Cter) could rescue the RRP to around double of what we measured from STX2 and STX2-1A(Nter) (figure 3F)'. However, in Figure 3F, the authors indicated 'n.s.' (p>0.05) for the differences between STX2 and STX2-1A(SNARE)/STX2-1A(Cter). It is perplexing how the authors interpret their data. Certainly, a p-value threshold should not be used arbitrarily as the criterion for a difference. A simpler approach would be to report the exact p-value for each comparison (in the figure legends or in tables).

      We apologize for any confusion, and hope the modification makes our interpretation clearer. The calculated p-values are included in the attached data source tables, which we hope will clarify our comparative analysis. We have changed the text on pg. 7, lines 238-241, are cautious not to overinterpret these results, and rely more on the data observed in the STX1A-chimeras, which show significant changes in the RRP.

      (6) I noticed that the authors preferred using 'xx% increase/decrease' or 'xx-fold increase/decrease' to interpret their inter-group data. I doubt whether these interpretations are appropriate. First, it seems that most of the individual data points in each set are not Gaussian-distributed, and the authors used non-parametric tests to compare the differences. Second, the authors did not explicitly indicate the method used to calculate the % or fold change, e.g., from the mean or the median. I think it is a bad choice to use the median to calculate fold changes; meanwhile, the mean would also be biased, given that the data were not Gaussian-distributed. The authors should be cautious in interpreting their data.

      We thank the reviewer for pointing out the inaccuracy of our descriptions and have now stated in the Materials and Methods section the parameter used to calculate the percentage and fold increases/decreases, namely the mean. Our intention is to plainly state the amount of change in a parameter based on the observed change in its mean value. We agree with the reviewer that interpreting these values could be problematic if we were speculating about possible mechanisms. Further tests would be needed to determine whether similar increases/decreases in a parameter are due to disturbance of the same or of different mechanisms. For example, we discussed whether the regulation of SYT1 might or might not be the mechanism affected in some of the chimeras that show an increase in the spontaneous release rate, because the release rate observed in some of them is massively higher than that seen in SYT1-KO (Bouazza-Arostegui et al., 2022). It is tempting to speculate, based on the differences in the magnitude of these changes, that other mechanisms could be involved. For this reason, we have presented an array of possible mechanisms that may be affected when we manipulate the SNARE domain of STX1.
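
      The mean-based fold change described above, and the reviewer's concern about it, can be made concrete with a short sketch using hypothetical numbers (not data from the manuscript; function names are illustrative). For right-skewed data, a single large value can pull a mean-based fold change well away from a median-based one, which is why such numbers describe the size of a change rather than support mechanistic conclusions.

```python
from statistics import mean, median


def fold_change(control, test, stat=mean):
    """Ratio of a summary statistic (mean by default) of test over control."""
    return stat(test) / stat(control)


def percent_change(control, test, stat=mean):
    """Fold change expressed as a percentage change (negative = decrease)."""
    return 100.0 * (fold_change(control, test, stat) - 1.0)


# Hypothetical, right-skewed measurements; the single 5.0 value pulls the mean up.
control = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
test = [1.0, 1.4, 1.1, 1.2, 5.0, 1.3]

print(f"mean-based:   {fold_change(control, test):.2f}-fold, {percent_change(control, test):.0f}%")
print(f"median-based: {fold_change(control, test, median):.2f}-fold")
```

      Stating which statistic was used, as the authors now do in the Materials and Methods, lets readers judge how sensitive a reported percentage is to a few extreme values.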

      (7) The authors routinely analyzed the levels of Munc18-1 in neuronal lysates by WB and of Munc18-1/VGlut1 by immunofluorescence in various Syx1 mutants. However, in my view, these assays are somewhat indirect. The atomic structures (PDB entries 3C98 and 7UDB) make it evident that the SNARE domain of Syx1 participates in binding to Munc18-1. Meanwhile, Han et al. reported that the K46E mutation (located in domain 1 of Munc18-1) strongly impairs Syx1 expression, Syx1 interaction, vesicle docking and secretion (Han et al., 2011, PMID: 21900502). Intriguingly, residue K46 of Munc18-1, which is close to D231/R232 of Syx1, may form electrostatic contacts with D231 and R232 of Syx1. This raises the possibility that the Syx1 D231/R232 mutants and some Syx1-2 chimeras lost their normal function through defective binding to Munc18-1. To better understand the underlying mechanism, the authors may need to carry out in vivo and/or in vitro binding analyses between the syntaxin mutants/chimeras and Munc18-1. They should also discuss this issue further.

      We thank the reviewer for identifying a previously overlooked aspect of the interplay between Munc18-1 and STX1. In response, we have added a discussion of this point on pg. 11, lines 419-431.

      We also appreciate the thoughtful suggestion of additional experiments to further explore the molecular relationship between Munc18-1 and STX1. We agree that co-immunoprecipitation experiments (using an antibody against either Munc18-1 or STX1 and STX2) would offer greater insight into whether the binding of these proteins is affected in the isoform or the mutants. Indeed, we performed immunoprecipitation experiments on neuronal lysates from the corresponding groups, using STX1A and STX2 antibodies for the pull-downs. However, we were unable to co-IP Munc18-1 in these experiments. Changing the experimental conditions did not improve the results, so these experiments remain inconclusive for the moment. For this reason, we present this as an open question and as a potential hypothesis for the molecular mechanism. Notably, Shi et al. (2021) performed co-IP assays in HeLa cells using wild-type Munc18-1 and a mutant form that affects binding to the C-terminal half of the SNARE domain of STX, together with wild-type STX1 and STX mutants targeting some of our residues of interest, and showed decreased levels of pulled-down Munc18-1. We have made sure to mention the conclusion of this important publication in our discussion.

      (8) The third possible mechanism proposed by the authors (i.e., interaction with Syt1) seems more reasonable. However, the discussion of it is insufficient. For instance, plenty of literature has indicated that Syt1 may participate in synaptic vesicle priming by stabilizing partially or fully assembled SNARE complexes (Li et al., 2017, PMID: 28860966; Bacaj et al., 2015, PMID: 26437117; Mohrmann et al., 2013, PMID: 24005294; Wang et al., 2011, PMID: 22184197; Liu et al., 2009, PMID: 19515907). Complexins are also SNARE-binding modules that regulate synaptic exocytosis. Lack of complexins can lead to unclamping of spontaneous synaptic vesicle fusion, while at the same time severely impairing Ca2+-triggered release (Maximov et al., 2009, PMID: 19164751). Moreover, different domains of complexin may mediate different steps of SV fusion; earlier research indicated that the C-terminal sequence of complexin is selectively required for clamping of spontaneous fusion and for priming, but not for Ca2+-triggered release (Kaeser-Woo et al., 2012, PMID: 22357870). Likewise, if possible, the authors may need to carry out in vivo and/or in vitro binding analyses to confirm their hypothesis.

      We explored complexin's involvement only to a limited extent, primarily because our approach focused on the molecular mechanisms arising from the sequence differences between STX1 and STX2. Our laboratory has studied the role of complexin extensively, and we certainly considered its possible involvement. However, since the sites identified on syntaxin are either conserved between STX1 and STX2 or not close to the central or accessory helical domains of complexin, we did not perform experiments to test putative interactions, and we refrained from discussing complexin in this paper.

      (9) Lastly, from a perspective different from the hypothesis raised by the authors, I wonder whether the defects of the Syx2 and Syx1 chimeras were caused by the SNARE complex itself. Changing the outward-facing (i.e., solvent-accessible) residues of the SNARE complex may affect its stability, assembly kinetics, and energetics (Wang and Ma, 2022, PMID: 35810329; Zorman et al., 2014, PMID: 25180101), especially in the C-terminal halves. Is this another possible mechanism through which the C-terminus of Syx1 might contribute to SV priming and clamping of spontaneous release? The authors should at least discuss this point.

      Thank you for this suggestion. Because the hydrophobic layers of the SNARE domains that form the hydrophobic pocket of STX2 and STX1 are largely conserved, we indeed assumed that the intrinsic stability of the SNARE complex is largely unchanged. Additionally, Li et al. (2022; PMID: 35810329) examined the stability of the alpha-helix structure of the SNARE domain of SNAP25. Although they found no changes in the stability and formation of the alpha-helix when mutating outward-facing residues for methodological purposes (bimane-tryptophan quenching), their study did not selectively explore the effect of outer-surface mutations on the stability of the alpha-helix.

      As noted by the reviewer, Zorman et al. (2014; PMID: 25180101) observed that changes in the sequence of the SNARE domain (using SNARE proteins from different trafficking systems: neuronal, GLUT4, yeast…) correlated with changes in step-wise SNARE complex assembly. However, they also did not selectively mutate the outer solvent-accessible residues, which precludes firm conclusions about the contribution of these residues to the kinetics and energetics of assembly and to the intrinsic stability of the SNARE complex.

      At the reviewer's request, we have added the following paragraph discussing an additional mechanism:

      “As a final remark, the changes in the spontaneous release rate and the priming stability may stem from reduced stability of the SNARE complex itself, through putative interactions between outer surface residues. Studies of SNARE complex assembly kinetics in which solvent-accessible residues in the C-terminal half of the SNARE domain of SYB2 were mutated have shown reduced stability of SNARE complex assembly, correlated with impaired fusion (Jiao et al., 2018). However, the results for mutations of outward-facing STX1 residues were inconclusive, as these mutations were always accompanied by hydrophobic layer mutations (Jiao et al., 2018), which affect the assembly kinetics and energetics of the SNARE complex (Ma et al., 2015). Single-molecule optical-tweezer studies have focused on the impact of regulatory molecules such as Munc18-1 (Ma et al., 2015; Jiao et al., 2018) and complexin (Hao et al., 2023) on the stability of assembly, or on the intrinsic stability of the hydrophobic layers in the step-wise assembly of the SNARE complex (Gao et al., 2012; Ma et al., 2015; Zhang et al., 2017). Although the conserved hydrophobic layers in the SNARE domains of STX1A and STX2 (Figure 1) suggest unchanged zippering and intrinsic stability of the complex, further studies should address the contribution of surface residues to the stability of the alpha-helix structure of the SNARE domain of STX1 (Li et al., 2022) or to the stability of the SNARE complex.”

      Minor comments:

      (1) On pg. 6, line 236, 'figure 3F': the initial 'f' should be uppercase.

      (3) On pg.11, line 396, the section title 'The interaction of the C-terminus of de SNARE domain of STX1A with Munc18-1 in the stabilization of the primed pool of vesicles.' The word 'de' is confusing, please check.

      (4) On pg. 12, line 446, in the section title, should 'though' be 'through'?

      We have made these corrections. Thank you.

      (2) On pg. 7, line 239, '..had an increased PVR (Figure 3G), no change in the release rate (Figure 3I)': should Figure 3I be Figure 3H? And on line 240, 'and an increase in short-term depression during 10Hz train stimulation (Figure 3I)': should Figure 3I be Figure 3J? If so, Figure 3I is not cited in the text and lacks adequate interpretation. Please check.

      We apologize for the oversight in not referencing this specific subpanel of the figure and have added the reference to the text. Our interpretation of these data relates to the mechanisms governing the efficacy of the Ca2+-evoked response and its dependence on the integrity of the entire SNARE domain. We wish to highlight the changes made to the discussion of the regulation of the Ca2+-evoked response in response to this reviewer's comment #1 and a similar comment from reviewer #2 (as stated previously).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Visual Perceptual Learning (VPL) results in varying degrees of generalization to tasks or stimuli not seen during training. The question of which stimulus or task features predict whether learning will transfer to a different perceptual task has long been central in the field of perceptual learning, with numerous theories proposed to address it. This paper introduces a novel framework for understanding generalization in VPL, focusing on the form invariants of the training stimulus. Contrary to a previously proposed theory that task difficulty predicts the extent of generalization (suggesting that more challenging tasks yield less transfer to other tasks or stimuli), this paper offers an alternative perspective. It introduces the concept of task invariants and investigates how the structural stability of these invariants affects VPL and its generalization. The study finds that tasks with high-stability invariants are learned more quickly. However, training with low-stability invariants leads to greater generalization to tasks with higher stability, but not the reverse. This indicates that, at least based on the experiments in this paper, an easier training task results in less generalization, challenging previous theories that focus on task difficulty (or precision). Instead, this paper posits that the structural stability of stimulus or task invariants is the key factor in explaining VPL generalization across different tasks.

      Strengths:

      • The paper effectively demonstrates that the difficulty of a perceptual task does not necessarily correlate with its learning generalization to other tasks, challenging previous theories in the field of Visual Perceptual Learning. Instead, it proposes a significant and novel approach, suggesting that the form invariants of training stimuli are more reliable predictors of learning generalization. The results consistently bolster this theory, underlining the role of invariant stability in forecasting the extent of VPL generalization across different tasks.

      • The experiments conducted in the study are thoughtfully designed and provide robust support for the central claim about the significance of form invariants in VPL generalization.

      Weaknesses:

      • The paper assumes a considerable familiarity with the Erlangen program and the definitions of invariants and their structural stability, potentially alienating readers who are not versed in these concepts. This assumption may hinder the understanding of the paper's theoretical rationale and the selection of stimuli for the experiments, particularly for those unfamiliar with the Erlangen program's application in psychophysics. A brief introduction to these key concepts would greatly enhance the paper's accessibility. The justification for the chosen stimuli and the design of the three experiments could be more thoroughly articulated.

      Response: We appreciate the reviewer's feedback regarding the accessibility of our paper. In response, we plan to expand the introduction to provide a concise yet comprehensive overview of the key concepts of the Erlangen program. We will also provide a more thorough justification for the selection of stimuli and the experimental design in the revised version, so that readers understand the rationale behind our choices.

      • The paper does not clearly articulate how its proposed theory can be integrated with existing observations in the field of VPL. While it acknowledges previous theories on VPL generalization, the paper falls short in explaining how its framework might apply to classical tasks and stimuli that have been widely used in the VPL literature, such as orientation or motion discrimination with Gabors, vernier acuity, etc. It also does not provide insight into the application of this framework to more naturalistic tasks or stimuli. If the stability of invariants is a key factor in predicting a task's generalization potential, the paper should elucidate how to define the stability of new stimuli or tasks. This issue ties back to the earlier mentioned weakness: namely, the absence of a clear explanation of the Erlangen program and its relevant concepts.

      Response: Thanks for highlighting the need for better integration of our proposed theory with existing observations in the field of VPL. Unfortunately, the theoretical framework proposed in our study is based on Klein's Erlangen program and applies only to geometric shape stimuli. For VPL studies using stimuli and paradigms that are unrelated to geometric transformations (such as motion discrimination with Gabors or random dots, vernier acuity, spatial frequency discrimination, contrast detection or discrimination, etc.), our proposed theory does not apply. Some stimuli employed in VPL studies can be classified by their geometric invariants. For instance, orientation discrimination with Gabors (Dosher & Lu, 2005) and the texture discrimination task (F. Wang et al., 2016) both involve Euclidean invariants, and circle versus square discrimination (Kraft et al., 2010) involves affine invariance. However, these studies do not simultaneously involve multiple geometric invariants of varying levels of stability, and thus cannot be directly compared with our research. It is worth noting that while Klein's hierarchy of geometries, on which our study focuses, is rarely mentioned in the field of VPL, it does have connections with concepts such as 'global/local', 'coarse/fine', 'easy/difficult', and 'complex/simple': more stable invariants are closer to 'global', 'coarse', 'easy', and 'complex', while less stable invariants are closer to 'local', 'fine', 'difficult', and 'simple'. Importantly, several VPL studies have found 'fine-to-coarse' or 'local-to-global' asymmetric transfer (Chang et al., 2014; N. Chen et al., 2016; Dosher & Lu, 2005), which appears consistent with the results of our study.

      In the introduction of our revised version and in the subsequent full author response, we will provide a clear explanation of the Erlangen program and explain how to define the stability of new stimuli or tasks. In the discussion of our revised version, we will compare our results with other studies of the generalization of perceptual learning and speculate on how our proposed theory fits with existing observations in the field of VPL.

      • The paper does not convincingly establish the necessity of its introduced concept of invariant stability for interpreting the presented data. For instance, consider an alternative explanation: performing the collinearity task requires orientation invariance. Therefore, it is straightforward that learning the collinearity task does not aid in performing the other two tasks (parallelism and orientation), which do require orientation estimation. Interestingly, orientation invariance is more characteristic of higher visual areas, which, consistent with the Reverse Hierarchy Theory, are engaged more rapidly in learning than lower visual areas. This simpler explanation, grounded in established concepts of VPL and the tuning properties of neurons across the visual cortex, can account for the observed effects, at least in one scenario. This approach has previously been used or proposed to explain VPL generalization, as seen in (Chowdhury and DeAngelis, Neuron, 2008), (Liu and Pack, Neuron, 2017), and (Bakhtiari et al., JoV, 2020). The question then is: how does the concept of invariant stability provide additional insights beyond this simpler explanation?

      Response: We appreciate the alternative explanation proposed by the reviewer and agree that it presents a valid perspective grounded in established concepts of VPL and neural tuning properties. However, both the collinearity and the parallelism tasks require orientation invariance. Although orientation invariance, as proposed by the reviewer, can explain the lack of transfer from collinearity or parallelism to the orientation task, it cannot explain why collinearity does not transfer to parallelism.

      As stated in our response to the previous comment, in the revised discussion we will compare our study with other studies (including the three papers mentioned by the reviewer), aiming to clarify why the concept of invariant stability is needed to interpret the observed data and to understand the mechanisms underlying VPL generalization.

      • While the paper discusses the transfer of learning between tasks with varying levels of invariant stability, the mechanism of this transfer within each invariant condition remains unclear. A more detailed analysis would involve keeping the invariant's stability constant while altering a feature of the stimulus in the test condition. For example, in the VPL literature, one of the primary methods for testing generalization is examining transfer to a new stimulus location. The paper does not address the expected outcomes of location transfer in relation to the stability of the invariant. Moreover, in the affine and Euclidean conditions one could maintain consistent orientations for the distractors and targets during training, then switch them in the testing phase to assess transfer within the same level of invariant structural stability.

      Response: Thanks for raising the issue of the mechanism of transfer within each invariant condition. We plan to design an additional experiment, similar in paradigm to Experiment 2, to examine how VPL generalizes to a new test location within a single level of invariant stability.

      • In the section detailing the modeling experiment using deep neural networks (DNN), the takeaway was unclear. While it was interesting to observe that the DNN exhibited a generalization pattern across conditions similar to that seen in the human experiments, the claim made in the abstract and introduction that the model provides a 'mechanistic' explanation for the phenomenon seems overstated. The pattern of weight changes across layers, as depicted in Figure 7, does not conclusively explain the observed variability in generalizations. Furthermore, the substantial weight change observed in the first two layers during the orientation discrimination task is somewhat counterintuitive. Given that neurons in early layers typically have smaller receptive fields and narrower tunings, one would expect this to result in less transfer, not more.

      Response: We appreciate the reviewer's feedback regarding the clarity of our DNN modeling experiment. We acknowledge that, although DNNs have been shown to serve as models of the visual system as well as of VPL, the claim that the model provides a 'mechanistic' explanation for the phenomenon is overstated. In our revised version, we will attempt a more detailed analysis of the DNN model and provide a more explicit explanation of the findings from the DNN modeling experiment, emphasizing their implications for understanding the observed variability in generalization.

      Additionally, the substantial weight change observed in the first two layers during the orientation discrimination task does not contradict our proposed theoretical framework; instead, it aligns with our speculation regarding the neural mechanisms of VPL for geometric invariants. Specifically, it suggests that learning of invariants with lower stability relies more on plasticity in lower-level brain areas, and therefore generalizes more poorly to new locations or stimulus features within each invariant condition. However, this does not imply that their learning effects cannot transfer to invariants with higher stability.

      Reviewer #2 (Public Review):

      The strengths of this paper are clear: The authors are asking a novel question about geometric representation that would be relevant to a broad audience. Their question has a clear grounding in pre-existing mathematical concepts, that, to my knowledge, have been only minimally explored in cognitive science. Moreover, the data themselves are quite striking, such that my only concern would be that the data seem almost too clean. It is hard to know what to make of that, however. From one perspective, this is even more reason the results should be publicly available. Yet I am of the (perhaps unorthodox) opinion that reviewers should voice these gut reactions, even if it does not influence the evaluation otherwise. Below I offer some more concrete comments:

      (1) The justification for the designs is not well explained. The authors simply tell the audience in a single sentence that they test projective, affine, and Euclidean geometry. But despite my familiarity with these terms -- familiarity that many readers may not have -- I still had to pause for a very long time to make sense of how these considerations led to the stimuli that were created. I think the authors must, for a point that is so central to the paper, thoroughly explain exactly why the stimuli were designed the way that they were and how these designs map onto the theoretical constructs being tested.

      (2) I wondered if the design in Experiment 1 was flawed in one small but critical way. The goal of the parallelism stimuli, I gathered, was to have a set of items that is not parallel to the other set of items. But in doing that, isn't the manipulation effectively the same as the manipulation in the orientation stimuli? Both functionally involve just rotating one set by a fixed amount. (Note: This does not seem to be a problem in Experiment 2, in which the conditions are more clearly delineated.)

      (3) I wondered if the results would hold up for stimuli that were more diverse. It seems that a determined experimenter could easily design an "adversarial" version of these experiments for which the results would be unlikely to replicate. For instance: In the orientation group in Experiment 1, what if the odd-one-out was rotated 90 degrees instead of 180 degrees? Intuitively, it seems like this trial type would now be much easier, and the pattern observed here would not hold up. If it did hold up, that would provide stronger support for the authors' theory.

      It is not enough, in my opinion, to simply have some confirmatory evidence of this theory. One would have to have thoroughly tested many possible ways that theory could fail. I'm unsure that enough has been done here to convince me that these ideas would hold up across a more diverse set of stimuli.

      Response: (1) We appreciate the reviewer's feedback regarding the justification for our experimental designs. We recognize the importance of thoroughly explaining how our stimuli were designed and how these designs correspond to the theoretical constructs being tested. In our revised version, we will expand the introduction of the Erlangen program and explain the rationale behind our stimulus designs in more detail, to make our experimental approach clearer and more transparent for readers who may not be familiar with these concepts.

      (2) We appreciate the reviewer’s insight into the design of Experiment 1 and the concern regarding the potential similarity between the parallelism and orientation stimuli manipulations.

      The parallelism and orientation stimuli in Experiment 1 were first used by Olson & Attneave (1970) to support line-based models of shape coding and then adapted to measure the relative salience of different geometric properties (Chen, 1986). In the parallelism stimuli, the odd quadrant differs from the rest in line slope, while in the orientation stimuli, in contrast, the odd quadrant contains exactly the same line segments as the rest but differs in direction pointed by the angles. The result, that the odd quadrant was detected much faster in the parallelism stimuli than in the orientation stimuli, can serve as evidence for line-based models of shape coding. However, according to Chen (1986, 2005), the idea of invariants over transformations suggests a new analysis of the data: in the parallelism stimuli, the fact that line segments share the same slope essentially implies that they are parallel, and the discrimination may be actually based on parallelism. Thus, the faster discrimination of the parallelism stimuli than that of the orientation stimuli may be explained in terms of relative superiority of parallelism over orientation of angles—a Euclidean property.

      The set of stimuli in Experiment 1 has been employed in several studies investigating questions related to Klein's hierarchy of geometries (L. Chen, 2005; Meng et al., 2019; B. Wang et al., n.d.). For continuity with this earlier work, we adopted this set of stimuli and the corresponding paradigm, despite their imperfect design.

      (3) Thanks for raising the important issue of stimulus diversity and the potential for "adversarial" versions of the experiments to challenge our findings. We acknowledge the validity of your concern and recognize the need to demonstrate the robustness of our results across a range of stimuli. We plan to design additional experiments to investigate the potential implications of varying stimulus characteristics, such as different rotation angles proposed by the reviewer, on the observed patterns of performance.

    1. Author Response

      Reviewer #1 (Public Review):

      This study used a multi-day learning paradigm combined with fMRI to reveal neural changes reflecting the learning of new (arbitrary) shape-sound associations. In the scanner, the shapes and sounds are presented separately and together, both before and after learning. When they are presented together, they can be either consistent or inconsistent with the learned associations. The analyses focus on auditory and visual cortices, as well as the object-selective cortex (LOC) and anterior temporal lobe regions (temporal pole (TP) and perirhinal cortex (PRC)). Results revealed several learning-induced changes, particularly in the anterior temporal lobe regions. First, the LOC and PRC showed a reduced bias to shapes vs sounds (presented separately) after learning. Second, the TP responded more strongly to incongruent than congruent shape-sound pairs after learning. Third, the similarity of TP activity patterns to sounds and shapes (presented separately) was increased for non-matching shape-sound comparisons after learning. Fourth, when comparing the pattern similarity of individual features to combined shape-sound stimuli, the PRC showed a reduced bias towards visual features after learning. Finally, comparing patterns to combined shape-sound stimuli before and after learning revealed a reduced (and negative) similarity for incongruent combinations in PRC. These results are all interpreted as evidence for an explicit integrative code of newly learned multimodal objects, in which the whole is different from the sum of the parts.

      The study has many strengths. It addresses a fundamental question that is of broad interest, the learning paradigm is well-designed and controlled, and the stimuli are real 3D stimuli that participants interact with. The manuscript is well written and the figures are very informative, clearly illustrating the analyses performed.

      There are also some weaknesses. The sample size (N=17) is small for detecting the subtle effects of learning. Most of the statistical analyses are not corrected for multiple comparisons (ROIs), and the specificity of the key results to specific regions is also not tested. Furthermore, the evidence for an integrative representation is rather indirect, and alternative interpretations for these results are not considered.

      We thank the reviewer for their careful reading and the positive comments on our manuscript. As suggested, we have conducted additional analyses of theoretically-motivated ROIs and have found that temporal pole and perirhinal cortex are the only regions to show the key experience-dependent transformations. We are much more cautious with respect to multiple comparisons, and have removed a series of post hoc across-ROI comparisons that were irrelevant to the key questions of the present manuscript. The revised manuscript now includes much more discussion about alternative interpretations as suggested by the reviewer (and also by the other reviewers).

      Additionally, we looked into scanning more participants, but our scanner has since had a full upgrade and the sequence used in the current study is no longer supported. However, we note that while most analyses contain 17 participants, we employed a within-subject learning design that is not typically used in fMRI experiments and that increases our power to detect an effect. This is supported by the robust effect size in the behavioural data, in which 17 out of 18 participants showed a learning effect (Cohen's d = 1.28); this effect was replicated in a follow-up experiment with a larger sample size.

      We address the other reviewer comments point-by-point below.

      Reviewer #2 (Public Review):

      Li et al. used a four-day fMRI design to investigate how unimodal feature information is combined, integrated, or abstracted to form a multimodal object representation. The experimental question is of great interest and understanding how the human brain combines featural information to form complex representations is relevant for a wide range of researchers in neuroscience, cognitive science, and AI. While most fMRI research on object representations is limited to visual information, the authors examined how visual and auditory information is integrated to form a multimodal object representation. The experimental design is elegant and clever. Three visual shapes and three auditory sounds were used as the unimodal features; the visual shapes were used to create 3D-printed objects. On Day 1, the participants interacted with the 3D objects to learn the visual features, but the objects were not paired with the auditory features, which were played separately. On Day 2, participants were scanned with fMRI while they were exposed to the unimodal visual and auditory features as well as pairs of visual-auditory cues. On Day 3, participants again interacted with the 3D objects but now each was paired with one of the three sounds that played from an internal speaker. On Day 4, participants completed the same fMRI scanning runs they completed on Day 2, except now some visual-auditory feature pairs corresponded with Congruent (learned) objects, and some with Incongruent (unlearned) objects. Using the same fMRI design on Days 2 and 4 enables a well-controlled comparison between feature- and object-evoked neural representations before and after learning. The notable results corresponded to findings in the perirhinal cortex and temporal pole. The authors report (1) that a visual bias on Day 2 for unimodal features in the perirhinal cortex was attenuated after learning on Day 4, (2) a decreased univariate response to congruent vs. incongruent visual-auditory objects in the temporal pole on Day 4, (3) decreased pattern similarity between congruent vs. incongruent pairs of visual and auditory unimodal features in the temporal pole on Day 4, (4) in the perirhinal cortex, visual unimodal features on Day 2 do not correlate with their respective visual-auditory objects on Day 4, and (5) in the perirhinal cortex, multimodal object representations across Days 2 and 4 are uncorrelated for congruent objects and anticorrelated for incongruent. The authors claim that each of these results supports the theory that multimodal objects are represented in an "explicit integrative" code separate from feature representations. While these data are valuable and the results are interesting, the authors' claims are not well supported by their findings.

      We thank the reviewer for the careful reading of our manuscript and positive comments. Overall, we now stay closer to the data when describing the results and provide our interpretation of these results in the discussion section while remaining open to alternative interpretations (as also suggested by Reviewer 1).

      (1) In the introduction, the authors contrast two theories: (a) multimodal objects are represented in the co-activation of unimodal features, and (b) multimodal objects are represented in an explicit integrative code such that the whole is different than the sum of its parts. However, the distinction between these two theories is not straightforward. An explanation of what is precisely meant by "explicit" and "integrative" would clarify the authors' theoretical stance. Perhaps we can assume that an "explicit" representation is a new representation that is created to represent a multimodal object. What is meant by "integrative" is more ambiguous: unimodal features could be integrated within a representation in a manner that preserves the decodability of the unimodal features, or alternatively the multimodal representation could be completely abstracted away from the constituent features such that the features are no longer decodable. Even if the object representation is "explicit" and distinct from the unimodal feature representations, it can in theory still contain featural information, though perhaps warped or transformed. The authors do not clearly commit to a degree of featural abstraction in their theory of "explicit integrative" multimodal object representations, which makes it difficult to assess the validity of their claims.

      Due to its ambiguity, we removed the term “explicit” and now make it clear that our central question was whether crossmodal object representations require only unimodal feature-level representations (e.g., frogs are created from only the combination of shape and sound) or whether crossmodal object representations also rely on an integrative code distinct from the unimodal features (e.g., there is something more to “frog” than its original shape and sound). We now clarify this in the revised manuscript.

      “One theoretical view from the cognitive sciences suggests that crossmodal objects are built from component unimodal features represented across distributed sensory regions.8 Under this view, when a child thinks about “frog”, the visual cortex represents the appearance of the shape of the frog whereas the auditory cortex represents the croaking sound. Alternatively, other theoretical views predict that multisensory objects are not only built from their component unimodal sensory features, but that there is also a crossmodal integrative code that is different from the sum of these parts.9,10,11,12,13 These latter views propose that anterior temporal lobe structures can act as a polymodal “hub” that combines separate features into integrated wholes.9,11,14,15” – pg. 4

      For this reason, we designed our paradigm to equate the unimodal representations, such that neural differences between the congruent and incongruent conditions provide evidence for a crossmodal integrative code different from the unimodal features (because the unimodal features are equated by default in the design).

      “Critically, our four-day learning task allowed us to isolate any neural activity associated with integrative coding in anterior temporal lobe structures that emerges with experience and differs from the neural patterns recorded at baseline. The learned and non-learned crossmodal objects were constructed from the same set of three validated shape and sound features, ensuring that factors such as familiarity with the unimodal features, subjective similarity, and feature identity were tightly controlled (Figure 2). If the mind represented crossmodal objects entirely as the reactivation of unimodal shapes and sounds (i.e., objects are constructed from their parts), then there should be no difference between the learned and non-learned objects (because they were created from the same three shapes and sounds). By contrast, if the mind represented crossmodal objects as something over and above their component features (i.e., representations for crossmodal objects rely on integrative coding that is different from the sum of their parts), then there should be behavioral and neural differences between learned and non-learned crossmodal objects (because the only difference across the objects is the learned relationship between the parts). Furthermore, this design allowed us to determine the relationship between the object representation acquired after crossmodal learning and the unimodal feature representations acquired before crossmodal learning. That is, we could examine whether learning led to abstraction of the object representations such that it no longer resembled the unimodal feature representations.” – pg. 5

      Furthermore, we agree with the reviewer that our definition and methodological design do not directly capture the structure of the integrative code. With experience, the unimodal feature representations may be completely abstracted away, warped, or changed through a nonlinear transformation. We suggest that crossmodal learning forms an integrative code in the anterior temporal lobes that is different from the original unimodal representations; however, we agree that future work is needed to more directly capture the structure of the integrative code that emerges with experience.

      “In our task, participants had to differentiate congruent and incongruent objects constructed from the same three shape and sound features (Figure 2). An efficient way to solve this task would be to form distinct object-level outputs from the overlapping unimodal feature-level inputs such that congruent objects are made to be orthogonal from the representations before learning (i.e., measured as pattern similarity equal to 0 in the perirhinal cortex; Figure 5b, 6, Supplemental Figure S5), whereas non-learned incongruent objects could be made to be dissimilar from the representations before learning (i.e., anticorrelation, measured as pattern similarity less than 0 in the perirhinal cortex; Figure 6). Because our paradigm could decouple neural responses to the learned object representations (on Day 4) from the original component unimodal features at baseline (on Day 2), these results could be taken as evidence of pattern separation in the human perirhinal cortex.11,12 However, our pattern of results could also be explained by other types of crossmodal integrative coding. For example, incongruent object representations may be less stable than congruent object representations, such that incongruent object representations are warped to a greater extent than congruent object representations (Figure 6).” – pg. 18

      “As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation.” – pg. 18

      (2) After participants learned the multimodal objects, the authors report a decreased univariate response to congruent visual-auditory objects relative to incongruent objects in the temporal pole. This is claimed to support the existence of an explicit, integrative code for multimodal objects. Given the number of alternative explanations for this finding, this claim seems unwarranted. A simpler interpretation of these results is that the temporal pole is responding to the novelty of the incongruent visual-auditory objects. If there is in fact an explicit, integrative multimodal object representation in the temporal pole, it is unclear why this would manifest in a decreased univariate response.

      We thank the reviewer for identifying this issue. Our behavioral design controls unimodal feature-level novelty but allows object-level novelty to differ. Thus, neural differences between the congruent and incongruent conditions reflect sensitivity to the object-level differences between the combinations of shape and sound. However, we agree that there are multiple interpretations regarding the nature of how the integrative code is structured in the temporal pole and perirhinal cortex. We have removed the interpretation highlighted by the reviewer from the results. Instead, we now provide our preferred interpretation in the discussion, while acknowledging the other possibilities that the reviewer mentions.

      As one possibility, these results in the temporal pole may reflect “conceptual combination”: a congruent pairing such as “hummingbird” may require fewer neural resources than an incongruent pairing such as “barking-frog”.

      “Furthermore, these distinct anterior temporal lobe structures may be involved with integrative coding in different ways. For example, the crossmodal object representations measured after learning were found to be related to the component unimodal feature representations measured before learning in the temporal pole but not the perirhinal cortex (Figure 5, 6, Supplemental Figure S5). Moreover, pattern similarity for congruent shape-sound pairs was lower than the pattern similarity for incongruent shape-sound pairs after crossmodal learning in the temporal pole but not the perirhinal cortex (Figure 4b, Supplemental Figure S3a). As one interpretation of this pattern of results, the temporal pole may represent new crossmodal objects by combining previously learned knowledge.8,9,10,11,13,14,15,33 Specifically, research into conceptual combination has linked the anterior temporal lobes to compound object concepts such as “hummingbird”.34,35,36 For example, participants during our task may have represented the sound-based “humming” concept and visually-based “bird” concept on Day 1, forming the crossmodal “hummingbird” concept on Day 3 (Figure 1, 2), which may recruit less activity in the temporal pole than an incongruent pairing such as “barking-frog”. For these reasons, the temporal pole may form a crossmodal object code based on pre-existing knowledge, resulting in reduced neural activity (Figure 3d) and pattern similarity towards features associated with learned objects (Figure 4b).” – pg. 18

      (3) The authors ran a neural pattern similarity analysis on the unimodal features before and after multimodal object learning. They found that the similarity between visual and auditory features that composed congruent objects decreased in the temporal pole after multimodal object learning. This was interpreted to reflect an explicit integrative code for multimodal objects, though it is not clear why. First, behavioral data show that participants reported increased similarity between the visual and auditory unimodal features within congruent objects after learning, the opposite of what was found in the temporal pole. Second, it is unclear why an analysis of the unimodal features would be interpreted to reflect the nature of the multimodal object representations. Since the same features corresponded with both congruent and incongruent objects, the nature of the feature representations cannot be interpreted to reflect the nature of the object representations per se. Third, using unimodal feature representations to make claims about object representations seems to contradict the theoretical claim that explicit, integrative object representations are distinct from unimodal features. If the learned multimodal object representation exists separately from the unimodal feature representations, there is no reason why the unimodal features themselves would be influenced by the formation of the object representation. Instead, these results seem to more strongly support the theory that multimodal object learning results in a transformation or warping of feature space.

      We apologize for the lack of clarity. We have now overhauled this aspect of our manuscript in an attempt to better highlight key aspects of our experimental design. In particular, because the unimodal features composing the congruent and incongruent objects were equated, neural differences between these conditions would provide evidence for an experience-dependent crossmodal integrative code that is different from its component unimodal features.

      Related to the second and third points, we were looking at the extent to which the original unimodal representations change with crossmodal learning. Before crossmodal learning, we found that the perirhinal cortex tracked the similarity between the individual visual shape features and the crossmodal objects that were composed of those visual shapes – however, there was no evidence that perirhinal cortex was tracking the unimodal sound features on those crossmodal objects. After crossmodal learning, we see that this visual shape bias in perirhinal cortex was no longer present – that is, the representation in perirhinal cortex started to look less like the visual features that comprise the objects. Thus, crossmodal learning transformed the perirhinal representations so that they were no longer predominantly grounded in a single visual modality, which may be a mechanism by which object concepts gain their abstraction. We have now tried to be clearer about this interpretation throughout the paper.

      Notably, we suggest that experience may change both the crossmodal object representations, as well as the unimodal feature representations. For example, we have previously shown that unimodal visual features are influenced by experience in parallel with the representation of the conjunction (e.g., Liang et al., 2020; Cerebral Cortex). Nevertheless, we remain open to the myriad possible structures of the integrative code that might emerge with experience.

      We now clarify these points throughout the manuscript. For example:

      “We then examined whether the original representations would change after participants learned how the features were paired together to make specific crossmodal objects, conducting the same analysis described above after crossmodal learning had taken place (Figure 5b). With this analysis, we sought to measure the relationship between the representation for the learned crossmodal object and the original baseline representation for the unimodal features. More specifically, the voxel-wise activity for unimodal feature runs before crossmodal learning was correlated to the voxel-wise activity for crossmodal object runs after crossmodal learning (Figure 5b). Another linear mixed model which included modality as a fixed factor within each ROI revealed that the perirhinal cortex was no longer biased towards visual shape after crossmodal learning (F1,32 = 0.12, p = 0.73), whereas the temporal pole, LOC, V1, and A1 remained biased towards either visual shape or sound (F1,30-32 between 16.20 and 73.42, all p < 0.001, η2 between 0.35 and 0.70).” – pg. 14

      “To investigate this effect in perirhinal cortex more specifically, we conducted a linear mixed model to directly compare the change in the visual bias of perirhinal representations from before crossmodal learning to after crossmodal learning (green regions in Figure 5a vs. 5b). Specifically, the linear mixed model included learning day (before vs. after crossmodal learning) and modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object). Results revealed a significant interaction between learning day and modality in the perirhinal cortex (F1,775 = 5.56, p = 0.019, η2 = 0.071), meaning that the baseline visual shape bias observed in perirhinal cortex (green region of Figure 5a) was significantly attenuated with experience (green region of Figure 5b). After crossmodal learning, a given shape no longer invoked significant pattern similarity between objects that had the same shape but differed in terms of what they sounded like. Taken together, these results suggest that prior to learning the crossmodal objects, the perirhinal cortex had a default bias toward representing the visual shape information and was not representing sound information of the crossmodal objects. After crossmodal learning, however, the visual shape bias in perirhinal cortex was no longer present. That is, with crossmodal learning, the representations within perirhinal cortex started to look less like the visual features that comprised the crossmodal objects, providing evidence that the perirhinal representations were no longer predominantly grounded in the visual modality.” – pg. 13

      “Importantly, the initial visual shape bias observed in the perirhinal cortex was attenuated by experience (Figure 5, Supplemental Figure S5), suggesting that the perirhinal representations had become abstracted and were no longer predominantly grounded in a single modality after crossmodal learning. One possibility may be that the perirhinal cortex is by default visually driven as an extension to the ventral visual stream,10,11,12 but can act as a polymodal “hub” region for additional crossmodal input following learning.” – pg. 19

      (4) The most compelling evidence the authors provide for their theoretical claims is the finding that, in the perirhinal cortex, the unimodal feature representations on Day 2 do not correlate with the multimodal objects they comprise on Day 4. This suggests that the learned multimodal object representations are not combinations of their unimodal features. If unimodal features are not decodable within the congruent object representations, this would support the authors' explicit integrative hypothesis. However, the analyses provided do not go all the way in convincing the reader of this claim. First, the analyses reported do not differentiate between congruent and incongruent objects. If this result in the perirhinal cortex reflects the formation of new multimodal object representations, it should only be true for congruent objects but not incongruent objects. Since the analyses combine congruent and incongruent objects, it is not possible to know whether this was the case. Second, just because feature representations on Day 2 do not correlate with multimodal object patterns on Day 4 does not mean that the object representations on Day 4 do not contain featural information. This could be directly tested by correlating feature representations on Day 4 with congruent vs. incongruent object representations on Day 4. It could be that representations in the perirhinal cortex are not stable over time and all representations, including unimodal feature representations, shift between sessions, which could explain these results yet not entail the existence of abstracted object representations.

      We thank the reviewer for this suggestion and have conducted the two additional analyses. Specifically, we split the congruent and incongruent conditions and also investigated correlations between unimodal representations on Day 4 with crossmodal object representations on Day 4. There was no significant interaction between modality and congruency in any ROI across or within learning days. One possible explanation for these findings is that both congruent and incongruent crossmodal objects are represented differently from their underlying unimodal features, and all of these representations can transform with experience.

      However, the new analyses also revealed that perirhinal cortex was the only region without a modality-specific bias after crossmodal learning (e.g., Day 4 Unimodal Feature runs x Day 4 Crossmodal Object runs; now shown in Supplemental Figure S5). Overall, these results are consistent with the notion of a crossmodal integrative code in perirhinal cortex that has changed with experience and is different from the component unimodal features. Nevertheless, we explore alternative interpretations for how the crossmodal code emerges with experience in the discussion.

      “To examine whether these results differed by congruency (i.e., whether any modality-specific biases differed as a function of whether the object was congruent or incongruent), we conducted exploratory linear mixed models for each of the five a priori ROIs across learning days. More specifically, we correlated: 1) the voxel-wise activity for Unimodal Feature Runs before crossmodal learning to the voxel-wise activity for Crossmodal Object Runs before crossmodal learning (Day 2 vs. Day 2), 2) the voxel-wise activity for Unimodal Feature Runs before crossmodal learning to the voxel-wise activity for Crossmodal Object Runs after crossmodal learning (Day 2 vs. Day 4), and 3) the voxel-wise activity for Unimodal Feature Runs after crossmodal learning to the voxel-wise activity for Crossmodal Object Runs after crossmodal learning (Day 4 vs. Day 4). For each of the three analyses described, we then conducted separate linear mixed models which included modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object) and congruency (congruent vs. incongruent)…. There was no significant interaction between modality and congruency in any ROI between Day 2 and Day 2 (F1,346-368 between 0.00 and 1.06, p between 0.30 and 0.99), between Day 2 and Day 4 (F1,346-368 between 0.021 and 0.91, p between 0.34 and 0.89), or between Day 4 and Day 4 (F1,346-368 between 0.01 and 3.05, p between 0.082 and 0.93). However, exploratory analyses revealed that perirhinal cortex was the only region without a modality-specific bias and where the unimodal feature runs were not significantly correlated to the crossmodal object runs after crossmodal learning (Supplemental Figure S5).” – pg. 14

      “Taken together, the overall pattern of results suggests that representations of the crossmodal objects in perirhinal cortex were heavily influenced by their consistent visual features before crossmodal learning. However, the crossmodal object representations were no longer influenced by the component visual features after crossmodal learning (Figure 5, Supplemental Figure S5). Additional exploratory analyses did not find evidence of experience-dependent changes in the hippocampus or inferior parietal lobes (Supplemental Figure S4c-e).” – pg. 14

      “The voxel-wise matrix for Unimodal Feature runs on Day 4 was correlated to the voxel-wise matrix for Crossmodal Object runs on Day 4 (see Figure 5 in the main text for an example). We compared the average pattern similarity (z-transformed Pearson correlation) between shape (blue) and sound (orange) features specifically after crossmodal learning. Consistent with Figure 5b, perirhinal cortex was the only region without a modality-specific bias. Furthermore, perirhinal cortex was the only region where the representations of both the visual and sound features were not significantly correlated to the crossmodal objects. By contrast, every other region maintained a modality-specific bias for either the visual or sound features. These results suggest that perirhinal cortex representations were transformed with experience, such that the initial visual shape representations (Figure 5a) were no longer grounded in a single modality after crossmodal learning. Furthermore, these results suggest that crossmodal learning formed an integrative code different from the unimodal features in perirhinal cortex, as the visual and sound features were not significantly correlated with the crossmodal objects. * p < 0.05, ** p < 0.01, *** p < 0.001. Horizontal lines within brain regions indicate a significant main effect of modality. Vertical asterisks denote pattern similarity comparisons relative to 0.” – Supplemental Figure S5
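      For readers less familiar with the pattern-similarity measure described above, the following is a minimal sketch in Python (standard library only) of one way to compute z-transformed Pearson correlations between unimodal feature patterns and crossmodal object patterns, then average them separately for the shape match and the sound match. The function names, the dictionary layout, and the clipping of r before the Fisher transform are illustrative assumptions, not the authors' analysis pipeline; the linear mixed models used for inference are not reproduced here.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("patterns must have the same length (>= 2 voxels)")
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("a pattern with zero variance has no defined correlation")
    return num / den


def fisher_z(r):
    """Fisher z-transform (atanh); r is clipped so that |r| = 1 stays finite."""
    r = max(min(r, 0.999999), -0.999999)
    return math.atanh(r)


def modality_similarity(unimodal, crossmodal, objects):
    """Mean z-transformed similarity of each crossmodal object pattern to the
    pattern of its own shape feature and to that of its own sound feature.

    unimodal:   dict, feature name -> voxel pattern (hypothetical layout)
    crossmodal: dict, object name  -> voxel pattern
    objects:    dict, object name  -> (shape feature name, sound feature name)
    Returns (mean shape-match z, mean sound-match z).
    """
    shape_z, sound_z = [], []
    for obj, (shape, sound) in objects.items():
        shape_z.append(fisher_z(pearson_r(unimodal[shape], crossmodal[obj])))
        sound_z.append(fisher_z(pearson_r(unimodal[sound], crossmodal[obj])))
    return mean(shape_z), mean(sound_z)
```

      In this toy framing, a region with a visual shape bias would show a larger first value than the second, whereas a region without a modality-specific bias would show similar values for both (and, as described for perirhinal cortex after learning, values near 0).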

      “We found that the temporal pole and perirhinal cortex – two anterior temporal lobe structures – came to represent new crossmodal object concepts with learning, such that the acquired crossmodal object representations were different from the representation of the constituent unimodal features (Figure 5, 6). Intriguingly, the perirhinal cortex was by default biased towards visual shape, but this initial visual bias was attenuated with experience (Figure 3c, 5, Supplemental Figure S5). Within the perirhinal cortex, the acquired crossmodal object concepts (measured after crossmodal learning) became less similar to their original component unimodal features (measured at baseline before crossmodal learning) (Figure 5, 6, Supplemental Figure S5). This is consistent with the idea that object representations in perirhinal cortex integrate the component sensory features into a whole that is different from the sum of the component parts, which might be a mechanism by which object concepts obtain their abstraction…. As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience – such as through transformations, warping, or other factors – is an open question and an important area for future investigation.” – pg. 18

      In sum, the authors have collected a fantastic dataset that has the potential to answer questions about the formation of multimodal object representations in the brain. A more precise delineation of different theoretical accounts and additional analyses are needed to provide convincing support for the theory that “explicit integrative” multimodal object representations are formed during learning.

      We thank the reviewer for the positive comments and helpful feedback. We hope that the changes to our wording and the clarifications to our methodology now more clearly support the central goal of our study: to find evidence of crossmodal integrative coding different from the original unimodal feature parts in anterior temporal lobe structures. We furthermore agree that future research is needed to delineate the structure of the integrative code that emerges with experience in the anterior temporal lobes.

      Reviewer #3 (Public Review):

      This paper uses behavior and functional brain imaging to understand how neural and cognitive representations of visual and auditory stimuli change as participants learn associations among them. Prior work suggests that areas in the anterior temporal lobe (ATL) and perirhinal cortex play an important role in learning/representing cross-modal associations, but the hypothesis has not been directly tested by evaluating behavior and functional imaging before and after learning cross-modal associations. The results show that such learning changes both the perceived similarities amongst stimuli and the neural responses generated within ATL and perirhinal regions, providing novel support for the view that cross-modal learning leads to a representational change in these regions.

      This work has several strengths. It tackles an important question for current theories of object representation in the mind and brain in a novel and quite direct fashion, by studying how these representations change with cross-modal learning. As the authors note, little work has directly assessed representational change in ATL following such learning, despite the widespread view that ATL is critical for such representation. Indeed, such direct assessment poses several methodological challenges, which the authors have met with an ingenious experimental design. The experiment allows the authors to maintain tight control over both the familiarity and the perceived similarities amongst the shapes and sounds that comprise their stimuli so that the observed changes across sessions must reflect learned cross-modal associations among these. I especially appreciated the creation of physical objects that participants can explore and the approach to learning in which shapes and sounds are initially experienced independently and later in an associated fashion. In using multi-echo MRI to resolve signals in ventral ATL, the authors have minimized a key challenge facing much work in this area (namely the poor SNR yielded by standard acquisition sequences in ventral ATL). The use of both univariate and multivariate techniques was well-motivated and helpful in testing the central questions. The manuscript is, for the most part, clearly written, and nicely connects the current work to important questions in two literatures, specifically (1) the hypothesized role of the perirhinal cortex in representing/learning complex conjunctions of features and (2) the tension between purely embodied approaches to semantic representation vs the view that ATL regions encode important amodal/crossmodal structure.

      There are some places in the manuscript that would benefit from further explanation and methodological detail. I also had some questions about the results themselves and what they signify about the roles of ATL and the perirhinal cortex in object representation.

      We thank the reviewer for their positive feedback and address the comments in the point-by-point responses below.

      (A) I found the terms "features" and "objects" to be confusing as used throughout the manuscript, and sometimes inconsistent. I think by "features" the authors mean the shape and sound stimuli in their experiment. I think by "object" the authors usually mean the conjunction of a shape with a sound, for instance, when a shape and sound are simultaneously experienced in the scanner, or when the participant presses a button on the shape and hears the sound. The confusion comes partly because shapes are often described as being composed of features, not features in and of themselves. (The same is sometimes true of sounds.) So when reading "features" I kept thinking the paper referred to the elements that went together to comprise a shape. It also comes from ambiguous use of the word object, which might refer to (a) the 3D-printed item that people play with, which is an object, or (b) a visually-presented shape (for instance, the localizer involved comparing an "object" to a "phase-scrambled" stimulus; here I assume "object" refers to an intact visual stimulus and not the joint presentation of visual and auditory items). I think the design, stimuli, and results would be easier for a naive reader to follow if the authors used the terms "unimodal representation" to refer to cases where only visual or auditory input is presented, and "cross-modal" or "conjoint" representation when both are present.

      We thank the reviewer for this suggestion and agree. We have replaced the terms “features” and “objects” with “unimodal” and “crossmodal” in the title, text, and figures throughout the manuscript for consistency (i.e., “crossmodal binding problem”). To simplify the terminology, we have also removed the localizer results.

      (B) There are a few places where I wasn't sure what exactly was done, and where the methods lacked sufficient detail for another scientist to replicate what was done. Specifically:

      (1) The behavioral study assessing perceptual similarity between visual and auditory stimuli was unclear. The procedure, stimuli, number of trials, etc, should be explained in sufficient detail in methods to allow replication. The results of the study should also minimally be reported in the supplementary information. Without an understanding of how these studies were carried out, it was very difficult to understand the observed pattern of behavioral change. For instance, I initially thought separate behavioral blocks were carried out for visual versus auditory stimuli, each presented in isolation; however, the effects contrast congruent and incongruent stimuli, which suggests these decisions must have been made for the conjoint presentation of both modalities. I'm still not sure how this worked. Additionally, the manuscript makes a brief mention that similarity judgments were made in the context of "all stimuli," but I didn't understand what that meant. Similarity ratings are hugely sensitive to the contrast set with which items appear, so clarity on these points is pretty important. A strength of the design is the contention that shape and sound stimuli were psychophysically matched, so it is important to show the reader how this was done and what the results were.

      We agree and apologize for the lack of sufficient detail in the original manuscript. We now include much more detail about the similarity rating task. The methodology and results of the behavioral rating experiments are now shown in Supplemental Figure S1. In Figure S1a, the similarity ratings are visualized on a multidimensional scaling plot. The triangular geometry for shape (blue) and sound (red) indicate that the subjective similarity was equated within each unimodal feature across individual participants. Quantitatively, there was no difference in similarity between the congruent and incongruent pairings in Figure S1b and Figure S1c prior to crossmodal learning. In addition to providing more information on these methods in the Supplemental Information, we also now provide a more detailed description of the task in the manuscript itself. For convenience, we reproduce these sections below.

      “Pairwise Similarity Task. Using the same task as the stimulus validation procedure (Supplemental Figure S1a), participants provided similarity ratings for all combinations of the 3 validated shapes and 3 validated sounds (each of the six features was rated in the context of every other feature in the set, with 4 repeats of the same feature, for a total of 72 trials). More specifically, three stimuli were displayed on each trial, with one at the top and two at the bottom of the screen, using the same procedure as we have used previously27. The 3D shapes were visually displayed as a photo, whereas sounds were displayed on screen in a box that could be played over headphones when clicked with the mouse. The participant made an initial judgment by selecting the more similar stimulus on the bottom relative to the stimulus on the top. Afterwards, the participant made a similarity rating between each bottom stimulus and the top stimulus, from 0 (no similarity) to 5 (identical). This procedure ensured that ratings were made relative to all other stimuli in the set.” – pg. 28

      “Pairwise similarity task and results. In the initial stimulus validation experiment, participants provided pairwise ratings for 5 sounds and 3 shapes. The shapes had been selected from a well-characterized perceptually uniform stimulus space27 and were equated in their subjective similarity, and the pairwise ratings followed the same procedure as described in ref 27. Based on this initial experiment, we then selected the 3 of the 5 sounds that were most closely equated in their subjective similarity. (a) 3D-printed shapes were displayed as images, whereas sounds were displayed in a box that could be played when clicked by the participant. Ratings were averaged to produce a similarity matrix for each participant, and then averaged to produce a group-level similarity matrix. As shown by the triangular representational geometries recovered from multidimensional scaling, shapes (blue) and sounds (orange) were approximately equated in their subjective similarity. These features were then used in the four-day crossmodal learning task. (b) Behavioral results from the four-day crossmodal learning task paired with multi-echo fMRI described in the main text. Before crossmodal learning, there was no difference in similarity between shape and sound features associated with congruent objects compared to incongruent objects, indicating that similarity was controlled at the unimodal feature level. After crossmodal learning, we observed a robust shift in the magnitude of similarity. The shape and sound features associated with congruent objects were now significantly more similar than the same shape and sound features associated with incongruent objects (p < 0.001), evidence that crossmodal learning changed how participants experienced the unimodal features (observed in 17/18 participants). (c) We replicated this learning-related shift in similarity with a larger sample size (n = 44; observed in 38/44 participants). *** denotes p < 0.001. Horizontal lines denote the comparison of congruent vs. incongruent conditions.” – Supplemental Figure S1
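      The caption describes averaging repeated ratings into a per-participant similarity matrix and then contrasting congruent with incongruent shape-sound pairs. The sketch below illustrates only that bookkeeping under stated assumptions: the stimulus labels ("shape1", "sound1", ...), the tuple layout of a rating trial, and the CONGRUENT pairing are hypothetical placeholders, not the authors' data format or code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical learned pairing for one participant: shape i goes with sound i.
CONGRUENT = {("shape1", "sound1"), ("shape2", "sound2"), ("shape3", "sound3")}


def similarity_matrix(trials):
    """Average repeated 0-5 ratings into a symmetric pairwise lookup.

    `trials` is an iterable of (stimulus_a, stimulus_b, rating) tuples;
    a pair may be rated several times and in either order.
    """
    ratings = defaultdict(list)
    for a, b, rating in trials:
        key = tuple(sorted((a, b)))  # (a, b) and (b, a) are the same pair
        ratings[key].append(rating)
    return {pair: mean(vals) for pair, vals in ratings.items()}


def congruency_contrast(matrix):
    """Mean similarity of congruent minus incongruent shape-sound pairs."""
    congruent, incongruent = [], []
    for (a, b), sim in matrix.items():
        # Skip within-modality pairs (shape-shape or sound-sound).
        if a.startswith("shape") == b.startswith("shape"):
            continue
        shape, sound = (a, b) if a.startswith("shape") else (b, a)
        target = congruent if (shape, sound) in CONGRUENT else incongruent
        target.append(sim)
    return mean(congruent) - mean(incongruent)


trials = [
    ("shape1", "sound1", 4), ("sound1", "shape1", 5),  # congruent, rated twice
    ("shape1", "sound2", 1), ("shape2", "sound2", 3),
    ("shape2", "sound3", 2), ("shape1", "shape2", 3),  # last pair is unimodal
]
print(congruency_contrast(similarity_matrix(trials)))
```

      A positive contrast corresponds to congruent features being rated as more similar than incongruent ones; group-level values would come from averaging the per-participant matrices, as the caption states.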

      (2) The experiences through which participants learned/experienced the shapes and sounds were unclear. The methods mention that they had one minute to explore/palpate each shape and that these experiences were interleaved with other tasks, but it is not clear what the other tasks were, how many such exploration experiences occurred, or how long the total learning time was. The manuscript also mentions that participants learn the shape-sound associations with 100% accuracy but it isn't clear how that was assessed. These details are important partly b/c it seems like very minimal experience to change neural representations in the cortex.

      We apologize for the lack of detail and agree with the reviewer’s suggestions – we now include much more information in the methods section. Each behavioral day required about 1 hour of total time to complete, and indeed, participants rapidly learned their associations with minimal experience. For example:

      “Behavioral Tasks. On each behavioral day (Day 1 and Day 3; Figure 2), participants completed the following tasks, in this order: Exploration Phase, one Unimodal Feature 1-back run (26 trials), Exploration Phase, one Crossmodal 1-back run (26 trials), Exploration Phase, Pairwise Similarity Task (24 trials), Exploration Phase, Pairwise Similarity Task (24 trials), Exploration Phase, Pairwise Similarity Task (24 trials), and finally, Exploration Phase. To verify learning on Day 3, participants also additionally completed a Learning Verification Task at the end of the session.” – pg. 27

      “The overall procedure ensured that participants extensively explored the unimodal features on Day 1 and the crossmodal objects on Day 3. The Unimodal Feature and the Crossmodal Object 1-back runs administered on Day 1 and Day 3 served as practice for the neuroimaging sessions on Day 2 and Day 4, during which these 1-back tasks were completed. Each behavioral session required less than 1 hour of total time to complete.” – pg. 27

      “Learning Verification Task (Day 3 only). As the final task on Day 3, participants completed a task to ensure that participants successfully formed their crossmodal pairing. All three shapes and sounds were randomly displayed in 6 boxes on a display. Photos of the 3D shapes were shown, and sounds were played by clicking the box with the mouse cursor. The participant was cued with either a shape or sound, and then selected the corresponding paired feature. At the end of Day 3, we found that all participants reached 100% accuracy on this task (10 trials).” – pg. 29

      (3) I didn't understand the similarity metric used in the multivariate imaging analyses. The manuscript mentions Z-scored Pearson's r, but I didn't know if this meant (a) many Pearson coefficients were computed and these were then Z-scored, so that 0 indicates a value equal to the mean Pearson correlation and 1 is equal to the standard deviation of the correlations, or (b) whether a Fisher Z transform was applied to each r (so that 0 means r was also around 0). From the interpretation of some results, I think the latter is the approach taken, but in general, it would be helpful to see, in Methods or Supplementary information, exactly how similarity scores were computed, and why that approach was adopted. This is particularly important since it is hard to understand the direction of some key effects.

      The reviewer is correct that the Fisher Z transform was applied to each individual r before averaging the correlations. This approach is generally recommended when averaging correlations (see Corey, Dunlap, & Burke, 1998). We are now clearer on this point in the manuscript:

      “The z-transformed Pearson’s correlation coefficient was used as the distance metric for all pattern similarity analyses. More specifically, each individual Pearson correlation was Fisher z-transformed and then averaged (see 61).” – pg. 32
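      To make the distinction the reviewer raised concrete, here is a minimal sketch of option (b): each Pearson r is Fisher z-transformed (z = atanh(r)) before averaging, rather than z-scoring a set of correlations. The function names are illustrative only and are not taken from the authors' analysis code.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def average_correlations(rs):
    """Fisher z-transform each r, then average in z space.

    A value of 0 still means "no correlation", unlike z-scoring across
    correlations. Note that math.atanh raises ValueError for r = +/-1.
    """
    return mean(math.atanh(r) for r in rs)


z_bar = average_correlations([0.2, 0.6])
# Optional back-transform to the r scale for reporting:
print(z_bar, math.tanh(z_bar))
```

      The back-transformed value (about 0.42 here) differs from the naive mean of the two r values (0.4), which is why averaging is done in z space.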

      (C) From Figure 3D, the temporal pole mask appears to exclude the anterior fusiform cortex (or the ventral surface of the ATL generally). If so, this is a shame, since that appears to be the locus most important to cross-modal integration in the "hub and spokes" model of semantic representation in the brain. The observation in the paper that the perirhinal cortex seems initially biased toward visual structure while more superior ATL is biased toward auditory structure appears generally consistent with the "graded hub" view expressed, for instance, in our group's 2017 review paper (Lambon Ralph et al., Nature Reviews Neuroscience). The balance of visual- versus auditory-sensitivity in that work appears balanced in the anterior fusiform, just a little lateral to the anterior perirhinal cortex. It would be helpful to know if the same pattern is observed for this area specifically in the current dataset.

      We thank the reviewer for this suggestion. After close inspection of Lambon Ralph et al. (2017), we believe that our perirhinal cortex mask appears to be overlapping with the ventral ATL/anterior fusiform region that the reviewer mentions. See Author response image 1 for a visual comparison:

      Author response image 1.

      The top four figures are sampled from Lambon Ralph et al (2017), whereas the bottom two figures visualize our perirhinal cortex mask (white) and temporal pole mask (dark green) relative to the fusiform cortex. The ROIs visualized were defined from the Harvard-Oxford atlas.

      We now mention this area of overlap in our manuscript and link it to the hub and spokes model:

      “Notably, our perirhinal cortex mask overlaps with a key region of the ventral anterior temporal lobe thought to be the central locus of crossmodal integration in the “hub and spokes” model of semantic representations.9,50” – pg. 20

      (D) While most effects seem robust from the information presented, I'm not so sure about the analysis of the perirhinal cortex shown in Figure 5. This compares (I think) the neural similarity evoked by a unimodal stimulus ("feature") to that evoked by the same stimulus when paired with its congruent stimulus in the other modality ("object"). These similarities show an interaction with modality prior to cross-modal association, but no interaction afterward, leading the authors to suggest that the perirhinal cortex has become less biased toward visual structure following learning. But the plots in Figures 4a and b are shown against different scales on the y-axes, obscuring the fact that all of the similarities are smaller in the after-learning comparison. Since the perirhinal interaction was already the smallest effect in the pre-learning analysis, it isn't really surprising that it drops below significance when all the effects diminish in the second comparison. A more rigorous test would assess the reliability of the interaction of comparison (pre- or post-learning) with modality. The possibility that perirhinal representations become less "visual" following cross-modal learning is potentially important so a post hoc contrast of that kind would be helpful.

      We apologize for the lack of clarity. We conducted a linear mixed model to assess the interaction between modality and crossmodal learning day (before and after crossmodal learning) in the perirhinal cortex as described by the reviewer. The critical interaction was significant, which is now clarified in the text as well as in the rescaled figure plots.

      “To investigate this effect in perirhinal cortex more specifically, we conducted a linear mixed model to directly compare the change in the visual bias of perirhinal representations from before crossmodal learning to after crossmodal learning (green regions in Figure 5a vs. 5b). Specifically, the linear mixed model included learning day (before vs. after crossmodal learning) and modality (visual feature match to crossmodal object vs. sound feature match to crossmodal object). Results revealed a significant interaction between learning day and modality in the perirhinal cortex (F(1, 775) = 5.56, p = 0.019, η2 = 0.071), meaning that the baseline visual shape bias observed in perirhinal cortex (green region of Figure 5a) was significantly attenuated with experience (green region of Figure 5b). After crossmodal learning, a given shape no longer invoked significant pattern similarity between objects that had the same shape but differed in terms of what they sounded like. Taken together, these results suggest that prior to learning the crossmodal objects, the perirhinal cortex had a default bias toward representing the visual shape information and was not representing sound information of the crossmodal objects. After crossmodal learning, however, the visual shape bias in perirhinal cortex was no longer present. That is, with crossmodal learning, the representations within perirhinal cortex started to look less like the visual features that comprised the crossmodal objects, providing evidence that the perirhinal representations were no longer predominantly grounded in the visual modality.” – pg. 13
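      For intuition about what the day × modality interaction asks, the sketch below computes the corresponding difference of differences on cell means: the visual bias (visual match minus sound match) before learning, minus the same bias after learning. This is a simplification and not the authors' linear mixed model; the dictionary layout and the example values are hypothetical, and the mixed model's random effects and F test are not reproduced here.

```python
from statistics import mean


def visual_bias(cells, day):
    """Visual-feature match minus sound-feature match for one learning day."""
    return mean(cells[(day, "visual")]) - mean(cells[(day, "sound")])


def interaction_contrast(cells):
    """Change in visual bias from before to after crossmodal learning.

    `cells` maps (day, modality) -> per-participant pattern similarity
    values (e.g., Fisher z), with day in {"before", "after"} and modality
    in {"visual", "sound"}. Positive values mean the visual bias shrank.
    """
    return visual_bias(cells, "before") - visual_bias(cells, "after")


# Hypothetical values for two participants, for illustration only.
cells = {
    ("before", "visual"): [0.10, 0.14], ("before", "sound"): [0.00, 0.02],
    ("after", "visual"): [0.02, 0.04], ("after", "sound"): [0.01, 0.03],
}
print(interaction_contrast(cells))
```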

      We note that not all effects drop in Figure 5b (even in regions with numerically similar pattern similarity to PRC, such as the hippocampus; see also Supplemental Figure S5 for a comparison of patterns from Day 4 only), suggesting that the change in visual bias in PRC is not simply due to noise.

      “Importantly, the change in pattern similarity in the perirhinal cortex across learning days (Figure 5) is unlikely to be driven by noise, poor alignment of patterns across sessions, or generally reduced responses. Other regions with numerically similar pattern similarity to perirhinal cortex did not change across learning days (e.g., visual features x crossmodal objects in A1 in Figure 5; the exploratory ROI hippocampus with numerically similar pattern similarity to perirhinal cortex also did not change in Supplemental Figure S4c-d).” – pg. 14

      (E) Is there a reason the authors did not look at representation and change in the hippocampus? As a rapid-learning, widely-connected feature-binding mechanism, and given the fairly minimal amount of learning experience, it seems like the hippocampus would be a key area of potential import for the cross-modal association. It also looks as though the hippocampus is implicated in the localizer scan (Figure 3c).

      We thank the reviewer for this suggestion and now include additional analyses for the hippocampus. We found no evidence of crossmodal integrative coding different from the unimodal features. Rather, the hippocampus seems to represent the convergence of unimodal features. We provide these results in the Supplemental Information and describe them in the main text:

      “Analyses for the hippocampus (HPC) and inferior parietal lobe (IPL). (a) In the visual vs. auditory univariate analysis, there was no visual or sound bias in HPC, but there was a bias towards sounds that increased numerically after crossmodal learning in the IPL. (b) Pattern similarity analyses between unimodal features associated with congruent objects and incongruent objects. Similar to Supplemental Figure S3, there was no main effect of congruency in either region. (c) When we looked at the pattern similarity between Unimodal Feature runs on Day 2 and Crossmodal Object runs on Day 2, we found that there was significant pattern similarity when there was a match between the unimodal feature and the crossmodal object (i.e., pattern similarity > 0). This pattern of results held when (d) correlating the Unimodal Feature runs on Day 2 to Crossmodal Object runs on Day 4, and (e) correlating the Unimodal Feature runs on Day 4 to Crossmodal Object runs on Day 4. Finally, (f) there was no significant pattern similarity between Crossmodal Object runs before learning and Crossmodal Object runs after learning in HPC, but there was significant pattern similarity in IPL (p < 0.001). Taken together, these results suggest that both HPC and IPL are sensitive to visual and sound content, as the (c, d, e) unimodal feature-level representations were correlated to the crossmodal object representations irrespective of learning day. However, there was no difference between congruent and incongruent pairings in any analysis, suggesting that HPC and IPL did not represent crossmodal objects differently from the component unimodal features. For these reasons, HPC and IPL may represent the convergence of unimodal feature representations (i.e., because HPC and IPL were sensitive to both visual and sound features), but our results do not seem to support these regions in forming crossmodal integrative coding distinct from the unimodal features (i.e., because representations in HPC and IPL did not differentiate the congruent and incongruent conditions and did not change with experience). * p < 0.05, ** p < 0.01, *** p < 0.001. Asterisks above or below bars indicate a significant difference from zero. Horizontal lines within brain regions in (a) reflect an interaction between modality and learning day, whereas horizontal lines within brain regions in (b-f) reflect main effects of (b) learning day, (c-e) modality, or (f) congruency.” – Supplemental Figure S4.

      “Notably, our perirhinal cortex mask overlaps with a key region of the ventral anterior temporal lobe thought to be the central locus of crossmodal integration in the “hub and spokes” model of semantic representations.9,50 However, additional work has also linked other brain regions to the convergence of unimodal representations, such as the hippocampus51,52,53 and inferior parietal lobes.54,55 This past work on the hippocampus and inferior parietal lobe does not necessarily address the crossmodal binding problem that was the main focus of our present study, as previous findings often do not differentiate between crossmodal integrative coding and the convergence of unimodal feature representations per se. Furthermore, previous studies in the literature typically do not control for stimulus-based factors such as experience with unimodal features, subjective similarity, or feature identity that may complicate the interpretation of results when determining regions important for crossmodal integration. Indeed, we found evidence consistent with the convergence of unimodal feature-based representations in both the hippocampus and inferior parietal lobes (Supplemental Figure S4), but no evidence of crossmodal integrative coding different from the unimodal features. The hippocampus and inferior parietal lobes were both sensitive to visual and sound features before and after crossmodal learning (see Supplemental Figure S4c-e). Yet the hippocampus and inferior parietal lobes did not differentiate between the congruent and incongruent conditions or change with experience (see Supplemental Figure S4).” – pg. 20

      (F) The direction of the neural effects was difficult to track and understand. I think the key observation is that TP and PRh both show changes related to cross-modal congruency - but still it would be helpful if the authors could articulate, perhaps via a schematic illustration, how they think representations in each key area are changing with the cross-modal association. Why does the temporal pole come to activate less for congruent than incongruent stimuli (Figure 3)? And why do TP responses grow less similar to one another for congruent relative to incongruent stimuli after learning (Figure 4)? Why are incongruent stimulus similarities anticorrelated in their perirhinal responses following cross-modal learning (Figure 6)?

      We thank the reviewer for identifying this issue, which was also raised by the other reviewers. The reviewer is correct that the key observation is that the TP and PRC both show changes related to crossmodal congruency (given that the unimodal features were equated in the methodological design). However, the structure of the integrative code is less clear, which we now emphasize in the main text. Our findings provide evidence of a crossmodal integrative code that is different from the unimodal features, and future studies are needed to better understand the structure of how such a code might emerge. We now more clearly highlight this distinction throughout the paper:

      “By contrast, perirhinal cortex may be involved in pattern separation following crossmodal experience. In our task, participants had to differentiate congruent and incongruent objects constructed from the same three shape and sound features (Figure 2). An efficient way to solve this task would be to form distinct object-level outputs from the overlapping unimodal feature-level inputs such that congruent objects are made to be orthogonal to the representations before learning (i.e., measured as pattern similarity equal to 0 in the perirhinal cortex; Figure 5b, 6, Supplemental Figure S5), whereas non-learned incongruent objects could be made to be dissimilar from the representations before learning (i.e., anticorrelation, measured as pattern similarity less than 0 in the perirhinal cortex; Figure 6). Because our paradigm could decouple neural responses to the learned object representations (on Day 4) from the original component unimodal features at baseline (on Day 2), these results could be taken as evidence of pattern separation in the human perirhinal cortex.11,12 However, our pattern of results could also be explained by other types of crossmodal integrative coding. For example, incongruent object representations may be less stable than congruent object representations, such that incongruent object representations are warped to a greater extent than congruent objects (Figure 6).” – pg. 18

      “As one solution to the crossmodal binding problem, we suggest that the temporal pole and perirhinal cortex form unique crossmodal object representations that are different from the distributed features in sensory cortex (Figure 4, 5, 6, Supplemental Figure S5). However, the nature by which the integrative code is structured and formed in the temporal pole and perirhinal cortex following crossmodal experience (such as through transformations, warping, or other factors) is an open question and an important area for future investigation. Furthermore, these anterior temporal lobe structures may be involved with integrative coding in different ways. For example, the crossmodal object representations measured after learning were found to be related to the component unimodal feature representations measured before learning in the temporal pole but not the perirhinal cortex (Figure 5, 6, Supplemental Figure S5). Moreover, pattern similarity for congruent shape-sound pairs was lower than that for incongruent shape-sound pairs after crossmodal learning in the temporal pole but not the perirhinal cortex (Figure 4b, Supplemental Figure S3a). As one interpretation of this pattern of results, the temporal pole may represent new crossmodal objects by combining previously learned knowledge.8,9,10,11,13,14,15,33 Specifically, research into conceptual combination has linked the anterior temporal lobes to compound object concepts such as “hummingbird”.34,35,36 For example, participants during our task may have represented the sound-based “humming” concept and visually-based “bird” concept on Day 1, forming the crossmodal “hummingbird” concept on Day 3 (Figures 1, 2), which may recruit less activity in temporal pole than an incongruent pairing such as “barking-frog”. For these reasons, the temporal pole may form a crossmodal object code based on pre-existing knowledge, resulting in reduced neural activity (Figure 3d) and pattern similarity towards features associated with learned objects (Figure 4b).” – pg. 18

      This work represents a key step in our advancing understanding of object representations in the brain. The experimental design provides a useful template for studying neural change related to the cross-modal association that may prove useful to others in the field. Given the broad variety of open questions and potential alternative analyses, an open dataset from this study would also likely be a considerable contribution to the field.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Comment 1.1: “Did the UKB or HCHS datasets have information on accurate markers of insulin resistance, such as HbA1c or HOMA-IR (if fasting glucose was not available)? Looking at that data would allow us to determine the contribution of insulin resistance to the observed cortical phenotype.”

      Reply 1.1: We appreciate the insightful suggestion from the reviewer. In response, we incorporated HbA1c into our analysis, enhancing its sensitivity to potential effects of insulin resistance. We then re-ran our analysis, including HbA1c alongside non-fasting blood glucose in the PLS. This addition did not alter our main results, i.e., those of the PLS, virtual histology, and network contextualization analyses. Notably, as a result of the inclusion of HbA1c, the second latent variable now accounted for a greater shared variance (22.13%), with HbA1c showing the highest loading among MetS component variables. The manuscript has been thoroughly revised to incorporate these results.

      Comments 1.2: “(Results, p.13, 291-292) "A correlation matrix relating all considered MetS component measures is displayed in supplementary figure S12. Please clarify in this figure labels whether this was non-fasting glucose. If this is non-fasting glucose, it is not a MetS-related risk factor. The reader might be misled into thinking that fasting-glucose has a weak correlation, while its contribution (and the effect of insulin resistance) was not studied here.”

      “Table S8 and Table S9: Is the glucose metric here measured following fasting? If not, this should not be listed as a metabolic syndrome criterion. Or it should be specified that it isn't fasted glucose, otherwise, it sounds misleading.”

      Reply 1.2: We thank the reviewer for bringing this ambiguity to our attention. The initial analysis included only non-fasting plasma glucose in the PLS, as fasting plasma glucose data was unavailable for UKB and HCHS participants. Following your suggestion in reply 1.1, we have now incorporated HbA1c, a more indicative marker of insulin resistance. We retained non-fasting blood glucose in our analysis, recognizing its relevance as a diagnostic variable for type 2 diabetes mellitus, although it is less informative than fasting plasma glucose, HbA1c, or HOMA-IR. This decision is substantiated by the significant correlation found between non-fasting plasma glucose and HbA1c in our sample (r=.49).

      To enhance clarity, we have revised the methods section to explicitly mention that the study investigates non-fasting blood glucose. The revised sentence reads: “Here, we related regional cortical thickness and subcortical volumes to clinical measurements of MetS components, i.e., obesity (waist circumference, hip circumference, waist-hip ratio, body mass index), arterial hypertension (systolic blood pressure, diastolic blood pressure), dyslipidemia (high density lipoprotein, low density lipoprotein, total cholesterol, triglycerides) and insulin resistance (HbA1c, non-fasting blood glucose).”

      Additionally, we have updated the caption of supplementary figure S13 (formerly supplementary figure S12) to clearly indicate the investigation of non-fasting plasma glucose. The table detailing diagnostic MetS criteria (supplementary table S2) has also been amended to clarify the absence of fasting plasma glucose data in our study and to indicate that only data on antidiabetic therapy and diagnosis of type 2 diabetes mellitus were used as criteria for insulin resistance in the case-control analysis.

      Comment 1.3: “I do not understand how the authors can claim there is a deterministic relationship there if all the results are only correlational or comparative. Can the differences in functional connectivity and white matter fiber tracts observed not be caused by the changes in cortices they relate to? How can the authors be sure the network organisation is shaping the cortical effects and not the opposite (the cortical changes influence the network organisation)? This should be further discussed or explained.”

      Reply 1.3: We agree with the reviewer's comment on the non-causative nature of our data and have accordingly revised the discussion section to reflect a more cautious interpretation of our findings. We have carefully reframed our language to avoid any implications of causality, ensuring the narrative aligns with the correlational nature of our data. Nevertheless, we believe that exploring causal interpretations can offer valuable clinical insights. Therefore, while moderating our language, we have maintained certain speculative discussions regarding potential causative pathomechanistic pathways.

      Comment 1.4: “The hippocampus is also an area where changes have consistently been observed. Why did the authors limit their analysis to the cortex.”

      Reply 1.4: We appreciate this reviewer comment. In response, we have added volumes of Melbourne Subcortical Atlas parcels (including the hippocampus) to the analysis. Corresponding results are now shown in figure 2. The subcortical bootstrap ratios indicated that higher MetS severity was related to lower volumes across all investigated subcortical structures.

      Comment 1.5: “Which field ID of the UK biobank are the measures referring to? If possible, please specify the Field ID for each of the UKB metrics used in the study.”

      Reply 1.5: We thank the reviewer for the recommendation. The Field IDs used in our study are now listed in supplementary figure S1.

      Comment 1.6: “Several Figures were wrongly annotated, making it hard to follow the text.”

      Reply 1.6: Thank you for bringing the annotation issues to our attention. We have thoroughly edited all annotations, which should now correctly reference the figure content.

      Reviewer 2

      Comment 2.1: “Do the authors have the chance to see how the pattern relates to changes in cognitive function in the UKBB and possibly HCHS? This could help to provide some evidence about the directionality of the effect.”

      Reply 2.1: Thank you for your suggestion. We acknowledge the potential value of investigating gray matter morphometric data alongside longitudinal information on cognitive function. Although we concur with the significance of this approach, we are constrained by the ongoing processing of the UKB's imaging follow-up data and the pending release of the HCHS follow-up data. Consequently, our current analysis cannot incorporate this aspect for now. We plan to explore the relationship between MetS, cognition and brain morphology using longitudinal data as soon as it becomes available.

      Comment 2.2: “Also, you could project new data onto the component and establish a link with cognition in a third sample which would be even more convincing. I can offer LIFE-Adult study for this aim.”

      Reply 2.2: We are grateful for your recommendation to enhance our study's robustness by including a third sample to establish a cognitive link. While we recognize the merit of such a sensitivity analysis, we believe that our current dataset, derived from two large, independent cohorts, is sufficiently comprehensive for the scope of our current analysis. However, we are open to considering this approach in future studies and appreciate your offer of the LIFE-Adult study. We would welcome further conversation with you regarding future joint projects.

      Comment 2.3: “The sentences (p.17, ll.435 ff) seem to repeat: "Interestingly, we also observed a positive relationship between cortical thickness and MetS in the superior frontal, parietal and occipital lobe. Interpretation of this result is, however, less intuitive. We also noted a positive MetS-cortical thickness association in superior frontal, parietal and occipital lobes, a less intuitive finding that has been previously reported [60,61].”

      Reply 2.3: Thank you for making us aware of this duplication. We have deleted the first part of the section. It now reads “We also noted a positive MetS-cortical thickness association in superior frontal, parietal and occipital lobes, a less intuitive finding that has been previously reported.”

      Comment 2.4: “I would highly appreciate empirical evidence for the claim in ll. 442 "In support of this hypothesis, the determined cortical thickness abnormality pattern is consistent with the atrophy pattern found in vascular mild cognitive impairment and vascular dementia" Considering the previous reports about the co-localization of obesity-associated atrophy and AD neurodegeneration (Morys et al. 2023, DOI: 10.3233/JAD-220535), that most dementias are mixed and that MetS probably increases dementia risk through both AD and vascular mechanisms, I feel such "binary" claims on VaD/AD-related atrophy patterns should be backed up empirically.”

      Reply 2.4: Thank you for highlighting the need for clarity in differentiating between vascular and Alzheimer's dementia. We recognize the intricate overlap in dementia pathologies. Acknowledging the prevalence of mixed dementia and the influence of MetS on both AD and vascular mechanisms, we realize our original statement might have implied a specificity to vascular dementia, which was not intended.

      To address your concern, we have revised our statement to avoid an exclusive focus on vascular pathology, ensuring a more balanced representation of dementia types. Additionally, we have included Morys et al. 2023 as a reference. The section now reads: “In support of this hypothesis, the determined brain morphological abnormality pattern is consistent with the atrophy pattern found in vascular mild cognitive impairment, vascular dementia and Alzheimer’s dementia.”

      Comment 2.5: “I wonder how specific the cell-type results are to this covariance pattern. Maybe patterns of CT (independent of MetS) show similar associations with one or more of the reported celltypes? Would it be possible to additionally show the association of the first three components of general cortical thickness variation with the cell type densities?”

      Reply 2.5: Thank you for your query regarding the specificity of the cell-type results to the observed covariance pattern. To address this, we have conducted a virtual histology analysis of the first three latent variables of the main analysis PLS. The findings of this extended analysis are detailed in supplementary Figure S21. The imaging covariance profile of latent variable 2 was significantly associated with the density of excitatory neurons of subtype 3. The imaging covariance profile linked to latent variable 3 showed no significant association with cell type densities. Latent variable 3 may represent only a noise component, as it explained only 2.12% of shared variance. We hope this addition provides a clearer understanding of the specificity of our main results.

      Comment 2.6: “I agree that this multivariate approach can contribute to a more holistic understanding, yet I would like to see the discussion expanded on how to move on from here. Should we target the MetS more comprehensively or would it be best to focus on obesity (being the strongest contributor and risk factor for other "downstream" conditions such as T2DM)? A holistic approach is somewhat at odds with the in-depth investigation of specific mechanisms.”

      Reply 2.6: We value your suggestion to elaborate on the implications of our findings. Our study indicates that obesity may have the most pronounced impact on brain morphology among MetS components, suggesting it as a key contributor to the clinical-anatomical covariance pattern observed in our analysis. This highlights obesity as a primary target for future research and preventive strategies. However, we believe that our results warrant further validation, ideally through longitudinal studies, before drawing definitive clinical conclusions.

      Additionally, our study endorses a comprehensive approach to MetS, highlighting the importance of considering the syndrome as a whole to gain broader insights. We want to clarify, however, that such an approach is meant to complement, rather than replace, the study of individual cardiometabolic risk factors. The broad perspective our study adopts is facilitated by its epidemiological nature, which may not be as applicable in experimental settings that are vital for deriving mechanistic disease insights.

      To reflect these points, we have expanded the discussion in our manuscript to include a more detailed consideration of these implications and future research directions.

      Comment 2.7: “Please report the number of missing variables.”

      Reply 2.7: Thank you for your request to report the number of missing values. We would like to direct your attention to table 1, where we have listed the number of available values for each variable in parentheses. The number of missing values for a variable is obtained by subtracting this number from the total sample size.

      Comment 2.8: “Was the pattern similar in pre-clinical (pre-diabetes, pre-hypertension) vs. clinical conditions?“

      Reply 2.8: Thank you for your interest in the applicability of our findings across different MetS severity levels. Our analysis employs a continuous framework to encompass the entire range of vascular and cardiometabolic risks, including those only mildly affected by MetS. The linear relationship we observed between MetS severity and gray matter morphology patterns, as illustrated in Figure 2d, supports the interpretation that our findings apply to the entire spectrum of MetS severities.

      Comment 2.9: “How did you deal with medication (anti-hypertensive, anti-diabetic, statins..)?”

      Reply 2.9: Information on medication was considered in defining MetS for the case-control sensitivity analysis but was not included in the PLS. Detailed information can be found in table 1.

      Comment 2.10: “It would be really interesting to determine the genetic variations associated with the latent component. Have you considered doing a GWAS on this, potentially in the CHARGE consortium or with UKBB as discovery and HCHS as replication sample?”

      Reply 2.10: Thank you for your valuable suggestion regarding the implementation of a GWAS. We agree that incorporating a GWAS would provide significant insights, but we also recognize that it extends beyond the scope of our current analysis. However, we are actively planning a follow-up analysis. This subsequent analysis will encompass a comprehensive examination of both genetic variation and imaging findings in the context of MetS.

      Comment 2.11: “Please provide more information on which data fields from UKBB were used exactly (e.g. in github repository).”

      Reply 2.11: We appreciate your recommendation. The details regarding the Field IDs used in our study have been included as supplementary table S1.

      Reviewer 3

      Comments 3.1: “After a thorough review of the methods and results sections, I found no direct or strong evidence supporting the authors' claim that the identified latent variables were related to more severe MetS to worse cognitive performance. While a sub-group comparison was conducted, it did not adequately account for confounding factors such as educational level.”

      “Page 18-19 lines 431-446: the fifth paragraph in the discussion section. - As previously mentioned in the "Weaknesses" section, this study did not conduct a direct association analysis between MetS and cognitive levels without considering subgroup comparisons. Hence, I recommend the content of this paragraph warrants careful reconsideration.”

      Reply 3.1: We acknowledge the reviewer's constructive feedback regarding our analysis of cognitive data. We have performed a mediation analysis that relates the subject-specific clinical PLS score of latent variable 1, representing MetS severity, to cognitive test performance and tests for mediating effects of the imaging PLS score, which captures the MetS-related brain morphological abnormalities. The imaging score was found to statistically mediate the relationship between the clinical PLS score and performance in executive function and processing speed, memory, and reasoning tests. These findings highlight brain structural differences as a relevant pathomechanistic correlate in the relationship between MetS and cognition. Corresponding information can now be found in figure 3, methods section 2.6.2, results section 3.3 and discussion section 4.2.

      Moreover, we apologize for any confusion caused by the previously unclear presentation. Our study further includes association analyses between MetS, brain structure, and cognition, in which MetS components, regional brain morphological measures, and cognitive performance data were entered into a PLS to investigate whether cognitive measures contribute to the latent variable. These analyses were performed separately on the UK Biobank and HCHS datasets because of their distinct cognitive assessments. In these subgroup analyses, we adjusted for age, sex, and education by removing their effects from the input variables. These relationships are detailed in supplementary figures S16b and S17b, where loadings close to zero for age, sex, and education confirm effective deconfounding.

      In sum, we greatly appreciate the suggestion to conduct a mediation analysis, which has substantially enhanced the strength and relevance of our analysis.
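      For readers less familiar with mediation analysis, the logic described in Reply 3.1 can be sketched as a simple single-mediator model: a clinical score X relates to a cognitive score Y partly through an imaging score M, and the indirect (mediated) effect is the product a·b of the X→M slope and the M→Y slope controlling for X, with a percentile bootstrap confidence interval. This is a minimal illustration on simulated data, assuming no covariates; the variable names, effect sizes and bootstrap settings are hypothetical and do not reproduce the authors' actual model or software.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (path a: X -> M)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def ols_two(x, m, y):
    """Coefficients (c_prime, b) of y ~ x + m, solved from centred normal equations."""
    mx, mm, my = statistics.fmean(x), statistics.fmean(m), statistics.fmean(y)
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(v * v for v in xc)
    smm = sum(v * v for v in mc)
    sxm = sum(u * v for u, v in zip(xc, mc))
    sxy = sum(u * v for u, v in zip(xc, yc))
    smy = sum(u * v for u, v in zip(mc, yc))
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det  # direct effect of X on Y
    b = (sxx * smy - sxm * sxy) / det        # effect of M on Y, controlling for X
    return c_prime, b


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a * b of the indirect (mediated) effect."""
    a = ols_slope(x, m)
    _, b = ols_two(x, m, y)
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated, hypothetical scores: X = clinical score, M = imaging score, Y = cognition
rng = random.Random(42)
clinical = [rng.gauss(0, 1) for _ in range(500)]
imaging = [0.5 * c + rng.gauss(0, 1) for c in clinical]
cognition = [-0.4 * i + rng.gauss(0, 1) for i in imaging]

estimate = indirect_effect(clinical, imaging, cognition)
lower, upper = bootstrap_ci(clinical, imaging, cognition)
print(f"indirect effect = {estimate:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

      A confidence interval for a·b that excludes zero is the usual criterion for a statistically significant indirect effect; in a real analysis, covariates such as age, sex and education would also have to be handled.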

      Comment 3.2: “I would suggest the authors provide a more comprehensive description of the metrics used to assess each MetS component, such as obesity (incorporating parameters like waist circumference, hip circumference, waist-hip ratio, and body mass index) and arterial hypertension (detailing metrics like systolic and diastolic blood pressure), etc.”

      Reply 3.2: Thank you for your suggestion regarding a more detailed description of the metrics for assessing each component of MetS. We would like to point out that the specific metrics used, including those for obesity (such as waist circumference, hip circumference, waist-hip ratio, and body mass index) and arterial hypertension (including systolic and diastolic blood pressure), are comprehensively detailed in table 1 of our manuscript. We hope this table provides the clarity and specificity you are seeking regarding the MetS assessment metrics in our study.

      Comment 3.3: “I recommend the inclusion of an additional, detailed flowchart to further illustrate the procedure of virtual histology analysis. This would enhance the clarity of the methodological approach and assist readers in better comprehending the analysis method.”

      Reply 3.3: Thank you for your suggestion. Recognizing the challenges in visually representing many of our analysis steps, we have instead supplemented our manuscript with additional references. These references provide a clearer understanding of our virtual histology approach, particularly focusing on the processing of regional microarray expression data.

      The corresponding sentence reads: “Further details on the processing steps covered by ABAnnotate can be found elsewhere (https://osf.io/gcxun) [42]”

      Comment 3.4: “Why were both brain hemispheres used instead of solely utilizing the left hemisphere as the atlas, especially considering that the Allen Human Brain Atlas (AHBA) only includes gene data for the right hemisphere for two subjects?”

      Reply 3.4: Thank you for your query regarding our decision to use both brain hemispheres instead of solely the left hemisphere, especially given that the Allen Human Brain Atlas (AHBA) predominantly features gene data from the left hemisphere. Because of the AHBA's limited spatial coverage of expression data in the right hemisphere, we mirrored the existing tissue samples across the left-right hemisphere boundary using the abagen toolbox (1), a practice supported by findings that suggest minimal lateralization of microarray expression (2,3). Further details are provided in previous work employing ABAnnotate (4). These studies are now referenced in our methods section.

      Comment 3.5: “The second latent variable was not further discussed. If this result is deemed significant, it warrants a more detailed discussion. "

      Reply 3.5: Thank you for the suggestion. We have added a paragraph to the discussion that discusses the second latent variable in greater detail. It reads: “The second latent variable accounted for 22.33% of shared variance and linked higher insulin resistance and lower dyslipidemia to lower thickness and volume in lateral frontal, posterior temporal, parietal and occipital regions. The distinct covariance profile of this latent variable, compared to the first, likely indicates a separate pathomechanistic connection between MetS components and brain morphology. Given that HbA1c and blood glucose were the most significant contributors to this variable, insulin resistance might drive the observed clinical-anatomical relationship.”

      Comment 3.6: “I suggest appending positive MetS effects after "..., insular, cingulate and temporal cortices;" for two reasons: a). The "positive MetS effects" might represent crucial findings that should not be omitted. b). Including both negative and positive effects ensures that subsequent references to "this pattern" are more precise.”

      Reply 3.6: We concur that the positive MetS effects should be highlighted as well. We have modified the first discussion paragraph to mention them.

      Comment 3.7: “I would appreciate further clarification on this sentence and the use of the term "uniform" in this context. Does this suggest that despite the heterogeneity in the physiological and pathological characteristics of the various MetS components (e.g., obesity, hypertension), their impacts on cortical thickness manifest similarly? How is it that these diverse components lead to "uniform" effects on cortical thickness? Does this observation align with or deviate from previous findings in the literature?”

      Reply 3.7: Thank you for highlighting the ambiguity in our previous explanation. We agree that the complexity of the relationship between MetS components and brain morphology requires clearer articulation. To address this, we have revised the relevant sentence for better clarity. It now reads: “This finding indicates a relatively uniform connection between MetS and brain morphology, implying that the associative effects of various MetS components on brain structure are comparatively similar, despite the distinct pathomechanisms each component entails.”

      Comment 3.8: “Figure 1 does not have the labels "c)" and "d)". ”

      Reply 3.8: Thank you. We have modified figure 1 and made sure that the caption correctly references its content.

      Comment 3.10: “Incorrect figure/table citation:

      • Page 18 line 418: "(figure 2b and 1c)" → (figure 2b and 2c).

      • Page 18 line 419: "(supplementary figures S8 and S12-13)" → (supplementary figures S11 and S15-16).

      • In the supplementary material, "Text S5 - Case-control analysis" section contains several figure or table citation errors. Please take a moment to review and correct them.”

      Reply 3.10: Thank you for bringing this to our attention. We have corrected the figure and table citation errors.

      Comment 3.11: “Page 8 line 184: The more commonly used term is "insulin resistance" rather than "insuline resistance.”

      Reply 3.11: We now use “insulin resistance” throughout the manuscript.

      Comment 3.12: “Nevertheless, variations in gene sets may introduce a degree of heterogeneity in the results (Seidlitz, et al., 2020; Martins et al., 2021). Consequently, further validation or exploratory analyses utilizing different gene sets can yield more compelling results and conclusions.”

      Reply 3.12: Thank you for your insightful comment regarding the potential heterogeneity introduced by variations in gene sets. We agree that exploring different gene sets could enhance the robustness and generalizability of our findings. However, we believe that a comprehensive methodological comparison of the available cell-type-specific gene sets is a substantial undertaking that warrants its own investigation, both to implement it thoroughly and to assess its implications. We would also like to highlight that our analysis setup follows previous practice (4,5).

      References

      (1) Markello RD, Arnatkeviciute A, Poline JB, Fulcher BD, Fornito A, Misic B. Standardizing workflows in imaging transcriptomics with the abagen toolbox. eLife. 2021;10:e72129. doi:10.7554/eLife.72129

      (2) Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489(7416):391-399. doi:10.1038/nature11405

      (3) Hawrylycz M, Miller JA, Menon V, et al. Canonical genetic signatures of the adult human brain. Nat Neurosci. 2015;18(12):1832-1844. doi:10.1038/nn.4171

      (4) Lotter LD, Saberi A, Hansen JY, et al. Human cortex development is shaped by molecular and cellular brain systems. Published online May 5, 2023:2023.05.05.539537. doi:10.1101/2023.05.05.539537

      (5) Lotter LD, Kohl SH, Gerloff C, et al. Revealing the neurobiology underlying interpersonal neural synchronization with multimodal data fusion. Neuroscience & Biobehavioral Reviews. 2023;146:105042. doi:10.1016/j.neubiorev.2023.105042

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a useful characterization of the biochemical consequences of a disease-associated point mutation in a nonmuscle actin. The study uses solid and well-characterized in vitro assays to explore function. In some cases the statistical analyses are inadequate and several important in vitro assays are not employed.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The authors first perform several important controls to show that the expressed mutant actin is properly folded, and then show that the Arp2/3 complex behaves similarly with WT and mutant actin via a TIRF microscopy assay as well as a bulk pyrene-actin assay. A TIRF assay showed a small but significant reduction in the rate of elongation of the mutant actin suggesting only a mild polymerization defect.

      Based on in silico analysis of the close location of the actin point mutation and bound cofilin, cofilin was chosen for further investigation. Faster de novo nucleation by cofilin was observed with mutant actin. In contrast, the mutant actin was more slowly severed. Both effects favor the retention of filamentous mutant actin. In solution, the effect of cofilin concentration and pH was assessed for both WT and mutant actin filaments, with a more limited repertoire of conditions in a TIRF assay that directly showed slower severing of mutant actin.

      Lastly, the mutated residue in actin is predicted to interact with the cardiomyopathy loop in myosin and thus a standard in vitro motility assay with immobilized motors was used to show that non-muscle myosin 2A moved mutant actin more slowly, explained in part by a reduced affinity for the filament deduced from transient kinetic assays. By the same motility assay, myosin 5A also showed impaired interaction with the mutant filaments.

      The Discussion is interesting and concludes that the mutant actin will co-exist with WT actin in filaments, and will contribute to altered actin dynamics and poor interaction with relevant myosin motors in the cellular context. While not an exhaustive list of possible defects, this is a solid start to understanding how this mutation might trigger a disease phenotype.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      • Potential assembly defects of the mutant actin could be more thoroughly investigated if the same experiment shown in Fig. 2 was repeated as a function of actin concentration, which would allow the rate of disassembly and the critical concentration to also be determined.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin based experiments. Therefore, we decided not to perform additional pyrene-actin based experiments in which we titrate the actin concentration, as we expect only very small changes in the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The more direct TIRF assay for cofilin severing was only performed at high cofilin concentration (100 nM). Lower concentrations of cofilin would also be informative, as well as directly examining by the TIRF assay the effect of cofilin on filaments composed of a 50:50 mixture of WT:mutant actin, the more relevant case for the cell.

      The TIRF assay for cofilin severing was performed initially over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      • The more appropriate assay to determine the effect of the actin point mutation on class 5 myosin would be the inverted assay where myosin walks along single actin filaments adhered to a coverslip. This would allow an evaluation of class 5 myosin processivity on WT versus mutant actin that more closely reflects how Myo5 acts in cells, instead of the ensemble assay used appropriately for myosin 2.

      Our results with Myo5A show a less productive interaction with mutant actin filaments, as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. We expect only a small incremental gain in knowledge about the extent of changes by performing additional experiments with an inverted assay geometry, given that under physiological conditions the motor properties of Myo5A and other cytoskeletal myosins are modulated by other factors such as the presence of tropomyosin isoforms and other actin binding proteins.

      Reviewer #2 (Public Review):

      Greve et al. investigated the effects of a disease-associated gamma-actin mutation (E334Q) on actin filament polymerization, association of selected actin-binding proteins, and myosin activity. Recombinant wildtype and mutant proteins expressed in sf9 cells were found to be folded and stable, and the presence of the mutation altered a number of activities. Given the location of the mutation, it is not surprising that there are changes in polymerization and interactions with actin binding proteins. Nevertheless, it is important to quantify the effects of the mutation to better understand disease etiology.

      We thank the reviewer for the positive evaluation of our work.

      Some weaknesses were identified in the paper as discussed below.

      • Throughout the paper, the authors report average values and the standard-error-of-the-mean (SEM) for groups of three experiments. Reporting the SEM is not appropriate or useful for so few points, as it does not reflect the distribution of the data points. When only three points are available, it would be better to just show the three different points. Otherwise, plot the average and the range of the three points.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figures 1B, 5B, 5C, 5D, 8D, 9B, and 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figure 8E and 8F in the figure legend. The statement that these figures show the mean ± SEM was inaccurate. We corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legend.

      Figures 2C, 2E, 2F, 4B, 5A and 6B-E showed the mean ± SEM. As suggested by the reviewer, we corrected these figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.
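      The reviewer's point about n = 3 can be made concrete with a small sketch. The three replicate values below are hypothetical illustrative numbers, not data from the manuscript; the function names are likewise illustrative. For n = 3 the SEM equals SD/√3, roughly 58% of the SD, so SEM error bars look tighter without describing the spread of the replicates, whereas the SD and the range do.

```python
import math
import statistics


def summarize(values):
    """Return mean, sample SD, SEM and range for a list of replicate measurements."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(n)        # standard error of the mean
    return {"n": n, "mean": mean, "sd": sd, "sem": sem,
            "range": (min(values), max(values))}


# Three hypothetical replicate measurements (illustrative only)
replicates = [1.8, 2.1, 2.7]
s = summarize(replicates)
print(f"mean = {s['mean']:.2f}, SD = {s['sd']:.2f}, SEM = {s['sem']:.2f}, range = {s['range']}")
```

      With several thousand filaments, as in Figure 8B, the SEM instead describes the precision of the estimated mean, which is why it remains appropriate there.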

      • The description and characterization of the recombinant actin is incomplete. Please show gels of purified proteins. This is especially important with this preparation since the chymotrypsin step could result in internally cleaved proteins and altered properties, as shown by Ceron et al (2022). The authors should also comment on N-terminal acetylation of actin.

      We added an additional figure showing the purification strategy for the recombinant cytoskeletal γ-actin WT and p.E334Q protein with exemplary SDS-gels from different stages of purification (Figure 1 – figure supplement 1).

      In a previous paper, we reported the mass spectrometric analysis of the post-translational modifications of recombinant human β- and γ-cytoskeletal actin produced in Sf-9 cells (Müller et al., 2013, PLoS One). Recombinant actin showing complete N-terminal processing, resulting in cleavage of the initial methionine and acetylation of the following aspartate (β-actin) or glutamate (γ-actin), is the predominant species in the analyzed preparations (> 95 %). While the recombinant actin in the 2013 study was produced tag-free and purified by affinity chromatography using the column-immobilized actin-binding domain of gelsolin (G4-G6), we have no reason to assume that the purification strategy using the actin-thymosin-β4 fusion construct changes the efficiency of the N-terminal processing in Sf-9 cells. This is supported by our, as yet unpublished, mass-spectrometric studies on recombinant human α-cardiac actin purified using the actin-thymosin-β4 fusion construct, which revealed actin species with an acetylated aspartate-3. This N-terminal modification of α-cardiac actin is catalyzed by the same actin-specific acetyltransferase (NAA80) as the acetylation of aspartate-2 or glutamate-2 in cytoskeletal actin isoforms (Varland et al., 2019, Trends in Biochemical Sciences). Furthermore, additional studies that used the actin-thymosin-β4 fusion construct for the production of recombinant human cytoskeletal actin isoforms in Pichia pastoris reported robust N-terminal acetylation when the actin was co-produced with NAA80 (in contrast to Sf-9 cells, NAA80 is not endogenously expressed in Pichia pastoris) (Hatano et al., 2020, Journal of Cell Science).

      We therefore, added the following statement to the manuscript:

      “Purification of the fusion protein by immobilized metal affinity chromatography, followed by chymotrypsin-mediated cleavage of C-terminal linker and tag sequences, results in homogeneous protein without non-native residues and native N-terminal processing, which includes cleavage of the initial methionine and acetylation of the following glutamate.”

      • The authors do not use the best technique to assess actin polymerization parameters. Although the TIRF assay is excellent for some measurements, it is not as good as the standard pyrene-actin assays that provide critical concentration, nucleation, and polymerization parameters. The authors use pyrene-actin in other parts of the paper, so it is not clear why they don't do the assays that are the standard in the actin field.

      The polymerization rate of individual filaments observed in TIRFM experiments showed only minor changes, as did the bulk-polymerization rate of 2 µM actin in pyrene-actin based experiments. Therefore, we decided not to perform additional pyrene-actin based experiments in which we titrate the actin concentration, as we expect only very small changes in the critical concentration. Instead, we focused on the disturbed interaction with ABPs, as we assume these defects to be more relevant in an in vivo context. Using pyrene-based bulk experiments, we did determine the rate of dilution-induced depolymerization of mutant filaments and compared it with the values determined for WT (Figure 5A, Table 1).

      • The authors' data suggest that, while the binding of cofilin-1 to both the WT and mutant actins remains similar, the major defect of the E334Q actin is that it is not as readily severed/disassembled by cofilin. What is missing is a direct measurement of the severing rate (number of breaks per second) as measured in TIRF.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Figure 4 shows that the E334Q mutation increases rather than decreases the number of filaments that spontaneously assemble in the TIRF assay, but it is unclear how reduced severing would lead to increased filament numbers, rather, the opposite would be expected. A more straightforward approach would be to perform experiments where severing leads to more nuclei and therefore enhances the net bulk assembly rate.

      Figure 4 shows polymerization experiments that were started from ATP-G-actin in the presence of cofilin-1. These experiments clearly show that, especially at the higher cofilin-1 concentration (100 nM), the filament number is strongly increased in experiments performed with mutant actin. Inspection of the corresponding videos of these TIRFM experiments suggests that the increased number of filaments results from an increased number of de novo nucleation events and not primarily from a mutation-induced change in severing susceptibility. The observation of a cofilin-stimulated increase in the de novo nucleation efficiency of actin was initially described by Andrianantoandro & Pollard (2006, Molecular Cell) using TIRFM-based experiments and is thought to arise from the stabilization of thermodynamically unfavorable actin dimers and trimers by cofilin. While the exact role of this cofilin-mediated effect in vivo is not completely clear, it is thought to contribute to cofilin-mediated actin dynamics synergistically with cofilin-mediated severing. It is therefore necessary to clearly distinguish between the two effects of cofilin in vitro: stimulation of de novo nucleation and stimulation of filament disassembly. Our data indicate that the E334Q mutation affects these two effects differentially, as we state in the abstract and in the discussion.

      Abstract: “E334Q differentially affects cofilin-mediated actin dynamics by increasing the rate of cofilin-mediated de novo nucleation of actin filaments and decreasing the efficiency of cofilin-mediated filament severing.”

      Discussion: “Cofilin-mediated severing and nucleation were previously proposed to synergistically contribute to global actin turnover in cells (Andrianantoandro & Pollard, 2006; Du & Frieden, 1998). Our results show that the mutation affects these different cofilin functions in actin dynamics in opposite ways. Cofilin-mediated filament nucleation is more efficient for p.E334Q monomers, while cofilin-mediated severing of filaments containing p.E334Q is significantly reduced. The interaction of both actin monomers and actin filaments with ADF/cofilin proteins involves several distinct overlapping reactions. In the case of actin filaments, cofilin binding is followed by structural modification of the filament, severing and depolymerizing the filament (De La Cruz & Sept, 2010). Cofilin binding to monomeric actin is followed by the closure of the nucleotide cleft and the formation of stabilized “long-pitch” actin dimers, which stimulate nucleation (Andrianantoandro & Pollard, 2006)”.

      We interpret the reviewer's suggestion to mean that additional pyrene-actin-based bulk polymerization experiments should be performed to investigate the bulk-polymerization rate of ATP-G-actin in the presence of cofilin-1. In our understanding, these experiments would not provide additional value, as 1) an observed increase in the bulk-polymerization rate cannot be directly attributed to a change in the efficiency of either de novo nucleation or severing, and 2) the effect of the mutation on cofilin-mediated filament disassembly was extensively analyzed in other experiments starting from preformed actin filaments. Moreover, our results are consistent with in silico modelling and normal mode analysis of the WT and mutant actin-cofilin complex.

      • Figure 5 A: in the pyrene disassembly assay, where actin is diluted below its critical concentration, cofilin enhances the rate of depolymerization by generating more free ends. The E334Q mutation leads to decreased cofilin-induced severing and therefore lower depolymerization. While these data seem convincing, it would be better to present them as an XY plot and fit the data to lines for comparison of the slopes.

      We now present the data as suggested by the reviewer. Furthermore, we determined the apparent second-order rate constant for cofilin-induced F-actin depolymerization (kc) to quantify the observed differences between WT, mutant and heterofilaments, as suggested by the reviewer.

      The paragraph describing these results was changed accordingly:

      “The observed rate constant values are linearly dependent on the concentration of cofilin-1 in the range 0–40 nM, with the slope corresponding to the apparent second-order rate constant (kC) for the cofilin-1-induced depolymerization of F-actin. In experiments performed with p.E334Q filaments, the value obtained for kC was 4.2-fold lower (0.81 × 10⁻⁴ ± 0.08 × 10⁻⁴ nM⁻¹ s⁻¹) compared to experiments with WT filaments (3.42 × 10⁻⁴ ± 0.22 × 10⁻⁴ nM⁻¹ s⁻¹). When heterofilaments were used, the effect of the mutation was reduced to a 2.2-fold difference compared to WT filaments (1.54 × 10⁻⁴ ± 0.11 × 10⁻⁴ nM⁻¹ s⁻¹).”
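      The slope analysis in the quoted paragraph can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the cofilin-1 concentrations and the basal depolymerization rate `k_basal` are hypothetical, and the noise-free k_obs values are generated from the WT kC given in the text, so the least-squares line simply recovers it. The last lines recompute the fold differences from the reported kC values.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical cofilin-1 concentrations (nM) spanning the 0-40 nM range in the text.
cofilin_nM = [0.0, 10.0, 20.0, 30.0, 40.0]

# Illustrative, noise-free k_obs values (s^-1): an assumed basal depolymerization
# rate plus the WT k_C reported in the text (3.42e-4 nM^-1 s^-1) times [cofilin].
k_basal = 2.0e-3  # hypothetical
k_obs_wt = [k_basal + 3.42e-4 * c for c in cofilin_nM]

# The slope of k_obs versus [cofilin-1] is the apparent second-order rate constant k_C.
k_C_wt, intercept = linear_fit(cofilin_nM, k_obs_wt)
print(f"k_C (WT) = {k_C_wt:.3g} nM^-1 s^-1, intercept = {intercept:.3g} s^-1")

# Fold differences computed from the k_C values reported in the text.
k_C = {"WT": 3.42e-4, "p.E334Q": 0.81e-4, "heterofilament": 1.54e-4}
for name in ("p.E334Q", "heterofilament"):
    print(f"WT / {name}: {k_C['WT'] / k_C[name]:.1f}-fold")
```

With real data the intercept would carry the cofilin-independent depolymerization rate, and the slope uncertainty would come from the scatter of the measured k_obs values.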

      • Figure 5 B and C: the cosedimentation data do not seem to help elucidate the underlying mechanism. While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance. Importantly, example gels from these experiments should be shown, if not the complete set included in the supplement. In B, the higher cofilin concentrations would be expected to stabilize the filaments and thus the curve should be U-shaped.

      We do not completely agree with the reviewer on this point. We think the co-sedimentation experiments are useful, as they show that cofilin-1 efficiently binds to mutant filaments but is less efficient in stimulating disassembly in these endpoint experiments. This information is not provided by the analysis of the effect of cofilin-1 on the bulk-depolymerization rate and adds to our understanding of the defect in the actin-cofilin interaction for the mutant.

      While we agree with the reviewer that co-sedimentation experiments must be repeated several times to produce reliable data, we cannot fully follow the reasoning behind the statement “While the authors report statistical significance, differences are small, especially for gel densitometry measurements where the error is high, which suggests that there may be little biological significance.” We interpret this statement as advice to be cautious when extrapolating the observed perturbations of cofilin-mediated actin dynamics in vitro to the in vivo context. We believe we have been cautious about this throughout the manuscript.

      The reviewer expects a U-shaped curve, as high cofilin concentrations are reported to stabilize actin filaments by completely decorating the filament before severing-prone boundaries between cofilin-decorated and undecorated regions are generated. We have also performed these experiments with cytoskeletal β-actin and human cofilin-1 and never observed this U shape. This indicates that significant filament disassembly also happens at high cofilin concentrations, most likely directly after mixing F-actin and cofilin. We cannot rule out that the incubation time plays an important role and that the U shape only appears after longer incubation times. We also want to direct the reviewer to the publication “A Mechanism for Actin Filament Severing by Malaria Parasite Actin Depolymerizing Factor 1 via a Low Affinity Binding Interface” (Wong et al. 2013, JBC), in which comparable co-sedimentation experiments were performed (Figure 5E-G) with rabbit skeletal α-actin and human cofilin-1, and no U-shaped curves were observed, even at a higher molar excess of cofilin-1 than in our experiments and with longer incubation times (1 hour vs. 10 minutes).

      We have now included in the manuscript a representative gel showing co-sedimentation experiments performed with WT and mutant actin and different concentrations of cofilin at pH 7.8 (Figure 5 – figure supplement 2).

      • Figure 5 D: these data show that the binding of cofilin to WT and E334Q actin is approximately the same, with the mutant binding slightly more weakly. It would be clearer if the two plots were normalized to their respective plateaus since the difference in arbitrary units distracts from the conclusion of the figure. If the difference in the plateaus is meaningful, please explain.

      As suggested by the reviewer, we normalized the data for a better understanding of the message conveyed.

      • Figure 6: It is assumed that the authors are trying to show in this figure that cofilin binds both actins approximately the same but does not sever as readily for E334Q actin. The numerous parameters measured do not directly address what the authors are actually trying to show, which presumably is that the rate of severing is lower for E334Q than WT. It is therefore puzzling why no measurement of severing events per second per micron of actin in TIRF is made, which would give a more precise account of the underlying mechanism.

      The severing rate as measured in TIRF is dependent on a number of parameters in a nonlinear manner. Therefore, we opted to show the combination of images directly showing the progress of the reaction and graphs summarizing the concomitant changes in cofilin clusters, actin filaments, actin-related fluorescence intensity and cofilin-related fluorescence intensity.

      • Actin-activated steady-state ATPase data of the NM2A with mutant and WT actin would have been extremely useful and informative. The authors show the ability to make these types of measurements in the paper (NADH assay), and it is surprising that they are not included for assessing the myosin activity. It may be because of limited actin quantities. If this is the case, it should be indicated.

      Indeed, the measurement of the steady-state actin-activated ATPase with recombinant cytoskeletal actin is very material-intensive and therefore costly, as a complete actin titration is required to generate meaningful data. Since the vast majority of our assays involving a myosin family member were performed with NM2A-HMM, we decided to perform a full actin titration of the steady-state actin-activated ATPase of NM2A-HMM with WT and mutant filaments. The results of these experiments are now shown in Figure 8C. The panel showing the results used for determining the dissociation rate constants (k-A) for the interaction of NM2C-2R with p.E334Q or WT γ-actin in the absence of nucleotide was moved to the supplement (Figure 8 – figure supplement 2).

      We added the following paragraph to the Material and Methods section concerning the Steady-State ATPase assay:

      “For measurements of the basal and actin-activated NM2A-HMM ATPase, 0.5 µM MLCK-treated HMM was used. Phalloidin-stabilized WT or mutant F-actin was added over the range of 0–25 µM. The change in absorbance at 340 nm due to oxidation of NADH was recorded in a Multiskan FC Microplate Photometer (Thermo Fisher Scientific, Waltham, MA, USA). The data were fitted to the Michaelis-Menten equation to obtain values for the actin concentration at half-maximal activation of ATP turnover (Kapp) and for the maximum ATP turnover at saturating actin concentration (kcat).”
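      The Michaelis-Menten fit described in the Methods paragraph can be sketched in plain Python. This sketch assumes the basal (actin-free) ATPase has already been subtracted and uses noise-free, hypothetical rates generated from the WT values quoted in this response (kcat = 0.097 s⁻¹, Kapp = 3.20 µM). The fitting software the authors used is not stated; here a variable-projection approach is swapped in for simplicity: because the model is linear in kcat for a fixed Kapp, kcat is solved in closed form and Kapp is located by golden-section search.

```python
def mm_rate(actin_uM, k_cat, k_app):
    """Michaelis-Menten rate (s^-1) at a given F-actin concentration (uM)."""
    return k_cat * actin_uM / (k_app + actin_uM)


def best_k_cat(actin, rates, k_app):
    """Closed-form least-squares k_cat for a fixed K_app (the model is linear in k_cat)."""
    f = [a / (k_app + a) for a in actin]
    return sum(r * fi for r, fi in zip(rates, f)) / sum(fi * fi for fi in f)


def sse(actin, rates, k_app):
    """Sum of squared residuals after optimizing k_cat for this K_app."""
    k_cat = best_k_cat(actin, rates, k_app)
    return sum((r - mm_rate(a, k_cat, k_app)) ** 2 for a, r in zip(actin, rates))


def fit_michaelis_menten(actin, rates, lo=1e-3, hi=100.0, iters=200):
    """Golden-section search over K_app in [lo, hi]; assumes a single minimum there."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(actin, rates, c) < sse(actin, rates, d):
            b, d = d, c          # minimum lies in [a, old d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # minimum lies in [old c, b]
            d = a + g * (b - a)
    k_app = (a + b) / 2
    return best_k_cat(actin, rates, k_app), k_app


# Hypothetical titration points over the 0-25 uM F-actin range used in the assay.
actin_uM = [0.0, 1.0, 2.5, 5.0, 10.0, 15.0, 20.0, 25.0]
# Noise-free rates generated from the WT values quoted in the text.
rates = [mm_rate(a, 0.097, 3.20) for a in actin_uM]

k_cat, k_app = fit_michaelis_menten(actin_uM, rates)
print(f"kcat = {k_cat:.3f} s^-1, Kapp = {k_app:.2f} uM")
```

A general nonlinear least-squares routine would give the same estimates on real data and, unlike this sketch, would also report parameter uncertainties.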

      Furthermore, we added a description of the results of the experiments to the Results section of the manuscript:

      “Using an NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0-25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, indicated by the actin concentration at half-maximal activation, in experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 28% slower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹).”

      • (line 310) The authors state that they "noticed increased rapid dissociation and association events for E334Q filaments" in the motility assay. This observation motivates the authors to assess actin affinities of NM2A-HMM. Although differences in rigor and AM.ADP affinities are found between mutant and WT actins, the actin attachment lifetimes (many minutes) are unlikely to be related to the rapid association and dissociation event seen in the motility assay. Rather, this jiggling is more likely to be related to a lower duty ratio of the myosins, which appears to be the conclusion reached for the myosin-V data. These points should be clarified in the text.

      We changed the text in accordance with the reviewer's suggestion. It now reads: Cytoskeletal γ-actin filaments move with an average sliding velocity of 195.3 ± 5.0 nm s⁻¹ on lawns of surface-immobilized NM2A-HMM molecules (Figure 8A, B). For NM2A-HMM densities below about 10,000 molecules per μm², the average sliding speed of cytoskeletal actin filaments drops steeply (Hundt et al, 2016). Filaments formed by p.E334Q actin move 5-fold slower, resulting in an observed average sliding velocity of 39.1 ± 3.2 nm s⁻¹. Filaments copolymerized from a 1:1 mixture of WT and p.E334Q actin move with an average sliding velocity of 131.2 ± 10 nm s⁻¹ (Figure 8A, B). When equal densities of surface-attached WT and mutant filaments were used, we observed that the number of rapid dissociation and association events increased markedly for p.E334Q filaments (Figure 8 – video supplement 7–9).

      Using an NADH-coupled enzymatic assay, we determined the ability of p.E334Q and WT filaments to activate the ATPase of NM2A-HMM over the range of 0-25 µM F-actin (Figure 8C). While we observed no significant difference in Kapp, indicated by the actin concentration at half-maximal activation, in experiments with p.E334Q filaments (2.89 ± 0.49 µM) and WT filaments (3.20 ± 0.74 µM), we observed a 28% slower maximal ATP turnover at saturating actin concentration (kcat) with p.E334Q filaments (0.076 ± 0.005 s⁻¹ vs. 0.097 ± 0.002 s⁻¹). To investigate the impact of the mutation on actomyosin affinity using transient-kinetic approaches, we determined the dissociation rate constants using a single-headed NM2A-2R construct (Figure 8D). …

      • (line 327) The authors report that the 1/K1 value is unchanged. There are no descriptions of this experiment in the paper. I am assuming the authors measured the ATP-induced dissociation of actomyosin and determined ATP affinity (K1) from this experiment. If this is the case, they should describe the experiment and show the data, provide a second-order rate constant for ATP binding, and report the max rate of dissociation (k2). This is a kinetic experiment done frequently by this group, so the absence of these details is surprising.

      In the previous version of the manuscript, the method used to determine 1/K1 (ATP-induced dissociation of the actomyosin complex) was described in the Material and Methods paragraph “Transient kinetic analysis of the actomyosin complex” and the values obtained for 1/K1 were given in Table 1. We now included the experimental data as an additional figure in the manuscript (Figure 8 – figure supplement 3). Furthermore, we also give the maximal dissociation rate k+2 and the apparent second-order rate constant for ATP-binding (K1k+2) for the WT and mutant actomyosin complex in Table 1. Therefore, we changed the paragraph in the Results section concerning this experiment to:

      “The apparent ATP–affinity (1/K1), the maximal dissociation rate of NM2A from F-actin in the presence of ATP (k+2), and the apparent second-order rate constant of ATP binding (K1k+2) showed no significant differences for complexes formed between NM2A and WT or p.E334Q filaments (Table 1, Figure 8 – figure supplement 3).”

      and the section in the Material and Methods to:

      “The apparent ATP affinity of the actomyosin complex was determined by mixing the apyrase-treated, pyrene-labeled, phalloidin-stabilized actomyosin complex with increasing concentrations of ATP in the stopped-flow system. Fitting an exponential function to the individual transients yields the ATP-dependent dissociation rate of NM2A-2R from F-actin (kobs). The kobs values were plotted against the corresponding ATP concentrations and a hyperbola was fitted to the data. The fit yields the apparent ATP affinity (1/K1) of the actomyosin complex and the maximal dissociation rate k+2.

      The apparent second-order rate constant for ATP binding (K1k+2) was determined by applying a linear fit to the data obtained at low ATP concentrations (0–25 µM).”
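      The relation between the hyperbolic fit and the low-ATP linear fit can be illustrated with a short sketch. It assumes the hyperbola has the form k_obs = k+2[ATP]/(1/K1 + [ATP]), whose initial slope is K1k+2, and uses hypothetical parameter values (1/K1 = 400 µM, k+2 = 300 s⁻¹) rather than the values in Table 1. An ordinary least-squares line with an intercept (an assumption; the exact linear model used is not stated) through noise-free points at 0–25 µM ATP recovers K1k+2 only approximately, and the estimate is biased low when 1/K1 is not much larger than the fitted ATP range.

```python
def k_obs(atp_uM, inv_K1_uM, k_plus2):
    """Hyperbola k_obs = k+2 * [ATP] / (1/K1 + [ATP]); its initial slope is K1 * k+2."""
    return k_plus2 * atp_uM / (inv_K1_uM + atp_uM)


def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


inv_K1 = 400.0   # hypothetical apparent ATP affinity 1/K1 (uM)
k_plus2 = 300.0  # hypothetical maximal dissociation rate k+2 (s^-1)
initial_slope = k_plus2 / inv_K1  # K1*k+2 (uM^-1 s^-1) implied by the hyperbola

atp_low = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]  # low-ATP points, 0-25 uM
kobs_low = [k_obs(c, inv_K1, k_plus2) for c in atp_low]
slope, intercept = linear_fit(atp_low, kobs_low)

print(f"K1k+2 implied by hyperbola: {initial_slope:.3f} uM^-1 s^-1")
print(f"linear fit over 0-25 uM ATP: {slope:.3f} uM^-1 s^-1 "
      f"({100 * (1 - slope / initial_slope):.1f}% low)")
```

Comparing the linear-fit slope with k+2/(1/K1) from the hyperbola is a simple internal consistency check on the two reported constants.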

      For a better understanding of the numerous rate and equilibrium constants, we have now included a figure showing the kinetic reaction scheme of the myosin ATPase cycle (Figure 8 – figure supplement 1).

      Recommendations for the authors:

      Reviewer #1:

      • The subdomains of actin are mislabeled in Fig. 1A.

      The labeling of the subdomains has been corrected.

      • Additional experimental data addressing the 3 weaknesses noted in the public review would be informative but are not essential in my opinion. Examining the effect of cofilin on severing by the TIRF assay in more detail and using a processivity assay for myosin V (immobilized actin) would be the two aspects I would most value.

      The TIRF assay for cofilin severing was initially performed over the cofilin concentration range from 20 to 250 nM. The results obtained in the presence of 100 nM cofilin allow a particularly informative depiction of the differences observed with mutant and WT actin. This applies to the image series showing the changes in filament length, cofilin clusters, and filament number as well as to the graphs showing time-dependent changes in the number of filaments and total actin fluorescence. We have not included the results for a 50:50 mixture of WT:mutant actin because its attenuating effect is documented in several other experiments in the manuscript.

      Our results with Myo5A show a less productive interaction with mutant actin filaments, as indicated by a 1.7-fold reduction in the average sliding velocity and an increase in the optimal Myo5A-HMM surface density from 770 to 3100 molecules per µm². These results indicate a reduction in binding affinity and coupling efficiency, with a likely impact on processivity. Given that Myo5A is only one of many cytoskeletal myosin motors and that the motor properties of all myosins are modulated by the presence of tropomyosin isoforms and other actin binding proteins, we expect only a small incremental gain in knowledge by performing additional experiments with an inverted assay geometry.

      Reviewer #2:

      • The authors should address the concerns regarding the statistical methodologies.

      We have gone through the manuscript carefully to correct any errors in the statistics, as explained below.

      Figures 1B, 5B, 5C, 5D, 8D, 9B and Figure 8 – figure supplement 2 all show the mean ± SD, as also correctly reported for Figures 8E and 8F in the figure legends. The statement that these figures show the mean ± SEM was wrong, and we have corrected this mistake for all the listed figures. Furthermore, we now give the exact N for every experiment in the figure legends.

      Figures 2C, 2E, 2F, 4B, 5A and 6B-E indeed showed the mean ± SEM. As the reviewer rightly points out, this is not appropriate for such sample sizes. We have therefore corrected the figures to show the mean ± SD.

      We still refer to the mean ± SEM in Figure 2B, where elongation rates for more than 100 filaments were recorded, and in Figure 8B, where sliding velocities for several thousand actin filaments were measured.
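      The distinction drawn here can be made concrete with a small sketch using Python's standard `statistics` module. The measurement values are hypothetical. The SD describes the spread of individual measurements and does not shrink as n grows, whereas the SEM (SD/√n) describes the precision of the mean and falls with n. This is one reason the SD is generally preferred for small replicate sets, while the SEM is sometimes reported for large samples such as the >100 filaments in Figure 2B.

```python
import math
import statistics


def summarize(values):
    """Return (mean, SD, SEM); the SD uses the n - 1 (sample) denominator."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n)
    return mean, sd, sem


# Hypothetical measurements: a small replicate set (n = 3) and a large set (n = 120)
# built from the same values, so the spread is similar but n differs.
small = [1.0, 2.0, 3.0]
large = small * 40

for label, data in (("n = 3", small), ("n = 120", large)):
    mean, sd, sem = summarize(data)
    print(f"{label}: mean = {mean:.2f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```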

      • The authors should present the actin titration of the steady state ATPase activity for at least one of the myosins, or preferably all of them.

      An actin titration of the steady state ATPase activity of NM-2A has been included in the revised version of the manuscript (Fig 8C).

      • The authors should consider the use of pyrene-actin in measuring the assembly/disassembly of actin.

      Values for the rate of actin assembly/disassembly measured with pyrene-actin are given in Table 1. Based on the small changes observed, we did not determine the critical actin concentration for the mutant construct.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We thank reviewer #1 for identifying the major caveats of the paper, and have split them out into separate comments below to address them.

      Comment 1) The caveats are that ecosystem processes beyond water availability are not investigated although they are brought into play in the title and in the paper

      Author response: We disagree that water availability is the only ecosystem process investigated in this study, as herbivory, plant mortality, and the maintenance of diversity in higher trophic levels are important processes within ecosystems. We have added text to the abstract and introduction clarifying that we consider these response measures to be ecosystem processes. Further language to this effect already exists in the abstract, methods, and discussion.

      Comment 2) That herbivory beyond leaf damage was not reported (there might be none, the reader needs to be shown the evidence for this)

      Author response: This is typically how herbivory is assessed in ecological studies, and our focus is on folivores. There may be additional herbivory in the form of fluid-sucking insects, shoot/root herbivory, etc., but these were not assessed. It would be interesting to test in additional studies whether these other forms of herbivory respond similarly.

      Comment 3) That herbivore diversity is defined by leaf damage (authors need to give evidence that this is a valid inference)

      Author response: We thank reviewer #1 for pointing out the lack of written support for this claim. We have modified the methods (lines 138-139; 214-217) to clarify that this is a useful proxy for insect richness in the Piper system, and have added citations demonstrating it has been found to correlate well with insect richness in tropical forests.

      Comment 4) That the plots were isolated from herbivores beyond their borders

      Author response: This was not an assumption of the study. We have modified the methods (line 200) to make this clearer to the reader.

      Comment 5) That the effects of extreme climate events were isolated to Peru

      Author response: This was not an assumption of the study, rather it is an observation. While we consider it important to include observed climate differences between sites in the interpretation of our results, it was not necessary for there to be extreme climate events at other sites as we consider manipulated water availability to represent changes in precipitation that are expected to occur at these sites with climate change.

      Comment 6) That intraspecific variation in the host plants needs to be explained and interpreted in more detail

      Author response: We thank reviewer #1 for identifying that our current explanations needed development. We have modified the introduction to explore potential mechanisms relating intraspecific diversity to ecosystem function based on recent studies, and have modified the discussion to focus on why the effects of intraspecific diversity differ from those of interspecific diversity.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1) Pare this material down to simpler results. The most significant to me is the intraspecific variation in damage. Were this broken out and reported in some detail it could be quite interesting. I find the results to be a confusing blizzard of multiple factors that differ among sites; after reading the paper twice I could not recall the takeaway lesson beyond that drought wrecks the diversity of herbivores and sometimes even kills the host plant.

      Author response: We agree that the results are complicated given the variation in effects among sites, but this variation and complexity is important, and is in itself one of the takeaway points. Unfortunately, nature is not simple. We have made several large edits to the results section, including the removal of methodological and otherwise redundant information, to hopefully bring the major takeaways into focus.

      Reviewer #2 (Public Review):

      Comment 1) This is an important and large experimental study examining the effects of plant species richness, plant genotypic richness, and soil water availability on herbivory patterns on Piper species in tropical forests.

      A major strength is the size of the study and the fact that it tackled so many potentially important factors simultaneously. The authors examined both interspecific plant diversity and intraspecific plant diversity. They crossed that with a water availability treatment. And they repeated the experiment across five geographically separated sites.

      The authors find that both water availability and plant diversity, intraspecific and interspecific, influence herbivore diversity and herbivory, but that the effects differ in important ways across sites. I found the study to be solid and the results to be very convincing. The results will help the field grapple with the importance of environmental change and biodiversity loss and how they structure communities and alter species interactions.

      Author response: We thank reviewer #2 for their kind words.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1) I was confused about why the authors measured species diversity/richness as a proportion of the species pool. This means that the metric of richness decreases if species are added to the species pool but not the plot/experiment. I think I understand it, but I suggest the authors explain this choice.

      Author response: We thank reviewer #2 for pointing out that this was confusing. We have clarified the methods (lines 228-232) to explain that this choice was made to allow easier comparison between intra- and interspecific richness.
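      One reading of the proportional metric can be sketched under the assumption that it is simply the number of species (or genotypes) planted in a plot divided by the size of the corresponding pool. The pool sizes below are hypothetical and not taken from the study. The sketch shows both the comparability the authors cite and the reviewer's point that enlarging the pool lowers the metric for an unchanged plot.

```python
def proportional_richness(n_in_plot, pool_size):
    """Richness expressed as a fraction of the available pool (assumed definition)."""
    if pool_size <= 0 or not 0 <= n_in_plot <= pool_size:
        raise ValueError("need pool_size > 0 and 0 <= n_in_plot <= pool_size")
    return n_in_plot / pool_size


# Hypothetical pools: 4 Piper species (interspecific) and 8 genotypes (intraspecific).
inter = proportional_richness(2, 4)  # 2 of 4 species
intra = proportional_richness(4, 8)  # 4 of 8 genotypes
print(inter, intra)  # both 0.5, so the two richness axes share a 0-1 scale

# Reviewer's point: adding a species to the pool but not to the plot lowers the metric.
print(proportional_richness(2, 5))
```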

      Comment 2) One of the stronger estimated relationships was a positive effect of plant species richness on insect richness. I found it a little hard to interpret this relationship. Is this just because there are host species specialists? So, with more host species there are more herbivore species? Or does insect richness increase multiplicatively with increasing plant species richness? One way to look for this would be for the authors to examine the relationship between plant species richness and the average number of herbivore damage types per plant species.

      Author response: We agree that this is important for the reader to understand and have added text to the introduction and discussion sections explaining that this is the expectation based on theory and other empirical studies. We have additionally added text to the discussion (lines 386-388) pointing out that this pattern was not observed at all sites. While we agree that it would be interesting to explore if this effect was additive or multiplicative, we do not believe this is in the scope of the paper due to the methods used to measure insect richness.

      Comment 3) Unless I missed it, some important information about the models was missing. E.g., what distributions were assumed for each of the variables? Any transformations?

      Author response: We thank reviewer #2 for pointing this out; this information has been added to the methods (lines 272-274).

      Comment 4) Why is there no model with water addition affecting insect richness directly but not percent herbivory directly?

      Author response: While we originally decided to not include this model due to lack of theoretical support and low statistical performance, we have added references to this model (now model II) in the methods and results for consistency and to make model performance clearer to the reader. We have additionally moved supplemental table S1 to the main text to make the models and hypotheses tested by each model more accessible.

      Comment 5) Fig. 2. What are the percentages above the figures? Maybe PD values?

      Author response: These values are now clarified in the figure caption.

      Comment 6) L364 "can differ dramatically" This is vague and confusing. Differ in what way? From each other? Did the authors really expect plant richness to have the same effect on herbivory and plant survival? What would it mean anyway for plant richness to have the same effect on herbivory and plant survival?

      Author response: We agree that the language here is confusing and thank reviewer #2 for drawing our attention to it. We have modified the discussion (lines 363-365) to clarify that the direction of the effect of intraspecific richness can differ from the direction of the effect of interspecific richness, rather than the effects on different response variables differing from each other.

      Comment 7) L 375 "only meaningful differences" This statement feels a little overly strong. It seems like there is a good argument for this, but there could be other things going on.

      Author response: We agree that the language here was unnecessarily strong, and have modified the discussion (lines 398-403) to focus on the lack of difference between methodologies at these two sites, and the observed differences in climate and community structure at each site.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Kohler et al analyze the impact of miR200c on cell motility in vitro and breast cancer metastasis in mouse models. They show that miR200c represses metastasis to several different organs and propose that reduced motility is a significant cause of this. The experiments are generally sound and well performed. However, the insight gained with the study does not go much beyond what is already known about miR200c function in breast cancer. The experimental tools used in the study could provide the opportunity to reveal novel insights into the role of miR200c in metastasis. However, the investigators did not take full advantage of this and thus we are left with findings that are rather predictable based on the current literature. Details below.

      Major points:

      1. The primary weakness of this study is limited novelty. miR200c has been shown to regulate migration and invasion of breast cancer cells in several previous studies, and this includes analysis using the same breast cancer cell lines that Kohler et al use in the current study, MCF7 and MDA-MB-231 (Jurmeister et al Mol Cell Bio 2012; Zhang et al Genet Mol Res 2017) and a study by the same group (Ljepoja et al Plos One 2019). Moreover, previous studies have also shown that miR200c represses metastasis in two different claudin low triple negative breast cancer models, MDA-MB-231 and genetically-engineered p53 null transplantable model (Simpson et al Genes 2022, Knezevic et al Oncogene 2016). Of note, Kohler et al do analyze metastases not only in lungs, but also in liver, brain and spleen and this could be a source of novel insights depending on the scientific questions. Is the miR200c mediated repression of metastasis caused by the same mechanisms in all these organs, or is it context dependent? What about molecular mediators downstream of miR200c?
      2. The authors focus primarily on migration issues as the potential cause of miR200c mediated repression of metastasis. However, there is significant literature on the role of miR200c in cancer progression. miR200c has been associated with multiple cellular functions, including regulation of epithelial mesenchymal transition (EMT) by repressing key EMT transcription factors ZEB1 and ZEB2. EMT regulation of course may suggest an effect on cell motility, but also several other functions, such as stem cell activity, plasticity, survival under stress and many more. Indeed, in a clinical setting some may question the importance of migration, considering that breast cancer cells disseminate from the primary tumor early in the process and upon diagnosis the cells are likely already lodged in secondary organs. Therefore, it is probable that cell functions such as survival under stress, proliferation and plasticity would be of even higher importance compared to cell motility. I would think that miR200c functional studies need to go beyond cell motility to generate additional insights into its role in metastasis and reveal potentially actionable targets.
      3. The investigators use a dox inducible system to express miR200c in MDA-MB-231 mammary tumors in mice. The mice were treated with dox to induce miR200c when the tumors reached 200 mm3 in size. This is a rather early induction of miR200c and may not address the ability of miR200c to repress actively growing metastatic lesions. I think these experiments should also be done by waiting longer before miR200c induction. What happens if the tumors are allowed to grow to 500 mm3 or 750 mm3? This would really test the ability of miR200c to inhibit overt metastasis.

      Minor points:

      1. Although in some figures the plots/graphs show individual data points, this is not always the case. All box plots and bar graphs should show individual data points (biological replicates).
      2. Representative histological examples of the metastases in Figure 1C-1D should be shown.
      3. Presentation of the data in Figure 2C-2F is confusing. Statistics are also missing.

      Significance

      Although the study is technically sound, it suffers from limited novelty. Overall conclusions are predictable from previous studies. Of note, this study does provide somewhat more detailed analysis of migratory regulation by miR200c in cancer cells compared to previous reports. However, the study's advance is still quite modest.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors aimed to investigate how cells respond to dynamic combinations of two stresses compared to dynamic inputs of a single stress. They applied the two stresses - carbon stress and hyperosmotic stress - either in or out of phase, adding and removing glucose and sorbitol.

      Both a strength and a weakness, as well as the main discovery, is that the cells' hyperosmotic response strongly requires glucose. For in-phase stress, cells are exposed to hyperosmotic shock without glucose, limiting their ability to respond with the well-studied HOG pathway; for anti-phase stress, cells do have glucose when hyperosmotically shocked, but experience a hypo-osmotic shock when both glucose and sorbitol are simultaneously removed. Responding with the HOG pathway and so amassing intracellular glycerol amplifies the impact of this hypo-osmotic shock. Counterintuitively then, it is the presence of glucose rather than the stress of its absence that is deleterious for the cells.

      The bulk of the paper supports these conclusions with clean, compelling time-lapse microscopy, including extensive analysis of gene deletions in the HOG network and measurements of both division and death rates. The methodology the authors develop is powerful and widely applicable.

      Some discussion of the value of applying periodic inputs would be helpful. Cells are unlikely to have previously seen such inputs, and periodic stimuli may reveal behaviours that are rarely relevant to selection.

      We thank the referee for his review. To answer the reviewer’s last comment, our main objective was not to study conditions that are ecologically relevant, but rather to perturb the system in an original way to reveal new mechanisms and properties of the system. The main advantage of periodic inputs over more complex or unpredictable types of temporal fluctuations is that they can be defined with few parameters that are easy to interpret and to integrate in biophysical models. For instance, by using periodic inputs we were able to investigate how changing the phasing of two stresses impacted fitness while keeping other parameters constant (the duration of each stress was kept constant). We added two sentences at the beginning of the discussion to highlight the value of using periodic inputs.

      We do not fully agree with the reviewer’s statement that periodic stimuli may reveal behaviours that are rarely relevant to selection. Indeed, many parameters of natural environments are known to vary periodically, such as light, temperature, predation, tides. Even if the periodic stimuli we use are artificial, they can still be a valuable tool to reveal new molecular processes. For instance, null mutants have been invaluable to understand biological systems despite being unlikely to reveal behaviours relevant to selection.

      The authors' findings demonstrate the tight links that can exist between metabolism and the ability to respond to stress. Their study appears to have parted somewhat from their original aim because of the HOG pathway's reliance on glucose. It would be interesting to see if the cells' behaviour is simpler in periodically varying sorbitol and a stress where there is little known connection to the HOG network, such as nitrogen stress.

      The use of periodic nitrogen stress is a very interesting suggestion from both reviewers. However, we think it represents a large amount of work that deserves its own study. In particular, it would require first identifying a relevant period at which nitrogen fluctuations have an impact on division rate similar to what we observed for glucose fluctuations before performing experiments in AS and IPS conditions.

      Nitrogen starvation is known to induce filamentous growth via activation of components of the HOG pathway (Cullen and Sprague, 2012), with potential cross-talk between filamentous growth and hyperosmotic stress response. Therefore, periodic osmotic stress and periodic nitrogen starvation may interact in a complex way.

      Reviewer #2 (Public Review):

      The authors have used microfluidic channels to study the response of budding yeast to variable environments. Namely, they tested the ability of the cells to divide when the medium was repeatedly switched between two different conditions at various frequencies. They first characterized the response to changes in glucose availability or in the presence of hyper-osmotic stress via the addition of sorbitol to the medium. Subsequently, the two stresses were combined by applying them alternately or simultaneously (in-phase). Interestingly, they observed that the in-phase stress pattern allowed more divisions and low levels of cell mortality compared to the alternating stresses, where cells were dividing slowly and many cells died. A number of mutants in the HOG pathway were tested in these conditions to evaluate their responses. Moreover, the activation of the MAPK Hog1 and the transcriptional induction of the hyper-osmotic stress promoter STL1 were quantified by fluorescence microscopy.

      Overall, the manuscript is well structured and data are presented in a clear way. The time-lapse experiments were analyzed with high precision. The experiments confirm the importance of performing dynamic analysis of signal transduction pathways. While the experiments reveal some unexpected behavior, I find that the biological insights gained on this system remain relatively modest.

      In the discussion section, the authors mention two important behaviors that their data unveil: resource allocation (between glycolysis and HOG-driven adaptation) and regulation of the HOG pathway based on the presence of glucose. These behaviors had already been observed in other reports (Sharifian et al. 2015 or Shen et al. 2023, for instance). I find that this manuscript does not provide a lot of additional insights into these processes.

      We thank the referee for his review. We agree with the reviewer that the interaction between glucose availability and osmotic stress response has been investigated in previous studies. However, this interaction was investigated using experimental procedures that differed from our approach in critical ways, and therefore the behaviors observed were not the same. In Sharifian et al. (2015), the authors identified a new negative feedback loop regulating Hog1 basal activity and described underlying molecular mechanisms. This feedback loop is unlikely to explain the differences in cell fitness we observed in IPS and AS conditions, because 1) differences in division rate were still observed in hog1 mutant cells and 2) differences in death rate involve glycerol synthesis, which is independent of the feedback loop described in Sharifian et al. (2015). In Shen et al. (2023), the authors observed a stronger expression of Hog-responsive genes at lower glucose concentrations, which seems to contradict our observation of very low pSTL1-GFP expression in the absence of glucose. However, they did not use fluctuating conditions and they did not report expression of stress-response genes when glucose was totally depleted (the lowest glucose concentration they used was 0.02%) as we did, which may explain the different outcomes. We added three sentences in the discussion to compare our findings to those of Shen et al. (2023).

      One clear piece of evidence that is presented, however, is the link between glycerol accumulation during the sorbitol treatment and the cell death phenotype upon starvation in the alternating stress condition. However, no explanations or hypotheses are formulated to explain the mechanism of resource allocation between glycolysis and HOG response that could explain the poor growth in alternating stresses or the lack of adaptation of Hog1 activity in the absence of glucose.

      In the revised version of the manuscript, we included a new result section and a supplementary figure (Figure 4 – figure supplement 2) where we tested three hypotheses to explain the lower division rate observed in AS condition relative to IPS condition. We found no evidence supporting these hypotheses, and the mechanisms responsible for the reduced growth in AS condition therefore remain elusive.

      Another key question is to what extent the findings presented here can be extended to other types of perturbations. Would the use of alternative C-source or nitrogen starvation change the observed behaviors in dynamic stresses? If other types of stresses are used, can we expect a similar growth pattern between alternating versus in-phase stresses?

      As mentioned above in our response to the other reviewer, these are very interesting questions that we think go beyond the scope of our study due to the amount of work involved.

      Recommendations for the authors:

      Reviewer #1

      My comments are only minor.

      • More paragraphs would improve legibility.

      To improve legibility, we split the longer section of the Results into three paragraphs (page 12, section entitled “Osmoregulation is impaired under in-phase stresses but not under alternating stresses.”). However, we kept it as one section with a single title for global coherency: each section of the results corresponds to one main figure and has one main conclusion.

      • I found AS and IPS confusing because what becomes important is whether sorbitol appears with glucose or not. For me, an acronym that makes that co-occurrence clear would be better or even better still no acronyms at all.

      We tried several alternative names for the two conditions in previous drafts of the manuscript. Based on colleagues’ feedback, the AS and IPS acronyms appeared as a good compromise between concision and clarity. To avoid confusion, the two acronyms are precisely defined when they are first used in the Results section. We think it is more important to emphasize the co-occurrence (or not) of the two stresses, rather than the co-occurrence of glucose and sorbitol. Indeed, standard yeast medium contains glucose but no sorbitol, and therefore we defined the two periodic conditions based on differences from standard medium. Even though we avoided using acronyms as much as possible in the manuscript, the use of these two acronyms to refer to the dual fluctuations of the environment seemed essential for concision. Indeed, the IPS and AS acronyms are used many times in the results (16 occurrences on page 12 alone), figures and figure legends.

      • I would consider moving some of Fig S2 to the main text: it helps clarify where Fig 2 is coming from and is referenced multiple times.

      We fully agree with the reviewer and we moved panels A-D from Figure S2 to the main Figure 2.

      • On page 10, "constantly facing a single stress that changes over time" is confusing. Perhaps "repetitively facing a single stress" instead?

      We agree this sentence could be wrongly interpreted the way it was written. We changed it to: “cells grow more slowly when facing periodic alternation of the two stresses (AS) than when facing periodic co-occurrence of these stresses (IPS)”.

      • Is there any knowledge on how cells resist hyperosmotic stress in the absence of glucose? That would help explain the IPS results.

      Based on comments from both reviewers, we surveyed the literature to flesh out the discussion of hypotheses that would help explain observed differences between AS and IPS conditions. We found few studies that investigated cell responses in the absence of glucose, and because of significant differences in the experimental approaches it remains difficult to explain our results from conclusions of these previous studies. For instance, Shen et al., 2023 described and modeled the hyperosmotic stress response at various glucose concentrations. They found that Hog1p relocation to the nucleus after hyperosmotic shock lasted longer at lower glucose concentration, which is consistent with our finding in absence of glucose. However, they did not include the absence of glucose in their experiments or periodic fluctuations of glucose concentration. In addition, their model ignores the impact of cell signaling processes involved in growth arrest in response to hyperosmotic stress or glucose depletion. It is therefore difficult to relate their conclusions to our results. We have developed the discussion of our study to include these hypotheses and to clarify what is explained or not in our IPS and AS results.

      There is knowledge on activation of the hyperosmotic stress pathway in response to glucose fluctuations, but not about the response to hyperosmotic stress in absence of glucose.

      • On page 11, Figure 5a should be Figure 4a.

      Correct.

      • I would explain the components of the HOG pathway in the caption of Fig 1 or in the text when you cite Fig 1a. They are described later, but an early overview would be useful.

      To give more context, we added the following sentences to the caption of Figure 1: “Yeast cells maintain osmotic equilibrium by regulating the intracellular concentration of glycerol. Glycerol synthesis is regulated by the activity of the HOG MAP kinase cascade that acts both in the cytoplasm (fast response) and on the transcription of target genes in the nucleus (long-term response). For simplicity, we only represented on the figure genes and proteins involved in this study.”

      • On page 16, I wasn't sure what "redirect metabolic fluxes against glycerol synthesis" meant.

      For more clarity, we modified this sentence to: “Since glucose is a metabolic precursor of glycerol, the absence of glucose may prevent glycerol synthesis and thereby fast osmoregulation."

      • For Fig 2, having a dot-dash and dash-dash lines rather than both dash-dash would be better.

      We made the proposed change, assuming the reviewer was referring to the gray dashed lines and not the colored ones.

      • In the caption of Fig 3, 2% glucose is 20 g/L.

      We thank the reviewer for catching this typo.

      • In the Materials and Methods Summary, adding how you estimated death rates would be helpful: they are not often reported.

      The calculation of death rates was explained in the Methods section. For more clarity, we modified the names of the parameters in the equation to make more explicit which ones refer to cell death.

      Reviewer #2 (Recommendations For The Authors):

      In Figure 2, it would be interesting to show individual growth rates of the perturbations at various frequencies as shown in Figures 3 c and d.

      We thank the reviewer for this suggestion. We added a new supplementary figure (Figure 2 – figure supplement 2) showing the temporal dynamics of division rates at three different frequencies of osmostress and glucose depletion. We did not include high frequencies (periods below 48 minutes) because the temporal resolution of image acquisition in our experiments (1 image every 6 minutes) was too low. Very interestingly, this new analysis suggests that the positive relationship between the frequency of glucose depletion and division rate is explained by a delay between glucose removal and growth arrest rather than a delay between glucose addition and growth recovery. We therefore added the following conclusion:

      “Under periodic fluctuations of 2% glucose, the division rate was lower during half-periods without glucose than during half-periods with glucose (Figure 2 – figure supplement 2d-f), as expected. However, this difference depended on the frequency of glucose fluctuations: the average division rate during half-periods without glucose was higher at high frequency (small period) than at low frequency (large period) of fluctuations (Figure 2 – figure supplement 2d-f). Therefore, the effect of the frequency of glucose availability on the division rate in 2% glucose is likely due to a delay between glucose removal and growth arrest: cell proliferation never stops when the frequency of glucose depletion is too fast.”

      According to Sharifian et al. 2015, I would have expected that Hog1 would not relocate to the nucleus in 0% glucose. I wonder if this is due to the use of sorbitol as a stressor or the presence of low levels of glucose in the medium. I would suggest performing some control experiments with NaCl as the hyperosmotic agent and testing the addition of 2-deoxy-glucose to completely block glycolysis.

      After careful reading of Sharifian et al. 2015, we fail to understand why the reviewer thinks Hog1 would be expected not to relocate to the nucleus after hyperosmotic stress in 0% glucose. In this previous study, the authors never combined glucose depletion with a strong hyperosmotic stress as we did in our study. They report the results of independent experiments where cells were exposed either to a single pulse of hyperosmotic stress (0.4 M NaCl) or to transient glucose starvation, but they did not combine these two stimuli. In this context, it is difficult to compare their results with ours. The fact that Sharifian et al. 2015 did not observe Hog1 nuclear relocation in 0% glucose (consistent with our result in Figure 6 – figure supplement 1a, yellow curve) is not inconsistent with our observation of Hog1 nuclear enrichment in 0% glucose + 1M sorbitol. One potential discrepancy between the two studies is the fact that they observed a small transient peak of Hog1 nuclear localization just after glucose is added back to the medium, while we failed to observe this peak in similar conditions (yellow curve in Figure 6 – figure supplement 1a). However, this could be simply explained by the temporal resolution of our experimental system: we image cells once every 6 minutes and the peak lasts less than 2 minutes in Sharifian et al. 2015. We added a sentence to discuss this minor point in the Results: “Although previous studies observed small transient (less than two minutes) peaks of Hog1-GFP nuclear localization after glucose was added back to the medium following glucose depletion (Sharifian et al., 2015, Piao et al., 2013), the temporal resolution in our experiments (one image every 6 minutes) may have been too low to detect these peaks.”
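      The temporal-resolution argument can be made quantitative with a small idealized calculation. The sketch below is not an analysis from the manuscript; it assumes instantaneous image acquisition every 6 minutes, a rectangular nuclear-localization peak lasting exactly 2 minutes, and a peak onset that is uniformly distributed relative to the imaging clock. The function names are hypothetical. Under these assumptions a peak is captured with probability of about peak duration / frame interval, so roughly one peak in three, and fewer still for peaks shorter than 2 minutes.

```python
import math


def peak_is_sampled(onset, peak_duration, frame_interval):
    """True if a frame (taken at 0, frame_interval, 2*frame_interval, ...)
    falls inside the peak window [onset, onset + peak_duration)."""
    first_frame_after_onset = math.ceil(onset / frame_interval) * frame_interval
    return first_frame_after_onset < onset + peak_duration


def fraction_sampled(peak_duration, frame_interval, n_offsets=6000):
    """Fraction of evenly spaced peak onsets (within one frame interval)
    for which at least one frame captures the peak."""
    onsets = [frame_interval * k / n_offsets for k in range(n_offsets)]
    hits = sum(peak_is_sampled(t, peak_duration, frame_interval) for t in onsets)
    return hits / n_offsets


# Idealized 2-minute peak imaged once every 6 minutes.
print(round(fraction_sampled(2.0, 6.0), 3))  # 0.333, close to 2 / 6
```

      The brute-force enumeration over onset offsets is used instead of only quoting the closed form so that the assumption about frame timing is explicit in the code.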

      While we agree many additional experiments would be interesting, such as testing the effects of different stress factors or the non-metabolizable glucose analog 2-deoxy-D-glucose, we think this is beyond the scope of this study because such experiments are likely to open broad perspectives and to not be conclusive in a reasonable amount of time.

      When discussing Figure 7, the authors write that the HOG pathway is "overactivated" or "hyperactivated". I would refrain from using these terms because, as seen in Figure 6, the Hog1 activity pattern, if anything, decreases as the number of alternating pulses increases. The high level of pSTL1-mCitrine measured is mostly due to the long half-life of the fluorescent protein.

      We used the formulation “hyper-activation” of the HOG pathway because Mitchell et al. 2015 used it to refer to the same phenomenon in their seminal study. This "hyper-activation" refers to the fact that both the integral activation of Hog1p (sum of areas under Hog1 nuclear peaks) and the global activation of transcriptional targets is much higher during fast periodic hyperosmotic stress than during constant hyperosmotic stress. That being said, we understand the point made by the reviewer about the decreasing size of Hog1 peaks over time during repeated pulses of osmotic stress. Therefore, we slightly modified the text to refer to hyper-activation of pSTL1-mCitrine transcription or expression instead of hyper-activation of the HOG pathway. For coherency, we replaced all instances of “overactivation” by “hyper-activation”.

      Last but not least, the high level of pSTL1-mCitrine is both due to the long half-life of the protein and to the fact that pSTL1 transcription is never turned off due to high Hog1p activity under fast periodic osmostress.

      Minor comments:

      In the main text, I think it might be more intuitive to refer to doubling time in hours instead of division rates in 1/min which are harder to interpret.

      In an early draft of the manuscript, we made figures with either division rates or with doubling times (ln(2)/division rate) and we received mixed opinions from colleagues on what measure was more intuitive to interpret. Both measures are widely used in the literature, and we decided to use division rates in the final version of the figures because it was more directly related to population growth rate and to fitness. For instance, the population growth rate shown in Figure 5 is simply calculated by subtracting the death rate from the division rate. For coherency, we therefore reported division rates instead of doubling times in figures and results. However, to address the reviewer’s comment we included the doubling times (in addition to the division rates) when mentioning the most important results. For instance, page 12: “Strikingly, cells divided about twice as fast under IPS condition (1.67 x 10-3 division/min, corresponding to an average doubling time of 415 minutes) than under AS condition (9.4 x 10-4 division/min, corresponding to an average doubling time of 737 minutes)”.
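      The conversion between the two measures, and the growth-rate definition mentioned above, can be checked with a few lines of arithmetic. This is a minimal sketch using only the formulas stated in the response (doubling time = ln(2)/division rate; population growth rate = division rate minus death rate) and the two division rates quoted from page 12. The function names are hypothetical, and this is not the authors' R code from Supplementary File 1. The hours column addresses the reviewer's suggestion.

```python
import math


def doubling_time_min(division_rate_per_min: float) -> float:
    """Average doubling time in minutes, computed as ln(2) / division rate."""
    return math.log(2) / division_rate_per_min


def net_growth_rate(division_rate: float, death_rate: float) -> float:
    """Population growth rate: division rate minus death rate (same units, e.g. 1/min)."""
    return division_rate - death_rate


# Division rates quoted in the response (division/min).
for label, rate in [("IPS", 1.67e-3), ("AS", 9.4e-4)]:
    t_min = doubling_time_min(rate)
    print(f"{label}: {t_min:.0f} min = {t_min / 60:.1f} h")
# IPS: 415 min = 6.9 h
# AS: 737 min = 12.3 h
```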

      I found various capitalized versions of “HOG/Hog pathway”.

      We corrected this incoherency and used “HOG pathway” everywhere.

      Page 11. Figure 5a should refer to Figure 4a I believe.

      Correct.

      The methods are generally very thorough and precise. The explanation about the calculation of the division rate seems incomplete. For completeness, it would be good to mention the brand and model of valves used. In addition, it would be interesting to have an idea of the number of cells and microcolonies tracked in the various growth experiments.

      We are not sure why the reviewer found the explanation of the calculation of division rate incomplete. For more clarity, we modified the names of parameters in the equations to make them more explicit. We also added a reference to Supplementary File 1 that contains all R scripts used to calculate division rates and death rates. We included the brand and model of valves used, as requested. As for the number of cells tracked in the various experiments, we mentioned in the Methods: “we selected 25 positions (25 fields of view) of the motorized stage (Prior Scientific ProScan III) that captured 10 to 50 cells in each of the 25 growth chambers of the chip and were focused slightly below the median cell plane based on cell wall contrast.” To address the reviewer’s comment, we also included the range of number of tracked cells for each experiment in corresponding figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, we would like to thank you and all the reviewers for acknowledging the meaningful contribution of our manuscript to the field. Your useful comments helped us improve the manuscript's quality. We understood that the key issues of the manuscript were the quantification of inference accuracy and the applicability to methylome data. We therefore present here a revised version of the manuscript addressing all major comments.

      For each demographic inference we have added the root mean square error as requested by the reviewers. These results confirm the previous interpretation of the graphs, especially in recent times. We also added a TMRCA inference analysis, as requested by one reviewer, as a proof of principle that integrating multiple markers can improve ARG inference.

      The discussion was rewritten to further discuss the challenges of application to empirical methylation data. We clarify that in the case where epimutations are well understood and modelled, they can be integrated into an SMC framework to improve the approach's accuracy. When epimutations are not well understood, our approach can help understand the epimutation process through generations at the evolutionary time scale along the genome. Hence, in both cases our approach can be used to unveil marker evolution processes through generations, and/or deepen our understanding of the population's past history. We hope our discussion better underlines how our approach is designed and can be used.

      eLife assessment

      This important study advances existing approaches for demographic inference by incorporating rapidly mutating markers such as switches in methylation state. The authors provide a solid comparison of their approach to existing methods, although the work would benefit from some additional consideration of the challenges in the empirical use of methylation data. The work will be of broad interest to population geneticists, both in terms of the novel approach and the statistical inference proposed.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markovian coalescent model that allows one to simultaneously analyse multiple types of polymorphism data. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Answer: We thank reviewer 1 for his positive comments and for acknowledging the future promise of our method as better and more reliable data become available in different species. We appreciate the reviewer noticing the complete set of work undertaken here to integrate local and regional effects of methylation into a model containing as much knowledge of the epigenetic mutational processes as possible. Note that in Figure 2 of the manuscript we observed a gain of accuracy even when the rates are unknown. Our results thus suggest that an accuracy gain from an additional marker with unknown rates is also possible, although it is most likely scenario- and rate-dependent.

      Finally, as noticed and highlighted by the very recent work of the Johannes lab (Yao et al. Science 2023) using phylogenetic methods, knowing the epimutation rate is essential at short time scales to avoid confounding effects of homoplasy. In our estimation of the coalescent trees, the same applies, though our model considers finite-site markers. We now provide additional evidence for the potential gain of power to infer the TMRCA (Supplementary Table S7) with and without knowledge of the epimutation rates, and revised the discussion to clarify the potential shortcomings and caveats for the analysis of real data.

      Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP-based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why this work is not conceptually sound. Especially in the future, as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

      Answer: We thank reviewer 2 for his positive comments. Indeed, surveys of CG methylation in other plant species show that its distribution is clearly bimodal (i.e. binary). This is not the case for non-CG methylation, such as CHG and CHH (where H = C, T, A). However, these latter methylation contexts are also not heritable across generations and therefore cannot be used as heritable molecular markers.

      Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems sound and reasonable. This is an exciting new avenue for thinking about inference from genomic data. I have a few concerns about the presentation and then also questions about the use of empirical methylation data sets.

      I think a more detailed description of demographic accuracy is warranted. For example, in L245 MSMC2 identifies the bottleneck (albeit smoothed) and only slightly overestimates recent size. In the same analysis the authors' approach with unknown mu infers a nonexistent population increase by an order of magnitude that is not mentioned.

      Answer: We thank reviewer 3 for his positive comments and refer to our answer to reviewer 1 above. We added RMSE (Root Mean Square Error) analyses to quantify the inference accuracy. We apologize for not mentioning this last point. Thank you for pointing this out; we have now fixed it (lines 245-253).

      Similarly, it seems problematic that (L556) the approach requiring estimation of site and region parameters (as would presumably be needed in most empirical systems like endangered nonmodel species mentioned in the introduction) does no better than using only SNPs. Overall, I think a more objective and perhaps quantitative comparison of approaches is warranted.

      Answer : See answer to reviewer 1 above, and more elaborate answers below. We now provide new RMSE analyses to quantify the accuracy of our demographic inference (Supplementary Tables 1, 6, 7, 8, 9, 10). We also discuss the validity and usefulness of our approach when the epimutation rates are unknown. In short, the discussion was rewritten to further address the challenges of applying the method to empirical methylation data. We clarify that when epimutations are well known and modelled (as is largely the case in A. thaliana, for example), they can be integrated into an SMC framework to improve the accuracy of the method. When epimutations are not well understood and their rates are unknown, our approach can help to understand the epimutational process through generations at the evolutionary time scale. Hence, whether markers are understood or not, our approach can be used to study the evolutionary processes of the markers through generations and/or to deepen our understanding of the population's past history. We hope our discussion better underlines how our approach is designed and can be used.

      The authors simulate methylated markers at 2% (and in some places up to 20%). In many plant genomes a large proportion of cytosines are methylated (e.g. 70% in maize: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496265/). I don't know what % of these may be polymorphic, but this leads to an order of magnitude more methylated cytosines than there are SNPs. Couldn't this mean that any appreciable error in estimating methylation threatens to be of a similar order of magnitude to the SNP data? I would welcome the authors' thoughts here.

      Answer : The reviewer is correct, and this is an interesting question. First, studies show that heritable epimutations in plants are restricted to CG dinucleotides located well outside the target regions of de novo methylation pathways. Most of these CGs tend to fall within so-called gene body methylated regions. While it is true that plant species can differ substantially in their proportion of methylation at the genome-wide scale, the number of gene body methylated genes (i.e. genic CG methylation) is relatively similar, and at least well within the same order of magnitude (Takuno et al. Nature Plants 2016, reviewed in Muyle et al. Genome Biol Evol 2022). Moreover, spontaneous CG epimutations in gene body methylated regions have been shown to be neutral (van der Graaf et al. 2015, Vidalis et al. 2016, Yao et al. 2023), which is an ideal property for phylogenetic and demographic inference.

      Second, CG methylation calls are sometimes affected by coverage or uncertainty. Stringent filtering for reliable SMP calls typically reduces the total proportion of CG sites that can be used as input for demographic inference. Here, we only kept CG sites where the methylation information could be fully trusted after SMP calling (i.e. >99.9% posterior certainty). Overall, this explains why the percentage of sites with methylation information is so small, and why we decided to run simulations with 2% of reliable methylated markers.

      Nevertheless, for the sake of generality, it may be that in some species, such as maize, a higher percentage of polymorphic methylated sites can be used, and the number of SMPs could be higher than that of SNPs when the effective population size is very small (due to past demographic history and/or life history traits). In this case, any error in the epimutation rate and any variance due to finite-site model estimation (and homoplasy) cannot be compensated for by the scarce SNPs, and can lead to mis-inference.

      A few points of discussion about the biology of methylation might be worth including. For example, methylation can differ among cell types or cells within a tissue, yet sequencing approaches evaluate a pool of cells. This results in a reasonable fraction of sites having methylation rates not clearly 0 or 1. How does this variation affect the method? Similarly, while the authors cite literature about the stable inheritance of methylation, a sentence or so more about the time scale over which this occurs would be helpful.

      Answer: We thank reviewer 3 for asking these very interesting questions, which we develop further below and now mention in the discussion (lines 716-722).

      For Arabidopsis thaliana:

      Following up on our previous comment above, the majority of the CG sites that serve as input to our approach are located in body methylated genes. Previous work has shown that CG methylation in these regions shows essentially no tissue or cellular heterogeneity (e.g. Horvath et al. 2019). This means that bulk methylation measurements show only limited susceptibility to measurement error. That said, to guard against spurious SMP calls that could arise from residual measurement variation, we applied stringent filtering of CG methylation: we kept only sites where the methylation percentage is close to either 0% or 100% (the rest being removed from the analysis). We have used similar filtering strategies in previous studies of epimutational processes in mutation accumulation lines and long-lived perennials (work of the Johannes lab). In these latter studies, we found the SMP calls to be sufficiently accurate for inference of phylogenetic parameters in experimental settings (Shahryary et al. Genome Biology 2021, Yao et al. Science 2023).

      For other species:

      It is true that, currently, evaluating the methylation state of a site from a pool of cells may be problematic for some species for two main reasons: 1) it adds noise to the signal, so SMP calling could be erroneous, and 2) the methylation state used in the analysis might originate from different tissues at different locations of the genome/methylome. Overall, this can lead to spurious SMPs and render the inference inaccurate (see Sellinger et al. 2021 for the effect of spurious SNPs). Hence, caution is advised when calling SMPs in other species and in different tissues.

      Finally, in some species methylated cytosines have mutation rates an order of magnitude higher than other nucleotides. The authors mention they assume independence, but how would violation of this assumption affect their inference?

      Answer: Indeed, we assume the mutation and epimutation processes to be independent; thus, the probability for a SNP to occur does not depend on the local methylation state. If this assumption were violated, the mutation rate used would be wrong to a degree that depends on the strength of the dependency between the processes. We suggest that ignoring this dependence puts us in the same situation as ignoring variation of the mutation rate along the genome. We have previously documented the effect of ignoring this biological feature of genomes in Strütt et al. 2023 and Sellinger et al. 2021. If too extreme and not accounted for, variation in the mutation rate along the genome can lead to erroneous inference results. However, this problem could easily be addressed (modelled) by adapting the emission matrix. To correctly model this dependency, additional knowledge is needed: either the mutation and epimutation rates must be known to quantify the dependency, or the dependency must be known to quantify the resulting rates. As far as we know, these data are currently not available, but they could perhaps be obtained using the MA lines of A. thaliana (used in Yao et al. 2023).

      Recommendations for the authors:

      All three reviewers liked this approach and found it a valuable contribution. I think it is important to address reviewer 1/3 concerns about quantifying the accuracy of inference (the TMRCA approach from reviewer 1 sounds pretty reasonable), and reviewer 1 also highlights an intriguing point about model accuracy being worse when the mutation rate is known. Additionally, I think some discussion is warranted about challenges dealing with empirical methylation data (points from Rev 2 and 3 as well as Rev 1's question about inferred vs published rates of epigenetic mutation).

      Answer : We have added tables containing the root mean square error (RMSE) of every demographic inference in the manuscript to better quantify accuracy. Below, we explain why accuracy in the presence of site and region epimutations can in some cases decrease when the real rates are known (because the methylation state at the region level must first be inferred). We added evidence that accounting for methylation can improve accuracy when recovering the TMRCA along the genome when the rates are known. We have also enhanced the discussion on the challenges of dealing with epimutation data for inference. As suggested, we hope this study will generate interest in tackling these challenges by applying the method to various methylome datasets from different species.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      • For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      Answer : We understand reviewer #1's concern for a more quantitative comparison of the inference results. We agree that plots are not sufficient to fully grasp a method's performance. To better quantify the performance of each approach, we added Supplementary Tables 1, 6, 8, 9 and 10 containing the RMSE (in log10 for visibility) for all figures. The root mean-squared error is calculated as in Sellinger 2021, and a description of how it is calculated is now given in the Methods section (lines 886-893).
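      To illustrate the kind of metric reported in these tables, the sketch below computes an RMSE between a true and an inferred population-size trajectory. It assumes (our assumption, not necessarily the exact procedure of Sellinger 2021) that both trajectories are evaluated on the same time grid and that errors are measured on log10 population sizes, which keeps sizes spanning several orders of magnitude on a comparable scale. The function name `log10_rmse` is hypothetical.

```python
import math
from typing import Sequence


def log10_rmse(true_ne: Sequence[float], inferred_ne: Sequence[float]) -> float:
    """Root mean square error between log10 population sizes.

    Hypothetical helper: both trajectories are assumed to be sampled at the
    same time points; the exact discretisation used in the manuscript may differ.
    """
    if not true_ne or len(true_ne) != len(inferred_ne):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        (math.log10(est) - math.log10(true)) ** 2
        for true, est in zip(true_ne, inferred_ne)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# A one-order-of-magnitude error at one of three time points:
# sqrt((0 + 1 + 0) / 3), roughly 0.577.
print(log10_rmse([1e4, 1e4, 1e3], [1e4, 1e5, 1e3]))
```

      Working on the log scale means that a tenfold over-estimate and a tenfold under-estimate are penalised equally.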

      • 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour.

      Answer : There are unfortunately no errors in this plot and those results are perfectly normal and coherent, but we understand they can be confusing at first.

      As described in the Methods section and in the appendix, when accounting for region-level epimutations, our algorithm requires the regional methylation status, which first needs to be inferred from the data (real or simulated). Because region and single-site epimutation events occur at similar rates in our simulated scenario, the methylation state of the region is very hard to recover correctly (e.g. there will be unmethylated sites in methylated regions and methylated sites in unmethylated regions). In other words, the accuracy of the HMM procedure estimating region status is decreased by the joint action of the site and region epimutation processes.

      When subsequently applying the HMM for inference, as described in the appendix, the probabilities of two CG sites being in the same or different methylation states depend on the methylation state of the "region". Hence, mislabelling the region methylation state is (to some extent) equivalent to spurious SMPs (or inaccurate SMP calling).

      If the true rates for site and region epimutations are given as input, the model forces the demography (and other inferred parameters) to fit the observed distribution of SMPs (given the input rates), resulting in the poor accuracy observed in the figure (now Supplementary Figure 7).

      Note: The rates estimated from real data in A. thaliana suffer from the same issue, as the region and site epimutation rates are estimated independently and the existence of regions is first quantified using an independent HMM method (Denkena et al. 2022).

      However, when rates are freely inferred, they are inferred in accordance with the estimated region methylation status and the SNPs. Therefore, even if the inferred rates are wrong, they are used by the SMC in a more consistent way.

      Note: When methylation rates violate the infinite-site assumption, as here, we first estimate the tree sequence along the genome using SNPs (i.e. DNA mutations). The algorithm then infers the epimutation rates given the inferred coalescent times and the observed methylation diversity.

      To summarise: when rates are given as input and the model fails to correctly recover the region methylation status, SNPs and SMPs carry conflicting information, leading to a loss of accuracy. However, if the rates are inferred, this is done with the help of SNPs, leading to less conflicting information and potentially a smaller loss of accuracy. We apologize that these explanations were missing from the manuscript and have added them at lines 449-460 and 702-716.

      A further argument is that if region and site epimutation rates differ by at least two orders of magnitude, the inference results are better (and accurate) when the true rates are given. The reason is that one epimutation process overrides the other (see Supplementary Table 2). In that case, one epimutation process is almost negligible and we fall back to the results of Figure 5 or Supplementary Figure 6.

      • As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases.

      Answer : We fully agree with reviewer 1. We have added a comparison of TMRCA inference with and without methylation sites as a proof of principle. The results are given in Supplementary Table 7, and the methodology, inspired by Schiffels 2014, is described at the end of the Methods section (lines 894-907). These results demonstrate the potential gain in accuracy when using methylation polymorphisms. However, TMRCA (or ARG) inference is a vast and complex subject in its own right. We are therefore developing a complete TMRCA/ARG inference investigation and an improved methodology compared with the one presented in this manuscript, and are currently working on a manuscript focusing specifically on this topic. We hence consider further investigation of TMRCA/ARG inference beyond the scope of the current study.

      • A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.

      Answer: We thank the reviewer for this very interesting suggestion! Unfortunately, it is a bit late to re-implement the algorithm and reshape the manuscript according to this suggestion. Speed is not yet an issue but will most likely become one in the future when integrating many different rates or when using a more complex SMC model. Hence, we added reviewer #1's suggestion to the discussion (line 648) and hope to use it in our future projects.
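      The reviewer's suggestion can be sketched as follows for two binary markers (one SNP, one CG methylation site). This is an illustration of the continuous-time Markov chain formulation, not the implementation used in the manuscript: the rates are hypothetical, each marker is modelled as a simple two-state chain, and the ancestral state is assumed to be drawn from the stationary distribution. The generator of the two non-interacting processes is the Kronecker sum Q = Q_snp ⊗ I + I ⊗ Q_smp, whose matrix exponential factorises into the Kronecker product of the individual exponentials.

```python
import numpy as np
from scipy.linalg import expm


def two_state_generator(rate_01: float, rate_10: float) -> np.ndarray:
    """Rate matrix of a binary marker: state 0 -> 1 at rate_01, 1 -> 0 at rate_10."""
    return np.array([[-rate_01, rate_01],
                     [rate_10, -rate_10]])


def kronecker_sum(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A (+) B = A (x) I + I (x) B: generator of two independent chains run jointly."""
    return np.kron(a, np.eye(b.shape[0])) + np.kron(np.eye(a.shape[0]), b)


def stationary_two_state(q: np.ndarray) -> np.ndarray:
    """Stationary distribution of a two-state generator."""
    rate_01, rate_10 = q[0, 1], q[1, 0]
    return np.array([rate_10, rate_01]) / (rate_01 + rate_10)


def prob_identical(q: np.ndarray, pi: np.ndarray, tmrca: float) -> float:
    """P(two lineages coalescing at `tmrca` carry the same joint state).

    Relies on reversibility with a stationary ancestor:
    P(tip1 = i, tip2 = j) = pi_i * P_ij(2 * tmrca).
    """
    p = expm(q * 2.0 * tmrca)
    return float(np.sum(pi * np.diag(p)))


# Hypothetical per-generation rates, for illustration only.
mu = 1e-8                  # symmetric nucleotide mutation rate (SNP marker)
gain, loss = 1e-4, 5e-4    # site-level methylation gain / loss rates (SMP marker)

q_snp = two_state_generator(mu, mu)
q_smp = two_state_generator(gain, loss)
q_joint = kronecker_sum(q_snp, q_smp)   # 4 x 4 generator over (SNP state, SMP state)

tmrca = 10_000.0           # illustrative coalescence time in generations
p_joint = expm(q_joint * tmrca)
# Non-interacting processes: exp(A (+) B) = exp(A) (x) exp(B), so each small
# matrix can be exponentiated separately when speed matters.
p_factorised = np.kron(expm(q_snp * tmrca), expm(q_smp * tmrca))

pi_joint = np.kron(stationary_two_state(q_snp), stationary_two_state(q_smp))
p_no_difference = prob_identical(q_joint, pi_joint, tmrca)
```

      The factorisation is what makes the approach scale: adding a marker type only adds one small matrix to exponentiate, rather than enlarging a single joint exponential.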

      • Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      Answer: This is a very good question. We do the binning exactly as described in Mailund 2013 & Terhorst 2017, and have added this information to the Methods section (lines 801-809). However, as described in Terhorst 2017, one can only bin observations of the same "type" (to compute the Baum-Welch algorithm). Therefore, the computation time gained by binning is reduced when different markers are spread along the genome in high proportion. This is the approach we used throughout the study when facing multiple markers, as it had the best speed performance. For example, when the proportion of sites with methylation information is 1% or less, computation time is only slightly affected (i.e. it remains of the same order of magnitude).

      However, the binning method presented in Mailund 2013 can be extended to observations of different types, but parameters then need to be estimated through a full likelihood approach (as presented in Figure 2). In our study, this approach did not have the best speed performance. However, as our study is the first of its kind, this implementation remains sub-optimal for now. Hence, we did not further investigate the performance of our approach in the presence of many different genomic markers (e.g. 5 different markers, each representing ~20% of the genome). Currently, with SMC approaches, a high proportion of sites carry the information "no SNP", making the Baum-Welch algorithm described in Terhorst 2017 very efficient. But as our theoretical approach is developed further, we expect that most sites in a genome analysis will carry some "information", which could render the full likelihood approach computationally more tractable.
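      Why interspersed marker types reduce the gain from binning can be illustrated with a generic run-length compression of a per-site observation sequence. This is a minimal sketch, not the binning scheme of Mailund 2013 or Terhorst 2017 (real implementations differ in detail): the per-site symbols and the function `run_length_bins` are hypothetical.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def run_length_bins(observations: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse runs of identical consecutive observations into (symbol, length) pairs."""
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(observations)]


# Hypothetical per-site symbols: "." = no polymorphism, "S" = SNP, "M" = SMP.
snp_only = "." * 8 + "S" + "." * 3
with_smp = "." * 3 + "M" + "." * 4 + "S" + "." + "M" + "."

print(run_length_bins(snp_only))        # 3 bins for 12 sites
print(len(run_length_bins(with_smp)))   # 7 bins for the same 12 sites
```

      Each run can be processed in one step only because all of its sites share the same symbol; every additional marker type interrupts the long "no polymorphism" runs and therefore increases the number of bins.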

      • 486: The assumed site and region (de)methylation rates listed here are several OOM different from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533?

      Answer: We thank the reviewer for asking this question. We believe that answering it is indeed the most interesting aspect of our study. Beyond demographic inference, our study has unveiled a discrepancy between the rates estimated in biological experiments and those inferred in our study from SNPs and branch lengths. Several reasons could explain the discrepancy between the two approaches:

      • Firstly, our underlying HMM hypotheses are certainly violated. We ignored population structure, variation in mutation and recombination rates along the genome, as well as the effect of selection. Hence, the branch lengths used for methylation rate estimation are to some extent inaccurate. We note that this is especially likely for the short branches of coalescent trees originating from background selection in coding regions. These short branches are especially visible at methylation sites in body methylated genes, which have a higher mutation rate than SNPs (Yao et al. 2023).

      • Secondly, calling single methylation site polymorphisms is not 100% reliable. If the error rate is 0.1%, then, as the study was conducted over ~10 generations, a minimum epimutation rate of 10⁻⁴ is to be expected. However, because our approach works at the evolutionary time scale, we expect it to suffer less from this bias, as the proportion of diversity originating from actual epimutations, rather than from SMP calling errors, should be greater.

      • Thirdly, as mentioned above, recovering the methylation status of a region is very hard. Hence, erroneous inference of region status could affect our inference accuracy, as shown in Supplementary Figure 4.

      • Lastly and most importantly, the reason behind this discrepancy is the modelling of epimutation and methylation between sites and regions. As we discuss, the current combination of rates and models is still too limited to describe the observed diversity along the genome (as we intend to do with SMC methods). This is in contrast to the recent study by Yao et al., where very few regions of polymorphic SMPs are chosen, which implicitly avoids the influence of the methylation region effect. A study just published by Biffra et al. (Cell reports 2023) also uses a functional model of methylation combining region and site epimutations, albeit not tuned for evolutionary analyses. Thus, in line with functional studies, we suggest that epimutations are not independent of the local methylation context and may tend to stabilize the methylation state of a region. Therefore, the estimated methylation rates show a discrepancy with the previously measured ones. Indeed, a biological experiment would reveal a fast epimutation rate because epimutations can actually be tracked at sites which can mutate, while the region mutation rate is much slower. However, because the methylation state of a region is rather stable through time, it would reduce methylation diversity over long time scales, and these rates would differ between methylated and unmethylated regions (i.e. the methylation rate is higher in methylated regions). Our results are thus in agreement with the observation by Biffra et al. that region-level methylation modelling is needed to explain patterns of methylation across the genome.
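      The lower bound mentioned in the second point can be written out explicitly. This assumes that SMP calling errors with rate ε are indistinguishable from true epimutations and are spread evenly over the g generations of the experiment:

```latex
\mu_{\min} \approx \frac{\varepsilon}{g} = \frac{10^{-3}}{10} = 10^{-4}\ \text{per generation}
```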

      To resolve the discrepancy, one would need to develop a theoretical region + site epimutation model capable of describing the observed diversity at the evolutionary time scale (possibly based on the Biffra et al. model within an underlying population evolution model), and then use this model to reanalyse the sequence data from the biological experiments (i.e. van der Graaf et al. 2015 & Denkena et al. 2022) to re-estimate the methylation region sizes and epimutation rates.

      Minor comments:

      • 189: "SMCtheo" first occurs here, but it's not mentioned until 247 that this is the new method being presented.

      Answer : Fixed

      • 199: Are the estimates in this section from a single diploid sequence? Or is it n=5 (diploid) as mentioned in the earlier section?

      Answer : Yes, those results were obtained with 5 diploid individuals. We added this to the Table 1 legend.

      • 336: I'm confused by the wording: it sounds like the test rejects the null if there is positive correlation in the methylation status across sites. But then, shouldn't 339 read "if the test is significant" (not non-significant)?

      Answer : We apologize for the confusion and have rewritten the sentence (lines 339-348); the choice of words was indeed misleading.

      • Fig. 6: for some reason fewer simulations were run for 10Mb (panels C and D) than for 100Mb (A and B). Since it's very difficult to tell what's happening on average in the 10Mb case, I suggest running the same number of simulations.

      Answer : Yes we understand your concern. Actually, the same number of simulations were run but we plotted only the first 3 runs as it was less visually confusing. We now have added the missing lines to the plot C and D.

      Typos:

      • 104: "or or"

      • 292: build => built

      • 388: fulfil

      • 683: sample => samples

      Answer : Many thanks to reviewer 1 for pointing out the typos. They are all now fixed.

      Reviewer #2 (Recommendations For The Authors):

      The authors may find some valuable information in Pisupati et al (2023) "On the causes of gene-body methylation variation in Arabidopsis thaliana" on interpreting epimutation rates.

      Answer: Many thanks for the recommended manuscript. We have added it to the cited literature, as it strongly supports our use of the heritability of methylation. We also added the recent Biffra et al. paper.

      Reviewer #3 (Recommendations For The Authors):

      There are many places throughout the manuscript with minor grammatical errors. Please review these. A few noted below as I read:

      L104: extra "or"

      L123: built not build

      L 160 "relies" instead of "do rely"

      L161 "events"

      L 336 "from methylation data"

      L 378 "exists"

      L 379 "regions are on average shorter" instead of "there are shorter"

      L 338 "a regional-level"

      L 349 "," instead of "but"

      L 394 DMRs

      Table 1 legend: parentheses not brackets?

      Answer : Many thanks to reviewer #3 for finding those mistakes. They are all now fixed.

      I think a paragraph in the discussion of considerations of when to use this approach might be helpful to readers. Comparison to e.g. increased sample size in MSMC2, while not necessary, might be helpful here. It may often be the case that doubling the number of haplotypes with SNP data is easier and cheaper than estimating methylation accurately.

      Answer : We discuss (lines 691-698) that our approach is always useful by design, but cannot always be used for the same purpose. If the evolutionary properties of the marker used are not understood, we suggest that our approach can be used to investigate the marker's heritability process through generations. This could help to correctly design experiments aiming to study marker heritability through lineages. If the properties of the marker are well understood and modelled, it can be integrated into the SMC framework to improve inference accuracy.

      Other minor notes:

      L 486 "known" is a stretch. empirically estimated seems appropriate.

      Answer : Fixed

      L 573 ARG? You are not estimating the full ARG here.

      Answer : We apologize for the wrong choice of word and have rephrased the sentence.

      Fig. 2 is not super useful and could be supplemental.

      Answer : We moved Figure 2 to the appendix (now Supplementary Figure 1).

    2. Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markovian coalescent model that allows one to analyze multiple types of polymorphism data simultaneously. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes.

      Integrating additional types of heritable markers into SMC is a nice idea which I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. In Fig. 6 it is seen that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. (See also major comment #1 below about the interpretation of these plots.) A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Major comments:

      - For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      - 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour. (Comment addressed in revision. Still, I find the explanation added at 449ff to be somewhat puzzling -- shouldn't the results of the regional HMM scan only improve if the true mutation rate is given?)

      - As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases. (Comment addressed in revision via Supp. Table 7.).

      - A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.

      - Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      - 486: The assumed site and region (de)methylation rates listed here are several OOM different from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533? (Comment addressed in revision.)

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses: The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      We agree with the reviewer's observation about the evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from southern latitudes, and we have included a paragraph on this in the Introduction section. Unfortunately, studies conducted in South America on host use by Culex mosquitoes are very limited, and there are virtually no studies on the seasonal feeding pattern. In Argentina, there is some evidence (Stein et al., 2013; Beranek, 2019) of a seasonal change in host use by Culex species, including Cx. quinquefasciatus, with mammals being included as hosts during the autumn. As part of a comprehensive study characterising bridge vectors for SLE and WN viruses, our research group is currently working on the molecular identification of blood meals from engorged females to gain deeper insights into the seasonal feeding pattern of Culex mosquitoes. While a seasonal change in host use by Culex quinquefasciatus has not been reported in Argentina so far, an increase in reported human cases of SLE virus has been observed between summer and fall (Spinsanti et al., 2008). It is on the basis of this evidence that we hypothesise a seasonal change in host use by Cx. quinquefasciatus, similar to what occurs in the United States. This also takes into account that both countries (Argentina and the United States) have regions with similar climatic conditions (temperate climates with thermal and hydrological seasonality). Since we work on the same species and in a similar temperate climate regime, we assumed that there is a seasonal shift in host use by this mosquito species.

      Reviewer #1 (Recommendations for the authors):

      Abstract

      Line 23: fed on two different hosts.

      Accepted as suggested.

      I think the concluding statement should be rewritten to say that immediate reproductive outcomes do not explain the shift in host use pattern of Cx. quinquefasciatus mosquitoes from birds to mammals towards autumn.

      Accepted as suggested.

      Introduction

      No comments.

      Materials and Methods

      Please mention sample sizes in the text as well (n = ?) for each treatment.

      Accepted as suggested.

      Page 99: ......C. quinquefasciatus, since C. pipiens and its hybrids are present as well in Cordoba.

      Accepted as suggested.

      Results – Line 146: subsequently instead of posteriorly

      Accepted all changes as suggested.

      Line 148: were counted instead of was counted.

      Accepted all changes as suggested.

      Line 160: Subsequently instead of posteriorly

      Accepted all changes as suggested.

      Line 171: on fertility

      Accepted all changes as suggested.

      Line 174: there was an interaction effect on…

      Accepted all changes as suggested.

      Line 175: there were no differences in the number of eggs

      Accepted all changes as suggested.

      Discussion

      I think the first paragraph in the discussion section is redundant and should be deleted.

      The whole discussion was rewritten to be focused on our aims and results.

      Line 282: this sentence needs to be rewritten.

      Accepted as suggested.

      Line 299: at 28°C

      Line 300: at 30°C

      We are sorry, but we are not sure what this comment refers to. We checked, and the temperatures are written as stated: 28°C and 30°C.

      Line 363: I think the authors need to discuss more about the bigger question they were addressing. I think that the discussion section can be strengthened greatly by elaborating on whether there is evidence for a seasonal shift in host use pattern in Cx. quinquefasciatus in the southern latitudes. If yes, what alternate mechanisms they believe could be driving the seasonal change in host use in this species in the southern latitudes now that they show the 'deriving reproductive advantages' hypothesis to be not true for those populations.

      Thanks for this observation. We agree and so the Discussion section was restructured to align it with our results, as suggested.

      Reviewer #2 (Public Review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if the fitness conferred by each blood source changes between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used a generalized linear mixed model to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity and fertility and a null model analysis via data randomization for hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite of that hypothesized. While this finding is interesting and worth reporting, there are significant issues with the experimental design and the conclusions that are drawn from the results, which are described below. These issues should be addressed to make the findings trustworthy.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) There is no replication built into this study. Egg lay is a highly variable trait, even within treatments, so it is important to see replication of the effects of treatment across multiple discrete replicates. It is standard practice to replicate mosquito fitness experiments for this reason. Furthermore, the sample size was particularly small for some groups (e.g. 15 egg rafts for the second gonotrophic cycle of mice in the autumn, which was the only group for which a decrease in fecundity and fertility was detected between 1st and 2nd gonotrophic cycles). Replicates also allow investigators to change around other variables that might impact the results for unknown reasons; for example, the incubators used for fall/summer conditions can be swapped, ensuring that the observed effects are not artefacts of other differences between treatments. While most groups had robust sample sizes, I do not trust the replicability of the results without experimental replication within the study.

      We agree that egg lay is a variable trait, and so we used large numbers of mosquitoes and egg rafts in our experiments compared with other studies on the same topic. Evaluating variables such as fecundity, fertility, or other types of variables (collectively referred to as "life tables") is challenging and depends on several intrinsic and extrinsic factors. For all these reasons, sample sizes in some experiments might not be very large, and lower sample sizes can be found in several articles. For instance, in Richards et al. (2012), for Culex quinquefasciatus, some experiments during the second gonotrophic cycle had 13 or even 6 egg rafts. For species like Aedes aegypti, sample sizes for life table analysis are also usually small; as an example, Muttis et al. (2018) reported between 1 and 4 engorged females (without replicates). In addition, a small sample size would be a problem if we had not detected any effect, which is not the case, since we were interested in finding an effect regardless of its size. Because of this, we consider our sample sizes robust enough for our results.

      We also agree that repeating the experiments would make the study more robust. However, a review of the literature (articles cited in the original manuscript) shows that similar experiments are not frequently repeated as such. Examples are the studies of Richards et al. (2012), Demirci et al. (2014) and Telang & Skinner (2019). Although these studies handle several cages at a time as "replicates", these are not true replicates, because all data are pooled and analysed together and the experiment is not repeated several times. We see these "replicates" as a way of obtaining a larger N.

      As the reviewer noted, replication is a resource- and time-consuming activity, and one that we are not able to undertake. The original experiment took over three months to complete, and a similar timeframe would be necessary for each replicate (six months in total for two additional replicates). Given our existing commitments and obligations, dedicating such an extensive period solely to this would impede progress on other crucial projects and responsibilities.

      Given the limited resources and time, and the infrequent use of experimental replication in this type of study, we performed a simulation-based analysis using a Monte Carlo approach. This approach involved generating synthetic data that mimic the expected characteristics of the original experiment and subjecting them to the same analysis routine. The main goal of this simulation was to evaluate the potential spuriousness and randomness of results that might arise from the experimental conditions, and thereby to evaluate the robustness of, and our confidence in, our results and data.
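
      The general logic of such a simulation-based check can be illustrated with a minimal Python sketch. It is not the authors' analysis, and it makes several stated assumptions. A simple Welch-type z comparison stands in for the GLMM. Egg counts per raft are drawn from a rounded normal approximation. The mean (150 eggs per raft), SD (40) and second group size (40 rafts) are hypothetical; only the group size of 15 rafts comes from the review above. The sketch generates data with no true treatment effect, applies the same test to each synthetic data set, and reports how often a spurious "effect" appears.

```python
import math
import random
import statistics


def simulate_rafts(rng: random.Random, n: int, mean_eggs: float, sd_eggs: float) -> list[int]:
    """Draw n synthetic egg-raft counts (rounded normal approximation, floored at 0)."""
    return [max(0, round(rng.gauss(mean_eggs, sd_eggs))) for _ in range(n)]


def welch_z(a: list[int], b: list[int]) -> float:
    """Welch-type statistic for the difference in mean eggs per raft between two groups."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def false_positive_rate(n_a: int, n_b: int, mean_eggs: float, sd_eggs: float,
                        n_sims: int = 2000, z_crit: float = 1.96, seed: int = 1) -> float:
    """Fraction of null simulations (no true difference) flagged as 'significant'."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    hits = 0
    for _ in range(n_sims):
        a = simulate_rafts(rng, n_a, mean_eggs, sd_eggs)
        b = simulate_rafts(rng, n_b, mean_eggs, sd_eggs)
        if abs(welch_z(a, b)) > z_crit:
            hits += 1
    return hits / n_sims


# Hypothetical settings: 15 vs 40 egg rafts drawn from the same distribution.
rate = false_positive_rate(n_a=15, n_b=40, mean_eggs=150, sd_eggs=40)
print(f"spurious-effect rate under the null: {rate:.3f}")
```

      A rate close to the nominal 5% suggests that, for these group sizes, a detected difference is unlikely to be an artefact of sampling noise alone. Simulating under an assumed true difference instead would estimate power. All helper names and numeric settings here are hypothetical.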

      (2) Considering the hypothesis is driven by the host switching observed in the field, this phenomenon is discussed very little. I do not believe Cx. quinquefasciatus host switching has been observed in Argentina, only in the northern hemisphere, so it is possible that the species could have an entirely different ecology in Argentina. It would have been helpful to conduct a blood meal analysis prior to this experiment to determine whether using an Argentinian population was appropriate to assess this question. If the Argentinian populations don't experience host switching, then an Argentinian colony would not be the appropriate colony to use to assess this question. Given that this experiment has already been conducted with this population, this possibility should at least be acknowledged in the discussion. Or if a study showing host switching in Argentina has been conducted, it would be helpful to highlight this in the introduction and discussion.

      Thanks for this observation. We agree. However, we conducted the experiment in the absence of host use data from Argentina because we used the same mosquito species, and the central region of Argentina (Córdoba) has a temperate climate regime similar to that observed on the east coast of the US.

      We are aware that few studies on host shifting in South America are available; some, such as those conducted by Stein et al. (2013) and Beranek (2019), reported a moderate host switch for Culex quinquefasciatus in Argentina. We have already performed a study on seasonal host feeding patterns for this species. However, because there are few studies on host shifting, our hypothesis is based mainly on the seasonality of human cases of WNV and SLEV, a pattern that has been demonstrated for our region (see, for example, Spinsanti et al., 2008).

      We have included a new paragraph in the Introduction and Discussion sections. Please see our answers to Reviewer #1.

      (3) The impacts of certain experimental design decisions are not acknowledged in the manuscript and warrant discussion. For example, the larvae were reared under the same conditions to ensure adults of similar sizes and development timing, but this also prevents mechanisms of action that could occur as a result of seasonality experienced by mothers, eggs, and larvae.

      We understand the confusion that may have arisen from a lack of detail in the methodology. If we are not mistaken, you are referring to our not having considered carry-over effects of larval rearing that could potentially impact reproductive traits. When investigating the effects of temperature or other environmental factors on reproductive traits, it is possible to acclimate either larvae or adults. This is due to the significant phenotypic plasticity that mosquitoes exhibit throughout their entire ontogenetic cycle. In our study, we followed an approach similar to that of other authors, in which adults are exposed to the experimental conditions (temperature and photoperiod). For similar approaches, see the studies by Ferguson et al. (2018) for Cx. pipiens, Garcia Garcia & Londoño Benavides (2007) for Cx. quinquefasciatus, and Christiansen-Jucht et al. (2014, 2015) for Anopheles gambiae.

      (4) There are aspects of the data analysis that are not fully explained and should be further clarified. For example, there is no explanation of how the levels of categorical variables were compared.

      The methodology and statistical analysis were expanded for a better understanding.

      (5) The results show the opposite trend as was predicted by the authors based on observed feeding switches from birds to mammals in the autumn. However, they only state this once at the end of the discussion and never address why they might have observed the opposite trend as was hypothesized.

      The discussion was restructured to focus on our results and our model.

      (6) Generally speaking, the discussion has information that isn't directly related to the results and/or is too detailed in certain parts. Meanwhile, it doesn't dig into the meaning of the results or the ways in which the experimental design could have influenced results.

      As mentioned above, the discussion was restructured to reflect our findings. We also now discuss how our design might have influenced our results. However, as stated above, we do not fully agree that the design is inadequate for our analysis: we followed standard protocols used by other researchers and studies in this field.

      (7) Beyond the issue of lack of replication limiting trust in the conclusions in general, there is one conclusion reached at the end of the discussion that would not be supported, even if additional replicates are conducted. The results do not show that physiological changes in mosquitoes trigger the selection of new hosts. Host selection is never measured, so this claim cannot be made. The results don't even suggest that fitness might trigger selection because the results show that physiological changes are in the opposite direction as what would be hypothesized to produce observed host switches. Similarly, the last sentence of the abstract is not supported by the results.

      We agree with this observation. However, we did not evaluate the impact of fitness on host selection in this study. Instead, we aimed to investigate the potential influence of seasonality on mosquito fitness as a potential trigger for a shift in host selection. We agree that we incorrectly used the term “host selection” when we should have been discussing “host use change”. Our results indicate a seasonal alteration in mosquito fitness in response to changes in temperature and photoperiod. Building on this observation, we have revised the discussion of our hypothesis and theoretical model to explain this seasonal shift in host use.

      (8) Throughout the manuscript, there are grammatical errors that make it difficult to understand certain sentences, especially for the results.

      The English grammar and writing throughout the manuscript were revised and corrected to make the text easier to understand.

      This study is driven by an interesting question and has the potential to be a valuable contribution to the literature.

      Reviewer #2 (Recommendations for The Authors):

      I hope that the authors will consider the suggested revisions and experimental replication to improve the quality of the study and paper.

      This study tests a very interesting hypothesis. I understand that additional replicates are difficult to conduct, but I do believe that fitness studies absolutely require experimental replicates. Unless you are able to replicate the observed effects, I personally would not trust the results of this study. I hope that you will consider conducting replicates so that this important question can be answered in a more robust manner. Below, I expand upon some additional points in the public review and also provide more specific suggestions. I provided some copy-editing feedback, but was not able to point out all grammatical mistakes. I suggest that you use ChatGPT to help you edit the English. For example, you can feed ChatGPT your MS and ask it to bold the grammatical errors or you can ask it to edit grammatical errors and bold the sections that were edited. I understand that writing in a second language is very difficult (from personal experience!), so I view ChatGPT as a great tool to help even the playing field for publishing. Below are line item suggestions. Apologies that wording is curt, I was trying to be efficient in writing.

      20-21: I suggest that you emphasize that you are investigating the interactive effect.

      Accepted as suggested.

      22: they weren't "reared" (from larvae) in different conditions, they were "maintained" as adults

      Accepted as suggested.

      26-27: increased/decreased is a bit misleading since you did not evaluate these groups sequentially in time. It might be more accurate to describe it as less than/greater than. Also, if you say increased/decreased or less than/greater than, you should always say what you are comparing to. The same applies throughout the MS.

      Accepted as suggested.

      29-30: "finding the" is not correct here; could be "with the lowest..."

      Accepted as suggested.

      34-36: I do not think that your results suggest this, even if you were to replicate the results of this experiment. You haven't shown metabolic changes.

      We understand the point. Accepted as suggested.

      42-44: "one of the main responsible" should be "one of the main species responsible..."

      Accepted as suggested.

      48: I think that "host preference" is better than selection here; -philic denotes preference

      Accepted as suggested.

      50: "Moreover" isn't the correct transition word here

      Accepted as suggested.

      57: "could" isn't correct here; consider saying "... species sometimes feed primarily on mammal hosts, including humans, in certain situations."

      Accepted as suggested.

      58: Different isn't correct word here

      Accepted as suggested.

      60: delete "feeding"

      Accepted as suggested.

      66-68: I am not familiar with any blood meal analysis studies in the southern hemisphere that show host switching for Culex species between summer and autumn. If this hasn't been shown, then this critique of the host migration hypothesis doesn't make sense.

      There are some studies pointing this out (Stein et al., 2013; Beranek, 2019), as well as unpublished data of our own. However, our hypothesis is supported by epidemiological data from the human population that indicate a seasonal activity pattern. This is explained in depth in the Introduction section.

      68: ensures is not the right word; I suggest "suggests"

      Accepted as suggested.

      68-70: this explanation isn't clear to me; please revise

      It will be revised. Accepted as suggested.

      70: change cares to care

      Accepted as suggested.

      76-77: can you explain how they were not supported by the data for the benefit of those who are not familiar with these papers please?

      Accepted as suggested.

      87-89: I suggest the following wording: "In the autumn, we expect a greater number of eggs (fecundity) and larvae (fertility) in mosquitoes after feeding on a mammal host compared to an avian host, and the opposite relationship in the summer."

      Accepted as suggested.

      99: edit for grammar

      Accepted as suggested.

      102: suggest: "...offered a blood meal from a restrained chicken twice a month"

      Accepted as suggested.

      107: powder

      Accepted as suggested.

      108: inbred? Is this the term you meant to use?

      Changed as suggested.

      109: "several" cannot be used to describe 20 generations; suggest using "over twenty generations"; also, it would be good to acknowledge in your discussion that lab adaptation could force evolution, especially since mosquitoes are kept at constant temperatures and fed with certain hosts (with easy access) in the lab. Also, it would be good to know when the experiments were conducted to know the lapse of time between the creation of the colony and the experiments.

      Accepted as suggested.

      110-111: Does humidity vary between summer and fall in Córdoba? If so, I suggest acknowledging in the discussion that if humidity differences are involved in a potential interaction between host species and seasonality, then this would not have been captured by your experimental design.

      Several variables change across seasons. We focused on capturing the effects of temperature and photoperiod, since humidity is a variable that is difficult to control.

      113-116: I suggest combining into one sentence to make more concise.

      Accepted as suggested.

      135: You might be obscuring the true impact of seasonality by rearing the larvae under the same conditions. There may be signals that mothers/eggs/larvae receive that influence their behavior (e.g. I believe this is the case for diapause), so this limitation should also be acknowledged. I understand why you decided to do this to control for development time and size, but it is something that should be considered in the discussion.

      As explained above, Cx. quinquefasciatus does not undergo diapause in our country. Exposing only the adult mosquitoes to the experimental conditions was an approach we selected based on other studies.

      138: edit: "with cotton pads soaked in... on plastic..."; what is plastic glass? Do you mean plastic dishes?

      Accepted as suggested.

      141: here and throughout paragraph, full should be "fully"

      Accepted as suggested.

      144: located should be "placed"

      Accepted as suggested.

      147: suggest editing to "at which point, they were fixed with 1 mL of 96% ethanol and the number of L1 larvae per raft was counted."

      Accepted as suggested.

      154-155: edit for grammar

      Accepted as suggested.

      157: Your GLM explanation doesn't say anything about how you made pairwise comparisons between your levels; did you use emmeans?

      This revised version includes a more detailed methodology and statistical analysis. Accepted as suggested.

      158-160: I don't understand why you took this approach - it seems strange to me to use this analysis, but I am not familiar with it, so it might be that I lack the knowledge to be able to adequately evaluate. Please provide more explanation so that readers can better understand this analysis. A citation for this kind of application of the analysis would be helpful.

      It was changed to be in accordance with the remaining analyses.

      173: replace neither with either

      Accepted as suggested.

      174: this applies throughout; edit to : "An interaction effect was observed..."

      Accepted as suggested.

      175: "it was not found" is grammatically incorrect; instead : "We did not find ..." or "no differences in... were detected", etc

      Accepted as suggested.

      183: "it was detected" is grammatically incorrect

      Accepted as suggested.

      185-186: "being this treatment... in terms of fitness": I do not understand what this means. Please rephrase

      Accepted as suggested.

      170-199: you should provide the effect sizes and p values in text and/or in the figure for the pairwise comparisons

      Accepted as suggested.

      193-196. These two sentences are confusing and I am not sure what you mean, especially in the first sentence.

      It was rewritten. Accepted as suggested.

      Figure 1: This figure is great and easy to read and interpret!

      Thank you for the comment!

      218-219: it is important to state which mosquito species you are referring to here.

      Accepted as suggested.

      226-227: you definitely should acknowledge the small sample size here.

      Considered.

      227: "it was observed" should be "We observed" or "A greater hatching rate.... was observed."

      Accepted as suggested.

      228-229: is the result really comparable even though you took very different approaches to the analysis for these outcomes?

      Changed to be comparable.

      230-278: the discussion of these hypotheses is too long and detailed, especially since the comparison of mouse vs chicken wasn't your main question; you really wanted to understand this in the context of seasonality. I suggest cutting this down a lot and making room to dig into your results more, and also to discuss the potential impacts of your experimental design/limitations on the results.

      Discussion was changed to focus on our results and model. Accepted as suggested.

      281: Hoffman is an old citation; I suggest you cite a modern review.

      Accepted as suggested. We deleted it due to the re-writing of the manuscript.

      282: "It can be recognise".. I am not sure what you are trying to say here

      Accepted as suggested.

      After the first time you write a species name, you can abbreviate the genus in all future mentions unless it is at the beginning of a sentence.

      Accepted as suggested.

      303-305: Revise this sentence. E.g "Fewer studies are available regarding photoperiod and show mixed results; Mogi (1992) found that mid and long day lengths induced greater fecundity while Costanzo et al. (2015) did not find differences in fecundity by day length."

      Accepted as suggested.

      315-316: typically, unpublished data shouldn't be referenced; I'm not sure if eLife has a policy on this.

      We will check this against eLife's guidelines. However, given the lack of evidence on this pattern, we consider it important to include these unpublished data.

      316: Aegypti should be lowercase

      Accepted as suggested.

      328-330: This sentence is redundant with the first sentence of the paragraph

      Accepted as suggested.

      321-336: You never reintroduced your hypothesis in your discussion. I suggest that you center your whole discussion more directly around the hypothesis that motivated the study. If you decide not to restructure your discussion, you should at least reintroduce your hypothesis here and discuss how your results do not support the hypothesis.

      Accepted as suggested.

      337-348: This paragraph is a bit confusing as you jump between fertility and hatchability

      Accepted as suggested.

      353: is viral transmission the right word to use here? I think you might mean bridge vector transmission to humans specifically?

      Accepted as suggested.

      357: you say "neither" but never define which traits you are referring to

      Accepted as suggested.

      361: I suggest "two variables previously analyzed separately..."

      Accepted as suggested.

      General: There is no statement about the availability of data; it is eLife policy to require all data to be publicly available. Also, it would be helpful to share your code to help understand how you conducted pairwise comparisons, etc.

      Data availability was not mentioned in the submission. However, all data and scripts will be uploaded with the VOR if required.

      Recommendations for the authors:

      I found your study interesting and potentially promising. However, there are some fundamental problems with the study design and the hypothesis, including:

      (1) Seasonality simulation - Seasonality is strongly associated with time, so it is unusual to simulate seasonal factors without accounting for time. The actual factors associated with seasonal change in reproductive output may be neither a difference in host blood meal nor temperature and photoperiod. It is therefore odd to reduce seasonality to a difference in photoperiod and temperature in summer and autumn without even mentioning the time of year when the experiment was carried out (except for the mention of February as the time the stock samples were collected from the wild).

      The temperature and photoperiod settings are established according to a representative day in both autumn and summer. To determine these settings, we utilized climate data spanning a 3-year period (2020-2022), encompassing the most frequently occurring temperatures and day lengths. The weather conditions remained notably consistent throughout this time frame, which is why the specific year was not mentioned. Moreover, including the year in laboratory experiment details is uncommon, as evident in various papers. This practice can be corroborated by referring to multiple sources (cited in the original manuscript). We mention this in the new version.

      (2) Hypothesis - While the hypothesis alludes to the 'reason' for seasonal host shift, the prediction is on the outcome of the interaction between blood meal type and season.

      It might be nicer to frame your hypothesis to be consistent with the aim, which is, testing the partial contributions of blood meal type, versus photoperiod and temperature to seasonal change in the reproductive output of Culex quinquefasciatus. A hypothesis like that can be accompanied by alternative predictions according to the expected individual and interactive effects of both factors.

      It was rewritten in the revised version to be consistent with our predictions and findings.

      Blood meal type, temperature, and photoperiod are all components of seasonality, so the strength of the study is its potential to decouple the effect of blood meal type from that of temperature and photoperiod on the seasonal reproductive output of Culex quinquefasciatus by comparing the two blood meal types under simulated summer and winter conditions. Ideally, this should have been over a natural summer and winter because a natural time difference captures the effect of other seasonal factors other than temperature and photoperiod.

      Furthermore, the hypothesis stemmed from field observations, while the study itself was conducted under laboratory conditions using a local population of Culex quinquefasciatus from Argentina. It remains uncertain whether there is supporting evidence for a seasonal shift in host usage in Culex quinquefasciatus from the stock population. Discussing the field observations within the stock population would provide valuable insights.

      This is now addressed in the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study seeks to disentangle the different selective forces shaping the evolutionary dynamics of transposable elements (TEs) in the wild grass Brachypodium distachyon. Using haplotype-length metrics, and genetic and environmental differentiation tests, the authors present in large parts convincing evidence that positive selection on TE polymorphisms is rare, and that the distribution of TE ages points to purifying selection being the main force acting on TE evolution in this species. A caveat of this study, as of other studies that seek to assess TE insertion polymorphisms with short reads, is that the rates of false negatives and false positives are difficult to estimate, which may have major effects on the interpretation. This study will be relevant for anyone interested in the role of TEs in evolution and adaptation.

      Thank you for considering our manuscript for publication in eLife. We appreciate the constructive comments and suggestions of the reviewers. We have addressed the issues raised by the reviewers. Below, we provide a more detailed response to each of the reviewer comments.

      Public Reviews:

      Reviewer #1:

      The study presented in this manuscript provides very convincing evidence that purifying selection is the main force shaping the landscape of TE polymorphisms in B. distachyon, with only a few putatively adaptive variants detected, even though most conclusions are based on the 10% of polymorphisms contributed by retrotransposons. That first conclusion is not novel, however, as it had already been clearly established in natural A. thaliana strains (Baduel et al. Genome Biol 2021) and in experimental D. simulans lines (Langmüller et al. NAR 2023), two studies that the authors either do not mention or mention improperly. In contrast to the conclusions reached in A. thaliana, however, Horvath et al. report a seemingly deleterious effect of TE insertions even very far away from genes (>5 kb), a striking observation for a genome of relatively similar size. A caveat of this study is the lack of benchmarking of TE polymorphism calls made with a pipeline known for a high rate of false positives (see detailed Private Recommendation #1). If confirmed, however, this set of observations would be an important addition to our knowledge of TE dynamics in the wild and would question our understanding of the main molecular mechanisms through which TEs can impact fitness.

      Thank you for your positive evaluation of our paper. We have now adjusted the manuscript to include the mentioned studies (Line 330-333) and to address the issue of false positive and false negative calls. The detailed responses to all the raised points are below.

      Reviewer #2:

      Summary:

      Transposable elements are known to have a strong potential to generate diversity and impact gene regulation, and they are thought to play an important role in plant adaptation to changing environments. Nevertheless, very few studies have performed genome-wide analyses to understand the global effect of selection on TEs in natural populations. Horvath et al. used available whole-genome re-sequencing data from a representative panel of B. distachyon accessions to detect TE insertion polymorphisms (TIPs) and estimate their time of origin. Using a thorough combination of population genomics approaches, the authors demonstrate that only a small amount of the TE polymorphisms are targeted by positive selection or potentially involved in adaptation. By comparing the age-adjusted population frequencies of TE polymorphisms and neutral SNPs, the authors found that retrotransposons are affected by purifying selection independently of their distance to genes. Finally, using forward simulations they were able to quantify the strength of selection acting on TE polymorphisms, finding that retrotransposons are mainly under moderate purifying selection, with only a minority of the insertions evolving neutrally.

      Strengths:

      Horvath et al. use a convincing set of strategies, and their conclusions are well supported by the data. I think that incorporating polymorphism age into the analysis of purifying selection is an interesting way to reduce the possible bias introduced by the fact that SNP and TE polymorphisms do not arise at the same pace. The fact that TE polymorphisms far from genes are also under purifying selection is an interesting result that reinforces the idea that trans-regulatory effects of TE insertions might not be rare, a matter that may be demonstrated in future studies.

      Weaknesses:

      TEs from different classes and orders differ strongly in multiple features, such as size, potential impact on nearby genes upon insertion, insertion/elimination ratio (i.e., MITE/TIR excision, solo-LTR formation), or insertion preference. Given such diversity, their survival rates in the genome and the strength of selection acting on them can be expected to differ. Although the authors differentiate DNA transposons and retrotransposons in some of the analyses, the specificities of the most abundant plant TE types (i.e., LTR/Gypsy, LTR/Copia, MITE DNA transposons) are not considered.

      The authors used a short-read-based approach to detect TIPs and TAPs. It is known that detecting TE polymorphisms is challenging and can lead to false negatives, depending on the method used and the sequencing coverage. The methodology used here (TEPID) has been previously applied to other species, but it is unclear if the sensitivity of the TIP/TAP caller is equivalent to that of the SNP caller and how these potential differences may affect the results.

      Thank you for your positive evaluation of our paper. We have now adjusted the manuscript and the discussion to include the mentioned points on the different TE superfamilies and the reliability of the TE calls. The detailed responses to all the raised points are below.

      Private Recommendations:

      Reviewer #1:

      (1) TE polymorphisms (presence and absence variants) were called from short-read sequencing data using a pipeline (TEPID, Stuart et al. eLife 2016) that is known to have low specificity as well as low sensitivity in its detection of presence variants (Baduel et al. MIMB 2021). An assessment of the rates of false positives and false negatives in the data presented in this study, and of how they vary across TE superfamilies, is therefore of crucial importance, as they may bias all downstream analyses, especially if they affect the identification of polymorphisms contributed by retrotransposons, which are the basis of most conclusions of the manuscript. The fact that a PCA of the polymorphisms contributed by DNA transposons is less able to distinguish genetic clades than one based on retrotransposon polymorphisms suggests that false positives are most problematic for DNA transposons. However, high rates of false positives may also explain why no significant increase in TE frequency is detected within selective sweep regions, a result that runs against the expected hitch-hiking of neutral or weakly deleterious polymorphisms, the category to which the authors assign many TE polymorphisms. Furthermore, because the reference genome belongs to the B_east clade and TEPID is better at calling absences than presences, analyses may be biased in this clade (where clade-specific insertions take the form of absences in other clades, which are well detected) compared to other clades (where clade-specific insertions are presence polymorphisms and may be missed). TE polymorphism calls could be benchmarked by de novo assembling one genome from each clade, or at least by cross-checking the presence calls from TEPID against those of another of the many available TE calling pipelines.

      We agree with this concern, raised by both reviewers, regarding the effects of false negative and false positive TE calls. We also think that some reasonable follow-up analyses should be done to check the potential impact of false negative and false positive TE calls on the presented results, without turning the manuscript into a method-comparison paper, as this is not the main goal of this study. Therefore, we generated a subsample of our dataset that included only accessions with an average genome-wide mapping coverage of at least 20x, as the false negative TE call rate is correlated with mapping coverage and high mapping coverage is expected to reduce the false negative TE call rate. We then used this subsample to check whether our results would change if our dataset had a lower false negative TE call rate. Reducing the rate of false negative calls by using only higher-coverage samples did not change our results or interpretations.

      Re-running the ANCOVA analyses revealed similar results regarding the accumulation of TEs in selective sweep regions. This was added to the main text Line 143-148: “Similar results were obtained when investigating the number of fixed TE polymorphisms (Additional file 2: Table S1) and the allele frequency of TE polymorphisms (Additional file 2: Table S2) in high iHS regions using a subset of our dataset with an expected lower false negative TE call rate, that only included samples with a genome-wide mapping coverage of at least 20x (see Discussion and Materials and Methods for more details).” and in Additional file 2: Table S1 and S2.

      Further, we re-ran the age-adjusted SFS based on this subset of our dataset and found that the results and conclusions from the age-adjusted SFS were not driven solely by false negative TE calls. This was also included in the text Line 338-349: “One caveat of the approach used in this study is that TE calling pipelines based on short reads tend to have higher false positive and false negative call rates than SNP calling pipelines, which is also the case for the TEPID TE calling pipeline used here [57, 59]. A high false negative TE calling rate, however, might bias our TE frequency estimates toward lower frequencies, which could drive the observed patterns in the age-adjusted SFS. To assess whether the false negative TE calling rate in our study substantially affected our results, we re-ran the age-adjusted SFS on a subset of our dataset including only samples with a genome-wide mapping coverage of at least 20x, as higher mapping coverages are expected to reduce the false negative call rate [27, 59]. Using the TE allele frequencies estimated from this subset of our data to estimate  frequency revealed results similar to those of the age-adjusted SFS based on the whole dataset (Additional file 1: Fig. S9), indicating that our observation of retrotransposons evolving under purifying selection is not solely driven by a high false negative TE calling rate.” and in Additional file 1: Fig. S9.

      The details of this analysis have been added to the Materials and Methods, Line 493-498: “Mapping coverage is known to influence the false discovery rate [27, 59]. To investigate the impact of false positive and false negative TE calls on our results, we downsampled the TE dataset to only include TEs that had been called in samples with an average mapping coverage of at least 20x. The allele frequencies of TEs present in this high-coverage dataset were recalculated considering only samples with an average mapping coverage of at least 20x. This second TE dataset was then used to check whether using a dataset with a higher mapping coverage, and presumably a lower false TE calling rate, affected our results.”
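
The coverage-based subsetting described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes presence/absence calls coded as 1/0 (None for a missing call), treats each accession as effectively homozygous (as expected in a predominantly selfing species), and assumes that TEs with no presence call among the retained samples are dropped. The function and variable names are hypothetical.

```python
from typing import Dict, Optional

MIN_COVERAGE = 20.0  # mean genome-wide mapping coverage threshold (x)


def high_coverage_te_frequencies(
    coverage: Dict[str, float],
    calls: Dict[str, Dict[str, Optional[int]]],
    min_coverage: float = MIN_COVERAGE,
) -> Dict[str, float]:
    """Recompute TE presence frequencies using only high-coverage samples.

    coverage: sample -> mean mapping coverage
    calls:    TE id -> {sample -> 1 (present), 0 (absent), None (missing)}
    Returns TE id -> presence frequency among retained samples. TEs with no
    presence call in any retained sample are dropped (an assumption).
    """
    kept = {s for s, cov in coverage.items() if cov >= min_coverage}
    freqs: Dict[str, float] = {}
    for te, genotypes in calls.items():
        # Only non-missing calls from high-coverage samples count.
        observed = [g for s, g in genotypes.items() if s in kept and g is not None]
        if observed and any(observed):
            freqs[te] = sum(observed) / len(observed)
    return freqs
```

With samples A (25x), B (12x), C (31.5x) and D (20x), sample B is excluded, so a TE called present in A and B, absent in C and missing in D gets a frequency of 0.5 rather than 2/3.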

      (2) If confirmed, the observation that retrotransposons located more than 5kb away from genes appear to be also affected by purifying selection (L209) is indeed surprising. The authors should add a comparison with SNPs at the same distance from genes to strengthen the claim and make sure it is not the result of mapping artifacts, such as alignment quality dropping far away from genes.

      We added a comparison of the age-adjusted SFS of SNPs and retrotransposons located more than 5 kb away from genes to evaluate whether the observed shape of the age-adjusted SFS of retrotransposons more than 5 kb away from genes was due to artefacts. The results are included on line 383-389: “Finally, we tested whether the signal of purifying selection on TE polymorphisms located more than 5 kb away from genes could be due to mapping or other artefacts by comparing the shape of the age-adjusted SFS of retrotransposons and SNPs more than 5 kb away from genes. The age-adjusted SFS of SNPs more than 5 kb away from genes differs from that of retrotransposons (Additional file 1: Fig. S10), indicating that the shape of the age-adjusted SFS of retrotransposons more than 5 kb away from genes is unlikely to result from artefacts in regions of the genome far from genes.” and Additional file 1: Fig. S10.
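
Selecting polymorphisms located more than 5 kb from genes, as in the comparison above, could be sketched like this. The helper names, the inclusive (start, end) gene coordinates on a single chromosome, and the use of distance to the nearest gene boundary are assumptions made for illustration; the manuscript does not specify how distances were computed.

```python
from typing import List, Optional, Sequence, Tuple

MIN_DISTANCE_BP = 5000  # "more than 5 kb away from genes"


def distance_to_nearest_gene(pos: int, genes: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Distance (bp) from pos to the closest gene on the same chromosome.

    genes holds inclusive (start, end) coordinates; a position inside a gene
    has distance 0. Returns None if the chromosome has no annotated genes.
    """
    best: Optional[int] = None
    for start, end in genes:
        if start <= pos <= end:
            return 0
        d = start - pos if pos < start else pos - end
        if best is None or d < best:
            best = d
    return best


def far_from_genes(positions: Sequence[int], genes: Sequence[Tuple[int, int]],
                   min_distance: int = MIN_DISTANCE_BP) -> List[int]:
    """Keep only positions strictly more than min_distance bp from any gene."""
    kept = []
    for pos in positions:
        d = distance_to_nearest_gene(pos, genes)
        if d is not None and d > min_distance:
            kept.append(pos)
    return kept
```

The same filter would be applied to SNP and TE positions so that both age-adjusted SFS are computed over the same genomic compartment.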

      (3) The authors' claim that most TE polymorphisms are under weak to moderate purifying selection (L273) relies on the comparison of the age of polymorphisms in the oldest age bin with forward simulations. However, the conclusions from these comparisons cannot be extrapolated to the fitness effects of all TE polymorphisms as variants in the oldest age bin are de facto a biased sample of the variants of a category, a point the authors highlight.

      We adjusted the mentioned paragraph to better highlight this point. Line 390-397: “To further ascertain the strength of purifying selection, we used forward simulation and showed that simulations assuming a moderately weak selection pressure (S = -5 or S = -8) against TE polymorphisms best fitted our observed data. In theory, no TE polymorphisms under strong purifying selection should be present in a natural population, as such mutations are expected to be quickly lost, especially in a predominantly selfing species where most loci are expected to be homozygous. Therefore, it is not surprising that TE polymorphisms which persist in B. distachyon are under weak to moderate selection, as also shown, for example, for the L1 retrotransposons in humans [27] or the BS retrotransposon family in Drosophila melanogaster [62].”

      As the authors note at L220-228 for high-effect SNPs, the most deleterious TE polymorphisms would indeed be purged very quickly and would never contribute to variants in the oldest age bin. Unless new arguments can be made to support this claim, this conclusion should be rephrased to state instead that even the oldest TE polymorphisms are still mostly non-neutral and under weak to moderate purifying selection.

      This has been adjusted. Line 231-232: “Hence, even the oldest retrotransposon polymorphisms seem to be mostly non-neutral and are affected by purifying selection.”

      L214: replace smaller with more negative for clarity.

      Done.

      L233: Given the discussion L220-228, the oldest age bin seems to be biased in its composition and thus not useful for comparisons. The sentence should therefore be rephrased to reflect that DNA transposon polymorphisms appear to be actually less deleterious than high-effect SNPs in S9A and B based on the penultimate age bin.

      This has been fixed.

      Reviewer #2:

      • I wonder if false negative detection could artificially increase the evidence for purifying selection by increasing the amount of low-frequency variants. This could be easily checked if long-read data or genome assembly is available for any of the samples in the collection, by comparing the TIP/TAP prediction with the actual sequence.

      We agree with the reviewer that false negative calls can lead to misinterpretation of the observed low TE frequencies (see also our response to the first comment of Reviewer #1). Unfortunately, long-read data from the samples used here are not available to estimate false negative call rates. However, to check whether the observed results are mainly driven by high false negative rates, we re-ran the age-adjusted SFS based on samples with at least 20x mapping coverage, which should reduce the false negative TE calling rate. The results and conclusions from this second analysis were included in the text Line 338-349: “One caveat of the approach used in this study is that TE calling pipelines based on short reads tend to have higher false positive and false negative call rates than SNP calling pipelines, which is also the case for the TEPID TE calling pipeline used here [57, 59]. A high false negative TE calling rate, however, might bias our TE frequency estimates toward lower frequencies, which could drive the observed patterns in the age-adjusted SFS. To assess whether the false negative TE calling rate in our study substantially affected our results, we re-ran the age-adjusted SFS on a subset of our dataset including only samples with a genome-wide mapping coverage of at least 20x, as higher mapping coverages are expected to reduce the false negative call rate [27, 59]. Using the TE allele frequencies estimated from this subset of our data to estimate  frequency revealed results similar to those of the age-adjusted SFS based on the whole dataset (Additional file 1: Fig. S9), indicating that our observation of retrotransposons evolving under purifying selection is not solely driven by a high false negative TE calling rate.” and in Additional file 1: Fig. S9.

      • Supplementary Figure S1. DNA transposons are much worse at separating the samples in comparison to LTR-retrotransposons. Doesn't this suggest that these two classes have very different dynamics in the population and maybe different intensities of the selection forces acting on them? Could this profile be explained as DNA transposons being older and likely more fixed in all the clades, whereas retrotransposons are more recent and more specific to some populations? Another possibility might be that some B. distachyon DNA transposons had an unusually high excision rate. In any case, in my opinion, this reinforces the need to study the different TE orders in more detail.

      Indeed, different TE orders and superfamilies can have different excision rates and age distributions and can be under different selective regimes. To investigate whether different TE orders are affected by very different selective regimes, we split our TE dataset into four TE types: Copia, Ty3, Helitron and MITE. We then re-ran the age-adjusted SFS analyses and added our results to the text Line 422-430: “To further examine our conclusion on purifying selection, we investigated the selective regime affecting different retrotransposon and DNA transposon superfamilies. To this end, we generated age-adjusted SFS for the four most common TE superfamilies, Copia, Ty3 (also known as Gypsy, a name we avoid because of its problematic nature; see [71]), Helitron and MITE, and found similar deviations of the  frequency from 0 in all four investigated TE superfamilies (Additional file 1: Fig. S12–S15). These results indicate that our conclusion on the broad effect of purifying selection is not driven by a single TE superfamily but holds at least for the four most numerous TE superfamilies.” and in Additional file 1: Fig. S12-S15.

      • Line 112: "most TE polymorphisms in our dataset were young and only a few were very old". Does this change substantially among TE orders/superfamilies?

      Indeed, there are some differences in the age distribution of the TEs depending on the superfamily. However, the differences are not substantial, as the age bins in the age-adjusted SFS of the different TE superfamilies are fairly similar. See Additional file 1: Fig. S12-S15.

      • Figure 2 is difficult to read, especially the lower panels. I think the grey borders of the boxplots make visualization difficult.

      The gray borders have been removed.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      Heer and Sheffield used 2 photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water rewards at the end of the track and on test days, calcium activity was recorded from dopamine (DA) axons originating in the ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and velocity to some extent, while LC input activity remained constant across the track, but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the ramping to reward activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty vs. behavioral changes induced by novelty by removing periods in which animals were immobile and established that the activity observed in the first 2 laps reflected novelty-induced signal in LC axons.

      Strengths:

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis at the resolution of single axons. The data analysis is thorough, and possible confounding variables and data interpretation are carefully considered.

      Weaknesses:

      Aspects of the methodology, data analysis, and interpretation diminish the overall significance of the findings, as detailed below.

      The LC axonal recordings are well powered, but the DA axonal recordings are severely underpowered, with recordings taken from a mere 7 axons (compared to 87 LC axons). Additionally, 2 different calcium indicators with different kinetics and sensitivity to calcium changes (GCaMP6s and GCaMP7b) were used (n=3 and n=4, respectively) and the data pooled. This makes it very challenging to draw valid conclusions from the data, particularly in the novelty experiment. The surprising lack of novelty-induced DA axon activity may be a false negative. Indeed, at least 1 axon (axon 2) appears to show a novelty-induced rise in activity in Figure 3C. Changes in activity in 4/7 axons are also referred to as a 'majority' occurrence in the manuscript, which again is not an accurate representation of the observed data.

      The reviewer points out a weakness in the analysis of VTA axons in our dataset. The relatively low n (currently 7) reflects the fact that VTA axons in the CA1 region of the hippocampus are very sparse and very difficult to record from (due to their sparsity and the low level of baseline fluorescence inherent in long-range axon segments). This is why they have not been recorded from by any lab other than ours. LC axons, on the other hand, are more abundant in CA1. In the paper, when comparing VTA and LC axons, we deal with the mismatch in n by downsampling the LC axons to match the number of VTA axons, repeating this 1000 times to create a distribution. However, because the VTA axon n is relatively low, it is possible that we have not sampled the VTA axon population sufficiently and therefore have a biased population in our dataset. The issue is that it takes months for baseline GCaMP expression to reach levels sufficient for recording from VTA axons, and it is typical to find only a single axon in a FOV per animal. There are additional reasons why mice and/or axon recordings do not reach criteria and cannot be included in the dataset (these exclusion criteria are reported in the Methods section). For instance, of the 54 DAT-Cre mice injected, 36 were never imaged because of a lack of expression or because they failed to reach behavioral criteria. Another 11 mice were excluded because of heat bubbles that developed during imaging, z-drift of the FOV, or bleaching of the GCaMP signal.
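
The downsampling procedure described above can be sketched as a simple resampling routine. This is an illustrative sketch, not the authors' analysis code: the per-axon metric (for example, a correlation with velocity), the use of the mean as the summary statistic, the fixed seed, and the function names are all assumptions.

```python
import random
from statistics import mean
from typing import List, Sequence


def downsampled_means(lc_values: Sequence[float], n: int,
                      n_iter: int = 1000, seed: int = 0) -> List[float]:
    """Distribution of LC means when only n axons are sampled.

    Each iteration draws n LC axons without replacement (matching the number
    of VTA axons) and records the mean of their per-axon metric.
    """
    rng = random.Random(seed)  # fixed seed so the distribution is reproducible
    pool = list(lc_values)
    return [mean(rng.sample(pool, n)) for _ in range(n_iter)]


def empirical_percentile(value: float, null: Sequence[float]) -> float:
    """Fraction of downsampled LC means at or below the observed VTA value."""
    return sum(x <= value for x in null) / len(null)
```

For a hypothetical list `vta_values`, `empirical_percentile(mean(vta_values), downsampled_means(lc_values, len(vta_values)))` places the VTA mean within the distribution of equally sized LC samples, so the comparison is not dominated by the larger LC n.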

      However, we do have n=2 additional VTA axon recordings that we will add to the dataset, bringing the n up from 7 to 9. We plan to re-analyze the data with n=9 VTA axons and compare them to down-sampled LC axons as described above. This boost in n will increase the power of our VTA axon analysis. To test more formally whether this is sufficient for statistical tests, we plan to use the G*Power power-analysis tool to compute statistical power for each of the different tests we use. We will report this in the next version of the paper. However, the n=2 additional axons were not recorded in the novel environment, so the next version will remain at n=7 for the novel environment analysis. We agree with the reviewer that the lack of novelty-induced DA axon activity may be a false negative, and we will adjust the description of our results and the discussion accordingly.
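
As a rough illustration of what such a power calculation involves, the sketch below uses a normal approximation for a two-sided, two-sample comparison. It is not G*Power's method (G*Power uses the exact noncentral t distribution), and the effect size d is a hypothetical input rather than a value from the data.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sample_power(d: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test.

    d is a standardized effect size (Cohen's d); n1 and n2 are group sizes.
    This ignores the t-distribution correction that exact tools apply, so it
    overestimates power somewhat for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * sqrt(n1 * n2 / (n1 + n2))  # expected z-statistic under H1
    # Probability of landing beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
```

Because the approximation is optimistic at small n, a value computed this way for groups of 7 or 9 axons should be treated as an upper bound on the power reported by an exact tool.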

      During the data collection of VTA axon activity we tried two variants of GCaMP: 6s and 7b, to see if one would increase the success rate of finding and recording from VTA axons. Given the long time-course of these experiments and the low yield in success, we pooled the GCaMP variants together to increase statistical power. Because the 2 additional VTA DA axons that were recorded from expressed GCaMP6s, the next version of the paper will have n=5 GCaMP6s, and n=4 GCaMP7b VTA DA axons, which will allow us to compare the activity of the two sensors in the familiar environment. The reviewer correctly pointed out that the sensors themselves could confound our results, and so they should not be pooled unless we can show they do not produce different signals in the axons. We will make this comparison and report the findings in the next version of the paper. If we find no significant differences, we will pool the data. If differences are detected, we will keep these axons separate for subsequent analysis and comparisons to LC axons.

      The authors conducted analysis on recording data exclusively from periods of running in the novelty experiment to isolate the effects of novelty from novelty-induced changes in behavior. However, if the goal is to distinguish between changes in locus coeruleus (LC) axon activity induced by novelty and those induced by motion, analyzing LC axon activity during periods of immobility would enhance the robustness of the results.

      This is indeed true, and this suggested analysis could further support our conclusions regarding the LC novelty signal. For the next version of the paper, we will use the periods of immobility to analyze and isolate any novelty induced activity in LC axons. However, following exposure to the novel environment, mice spend much less time immobile, therefore there may not be sufficient periods of immobility close in time to the exposure to the novel environment (which is when the novelty signal occurs). We plan to analyze mouse behavior during the early exposure to the novel environment for immobility and check whether we have enough of this behavior to perform the suggested analysis.

      The authors attribute the ramping activity of the DA axons to the encoding of the animals' position relative to reward. However, given the extensive data implicating the dorsal CA1 in timing, and the remarkable periodicity of the behavior, the fact that DA axons could be signalling temporal information should be considered.

      This is a very good point. We agree that the VTA DA axons could be signaling temporal information, as we have previously shown that these axons also exhibit ramping activity when their activity is averaged by time to reward (Krishnan et al., 2022). We will conduct this analysis on this dataset. We have not, however, conducted any experiments designed to separate time from distance, such as the experiments conducted in Kim et al., 2020. Therefore, we cannot determine whether this signal reflects proximity to reward in space or in time. We will clarify in our text that by proximity we mean either place or time, and that we cannot conclude which feature of the experience drives the VTA axon signal.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Kim, H.R., Malik, A.N., Mikhael, J.G., Bech, P., Tsutsui-Kimura, I., Sun, F., Zhang, Y., et al. A unified framework for dopamine signals across timescales. Cell 183(6) (2020).

      The authors should explain and justify the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments.

      LC axon activity was recorded on a 3m track to match the track length from an experiment we recently published (Dong et al., 2021) in which mice were exposed to a novel 3m track while populations of CA1 pyramidal cells were recorded. In that paper we described the time course of place field formation on the novel track. We wanted to test if LC axons signaled novelty (as we hypothesized) and whether the time course of LC axon activity matched the time course of place field formation. We briefly discuss this in the Discussion section of this paper and hypothesize that LC axons in CA1 could open a window of plasticity in which new place fields can form.

      VTA axons were recorded on a 2m track (the same VR tracks on which LC axons were recorded) to match another recent paper from our lab in which reward expectation was manipulated (Krishnan et al., 2022). In that study, CA1 pyramidal cell populations were recorded during the reward expectation experiment. To match that experience while recording VTA axons in CA1, and to test how reward expectation may influence axon signaling along the track, we also used a 2m track. The idea was to check how VTA dopaminergic inputs to CA1 may influence CA1 population dynamics along the track.

      Although the tracks were identical for LC and VTA recordings for both the familiar and novel tracks in terms of visual cues and design, the track lengths are different (simply modulated by gain control of the rotary encoder). To account for this we normalized the lengths for our comparison analysis. This normalization allows for a direct comparison of the patterns of activity across the two types of axons, controlling for the potential confound introduced by the different track lengths. By adjusting the data to a common scale, we could assess the relative changes in activity levels at matched spatial bins, ensuring that any observed differences or similarities are due to the intrinsic properties of the axons rather than differences in track lengths. However, the different lengths do make the animal’s experience slightly different. This is somewhat offset by the observations in our study that none of the LC or VTA axon signals would be expected to be majorly influenced by variations in track length. For instance, LC axons are associated with velocity and a pre-motion initiation signal, neither of which would be influenced by track length. VTA axons are also associated with velocity, which would not influence a direct comparison to LC axon velocity signals as mice reach maximal velocity very rapidly along the track. VTA axons do ramp up in activity as they approach the reward zone, and this signal could be modulated by track length (or maybe not if the signal is encoding time to reward rather than distance). However, LC axons show no ramping to reward signals, so a comparison across axons recorded on different track lengths for this analysis is justified.
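
The length normalization described above could be sketched as below, assuming that position is sampled alongside activity (e.g., ΔF/F) and that activity is averaged within equal fractions of the track. The 50-bin default and the function name are illustrative assumptions, not the values or code used in the study.

```python
from typing import List, Sequence


def normalized_spatial_profile(positions: Sequence[float], activity: Sequence[float],
                               track_length: float, n_bins: int = 50) -> List[float]:
    """Mean activity in n_bins equal fractions of the track.

    Positions are divided by track_length, so a 2 m and a 3 m track both map
    onto [0, 1] and bin k covers the same fraction of each track.
    Empty bins are returned as NaN.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, act in zip(positions, activity):
        frac = pos / track_length
        if not 0.0 <= frac <= 1.0:
            continue  # skip samples outside the track
        b = min(int(frac * n_bins), n_bins - 1)  # the track end falls in the last bin
        sums[b] += act
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

Because each bin spans the same fraction of track on both lengths, bin k on the 3m track covers 1.5 times the distance of bin k on the 2m track; this is why plotting by time to reward and by un-normalized distance remains a useful complementary check.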

      However, to add rigor to comparisons of axon dynamics recorded along 2 m and 3 m tracks, we plan to plot the activity of both sets of axons against time to reward and against actual (un-normalized) distance from reward.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Dong, C., Madar, A. D. & Sheffield, M.E. Distinct place cell dynamics in CA1 and CA3 encode experience in new environments. Nat Commun 12, 2977 (2021).

      Reviewer #2 (Public Review):

      Summary:

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.

      The main findings were as follows:

      • In a familiar environment, the activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.
      • VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.
      • In contrast, the activity of LC axons ramped up before the initiation of movement on the Styrofoam wheel.
      • In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral, and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.

      Strengths:

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflect partly distinct environmental, behavioral and cognitive factors.

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.

      Weaknesses:

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.

      (2) Some aspects of the methodology would benefit from clarification.

      First, to help others to better scrutinize, evaluate, and potentially reproduce the research, the authors may wish to check whether their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data.

      We thank the reviewer for helping us formalize the scientific rigor of our study. There are ten ARRIVE Guidelines and we have addressed most of them in our study already. However, there is an opportunity to add detail. We have listed below all ten points and how we have or will address each one.

      (1) Experimental design - we go into great depth explaining the experimental set-up, how we used the autofluorescent blebs as imaging controls, how we controlled for different sample sizes between the two populations, and the statistical tests used for comparisons. We also carefully accounted for animal behavior when quantifying and describing axon dynamics both in the familiar and novel environments.

      (2) Sample size - We state both the number of ROIs and the number of mice for each analysis. Wherever we state how many axons showed a certain kind of activity, we will also state the number of mice in which we saw this activity. For the next version of the paper, we plan to conduct a power analysis using G*Power to assess the power of our sample sizes for statistical analysis.
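      G*Power computes sample sizes from the noncentral t distribution. As a rough cross-check, the normal approximation for a two-sided, two-sample comparison of means gives the per-group sample size as n = 2 (z_(1-alpha/2) + z_(power))^2 / d^2, where d is Cohen's effect size. The sketch below assumes that test design and uses illustrative effect sizes, not values from this study; the function name is hypothetical, and G*Power will typically return a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


print(n_per_group(0.8))  # -> 25 (large effect)
print(n_per_group(0.5))  # -> 63 (medium effect)
```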

      (3) Inclusion/exclusion criteria - Of the 36 NET-Cre mice injected, 15 were never recorded, either because they failed to reach behavioral criteria or because expression in axons was not visible. Of the 54 DAT-Cre mice injected, imaging was never conducted in 36 for lack of expression or failure to reach behavioral criteria. Of the remaining 21 NET-Cre mice, 5 were excluded for heat bubbles, z-drift, or bleaching, while 11 DAT-Cre mice were excluded for the same reasons. This was determined by visually assessing imaging sessions, followed by using the registration metrics output by suite2p. This registration metric performs a PCA on the motion-corrected ROIs and plots the first PC. If the PC drifted substantially, to the point where no activity was apparent, the video was excluded from analysis.

      (4) Randomization - The paper already describes the random downsampling of LC axons used to make statistical comparisons with VTA axons. LC axons were selected pseudo-randomly (only one axon per imaging session) to match VTA sampling statistics. This randomization was repeated 1000 times, and comparisons were made against the resulting random distribution.
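      The downsampling procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each axon is summarized by a single number (for example, its correlation with velocity), the comparison statistic is the mean across axons, and the two-sided empirical p value is one reasonable choice among several. The function names are hypothetical.

```python
import numpy as np


def downsampled_null(lc_values, lc_sessions, n_iter=1000, seed=0):
    """Null distribution: mean of one randomly chosen LC axon per session, n_iter times."""
    rng = np.random.default_rng(seed)
    lc_values = np.asarray(lc_values, dtype=float)
    lc_sessions = np.asarray(lc_sessions)
    # Indices of the axons recorded in each imaging session.
    groups = [np.flatnonzero(lc_sessions == s) for s in np.unique(lc_sessions)]
    null = np.empty(n_iter)
    for i in range(n_iter):
        picks = [rng.choice(idx) for idx in groups]  # one axon per session
        null[i] = lc_values[picks].mean()
    return null


def empirical_p(vta_stat, null):
    """Two-sided empirical p: share of null draws at least as far from the null mean."""
    dev = np.abs(null - null.mean())
    return (np.sum(dev >= abs(vta_stat - null.mean())) + 1) / (len(null) + 1)
```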

      (5) Blinding-masking - no blinding/masking was conducted as no treatments were given that would require this. We will include this statement in the next version.

      (6) Outcomes - We defined all outcomes measured, such as those related to animal behavior and related axon signaling.

      (7) Statistical methods - None of the reviewers had any issues regarding our description of statistical methods, which we described in detail in this version of the paper.

      (8) Experimental animals - We described that DAT-Cre mice were obtained from The Jackson Laboratory (JAX), and NET-Cre mice were obtained from the Tonegawa lab (Wagatsuma et al. 2018).

      (9) Experimental procedure - Already listed in detail in Methods section.

      (10) Results - Rigorously described in detail for behaviors and related axon dynamics.

      Wagatsuma, A., Okuyama, T., Sun, C., Smith, L.M., Abe, K., Tonegawa, S. Locus coeruleus input to hippocampal CA3 drives single-trial learning of a novel context. Proc Natl Acad Sci USA 115, E310–E316 (2018). https://doi.org/10.1073/pnas.1714082115.

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?

      We address this in detail above, in our response to a similar comment from Reviewer 1.

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Figure 3a, but as <0.2 cm/s for the imaging data analysis in Figure 4 (see the legends to these figures and the Methods, from lines 447, 469, and 498). I do not understand why, and it would be good if the authors explained this.

      This is an error left over from before we converted velocity from the rotational units of the treadmill to cm/s. It will be corrected in the next version of the paper.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Figure 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Figure 3aIII, top panel). Given that LC and VTA axon activity both increased with velocity (Figure 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.

      This is a great point. The decreased velocity in the novel environment could lead to a diminished novelty response in LC axons. We will add a discussion point on this in the next version. This could also be the case for VTA axons, so we will add a discussion point that the lack of novelty signaling seen in VTA axons could be due to reduced velocity masking this signal.

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.

      Mice receive their water reward through a waterspout that is immobile and positioned directly in front of their mouth (which is also immobile as they are head fixed) and water delivery is triggered by a solenoid when the mice reach the end of the virtual track. Therefore, because the waterspout remains in the same place relative to the mouse, and the water reward is not delivered until they reach the end of the virtual track, there is nothing for the mice to detect. We will update the paper to make this clearer.

      Additionally, on the initial laps with no reward, the ramping activity is still present (Krishnan et al, 2022) indicating this activity is not directly related to the presence/absence of water but is instead caused by reward expectation.

      Reviewer #3 (Public Review):

      Summary:

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of Cre transgenic mice, the authors examine the activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during the approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength of their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of learning and memory. The conclusions of this manuscript are mostly well supported by the data, but some additional analysis and/or experiments may be required to fully support the authors' conclusions.

      Weaknesses:

      (1) During teleportation from the familiar to the novel environment, the authors report a decrease in the freezing ratio when combining the mice in the two experimental groups (Figure 3aiii). A major conclusion from the manuscript is the difference in VTA and LC activity following environment change. Given that VTA and LC activity were recorded in separate groups of mice, did the authors observe a similar significant reduction in freezing ratio when analyzing the behavior in the LC and VTA groups separately?

      In response to this comment, we will analyze the freezing ratios in DAT-Cre and NET-Cre mice separately. However, other members of the lab have seen the same result in other mouse strains (See Dong et al. 2021), so we do not expect to see a difference (but it is certainly worth checking).

      (2) The authors satisfactorily apply control analyses to account for the unequal axon numbers recorded in the LC and VTA groups (e.g. Figure 1). However, given the heterogeneity of responses observed in Figures 3c and 4b and the relatively low number of VTA axons recorded (compared to LC), there are some possible limitations to the authors' conclusions. A conclusion that LC-CA1 axons, as a general principle, heighten their activity during novel environment presentation would require this activity profile to be observed in some of the axons recorded in nearly all LC-CA1 mice.

      We agree with the reviewer’s point here. To help avoid this problem, when downsampling LC axons to compare to VTA axons, we matched the sampling statistics of the VTA axons/mice (i.e. only one LC axon was taken from each mouse to match the VTA dataset).

      However, in the next version of the paper we will also report the number of mice that we see a significant novel response in. We will also add the number of mice with significant activity for each of the measures in the familiar environment (e.g. how many mice had axons positively correlated with velocity).

      Additionally, if the general conclusion is that VTA-CA1 axons ramp activity during the approach to reward, it would be expected that this activity profile was recorded in the axons of nearly all VTA-CA1 mice. Can the authors include an analysis to demonstrate that each LC-CA1 mouse contained axons that were activated during novel environments and that each VTA-CA1 mouse contained axons that ramped during the approach to reward?

      As stated above, we will add the number of mice that had each activity type we reported here.

      (3) A primary claim is that LC axons projecting to CA1 become activated during novel VR environment presentation. However, the experimental design did not control for the presentation of a familiar environment. As I understand, the presentation order of environments was always familiar, then novel. For this reason, it is unknown whether LC axons are responding to novel environments or environmental change. Did the authors re-present the familiar environment after the novel environment while recording LC-CA1 activity?

      This is an important point to address. While we never varied the presentation order of the familiar vs novel environments, we did record the activity of LC axons in some of the mice in a dark environment (no VR cues) prior to exposure to the familiar environment. We will look at these axons to address whether they respond to initial exposure to the familiar environment. This will allow us to check whether they are responding to environmental change or novelty. We will add this analysis to the next version of the paper.

    1. ABSTRACT

      As genomic sequencing technology continues to advance, it is increasingly important to perform joint analyses of multiple transcriptomics datasets. However, batch effects present challenges for dataset integration, for example when sequencing data are measured on different platforms or datasets are collected at different times. Here, we report the development of the BatchEval Pipeline, a workflow for evaluating batch effects in dataset integration. The BatchEval Pipeline generates a comprehensive report consisting of a series of HTML pages of assessment findings: a main page, a raw-dataset evaluation page, and several evaluation pages for built-in methods. The main page shows basic information about the integrated datasets, a comprehensive batch-effect score, and the method most recommended for removing batch effects from the current datasets. The remaining pages show evaluation details for the raw dataset and evaluation results for each built-in batch-effect removal method after batch effects are removed. This comprehensive report enables researchers to accurately identify and remove batch effects, resulting in more reliable and meaningful biological insights from integrated datasets. In summary, the BatchEval Pipeline represents a significant advance in batch-effect evaluation and is a valuable tool for improving the accuracy and reliability of experimental results.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.108) as part of our Spatial Omics Methods and Applications series (https://doi.org/10.46471/GIGABYTE_SERIES_0005), and has published the reviews under the same license as follows:

      **Reviewer 1. Chunquan Li**

      1. Page 1, Lines 14-16. The authors indicate that “it is crucial to thoroughly investigate the batch effects in the dataset before integrating and processing the data”. The term “thoroughly” may not be accurate enough. The current method can alleviate batch effects, but it cannot thoroughly solve the related problems. In addition, this work proposes a batch evaluation tool, so “reasonably evaluate the batch effects” may be more accurate than “thoroughly investigate the batch effects”.
      2. In Figure 1, is the first box “integrated datasets”?
      3. Page 5, Line 168, and Page 6, Lines 169-175, the content of these two paragraphs is similar, with some redundant descriptions. It is recommended to organize and write them into one paragraph.
      4. There is Table 1 in the table list, but Table 1 is missing in the main text.
      5. Page 8, Discussion section, it is better to discuss the differences between the proposed tool and a similar tool “batchQC”, especially the advantages of the proposed tool.
      6. Some other minor issues: Page 1, Line 22, “to do so” should be “to do it”. Page 3, Line 100, Ref. [13] should be cited when it first appears on Line 97. Page 4, Line 114 and Page 5, Line 146, “UMAP” should be given its full name when it first appears and abbreviated directly in the following text. The variable should be in italics, such as “p” on Page 4, Line 119, “H” on Page 6, Line 184.

      **Reviewer 2. W. Evan Johnson and Howard Fan**

      Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code?

      Yes. However, the code could use substantial improvements.

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      No. The manuscript is missing a section describing the software and its implementation.

      Is there enough clear information in the documentation to install, run and test this tool, including information on where to seek help if required?

      Yes. But it took a while to get it installed.

      Have any claims of performance been sufficiently tested and compared to other commonly-used packages?

      No. I think the most glaring deficiency in the paper is the lack of comparison with other methods. For example, there is no comparison of the tools available in BatchEval compared to other methods, such as BatchQC. Also, they mention that BatchQC might not work on larger datasets, but they perform no performance evaluation for BatchEval, and no comparison with BatchQC to demonstrate improved performance.

      Are there (ideally real world) examples demonstrating use of the software?

      Yes. Missed opportunity: I think the most exciting thing I observed from the paper was that the example data were from spatial transcriptomics data! To my knowledge, existing batch effect methods are not directly adapted to manage these data (although they did mention tools like BatchQC cannot handle large datasets, which may be true). But they don’t mention anything about batch adjustment/evaluation in spatial data in the manuscript. I feel that if the authors address this niche it would increase the value/impact of their work!

      Additional Comments:

      This review was conducted and written by Evan Johnson, who developed the competing BatchQC software.

      The authors provide an interesting toolkit for assessing batch effects in genomics data. The paper was clear and well-written, albeit I had a few concerns (see below). We were also able to download the associated software and test it out (comments below as well).

      I think the most exciting thing I observed from the paper was that the example data were from spatial transcriptomics data! To my knowledge, existing batch effect methods are not directly adapted to manage these data (although they did mention tools like BatchQC cannot handle large datasets, which may be true). But they don’t mention anything about batch adjustment/evaluation in spatial data in the manuscript. I feel that if the authors address this niche it would increase the value/impact of their work!

      In addition, this toolkit is written in Python, while BatchQC and other tools are written in R, so this is an advantage of the method as well: it addresses an audience that uses Python for gene expression analysis (not as large as the R community, but substantial). Their Python toolkit might also be easier to implement in a pipeline workflow (for a core facility or large project) than R-based tools like BatchQC; it might be worth mentioning this as well.

      I think the most glaring deficiency in the paper is the lack of comparison with other methods. For example, there is no comparison of the tools available in BatchEval compared to other methods, such as BatchQC. Also, they mention that BatchQC might not work on larger datasets, but they perform no performance evaluation for BatchEval, and no comparison with BatchQC to demonstrate improved performance.

      Similarly, the authors claim: “Manimaran [10] has developed user-friendly software for evaluating batch effects. However, the software does not take into account nonlinear batch effects and may not be able to provide objective conclusions.” I don’t understand what the authors mean by “may not be able to provide objective conclusions”. BatchQC provides several visual and numerical evaluations of batch effects, more so than even the proposed BatchEval does. Did the authors mean something else, perhaps that the lack of non-linear correction may lead to less accurate conclusions?

      A related concern: does BatchEval provide non-linear adjustments? I may have missed this, but it seems that BatchEval does not provide non-linear adjustments either. Also, regarding non-linear adjustments, the authors should show with an example the problems caused by a lack of non-linear adjustment, and show that pre-transforming the data before using BatchQC does not perform as well as the non-linear BatchEval adjustments.

      In Equation 10, should “batchScore” be BatchEvalScore?

      Also, at the bottom of the Figure on page 15, should “BatchQCScore” also be BatchEvalScore?

      The manuscript is missing a section describing the software and its implementation.

      I asked my research scientist, who recently graduated with his PhD in Bioinformatics, to assess the software and examples. First of all, much of the software is named “BatchQC”. I think this is confusing, since the method is really named BatchEval and it will be confused with BatchQC, which is another existing/competing software. Furthermore, it took him significant effort to install the BatchEval software and get it working on our cluster. I would recommend the authors make their software more accessible and easier to install.

      The output of the software was a nice HTML report diagnosing the batch effects in the data, which is very useful (attached is a combined PDF of the HTML reports that we generated). We were also able to generate a report for the Harmony-adjusted example using their code. One major disadvantage is that these reports are separate files, which could make it very complicated to compare cases using multiple batch effect methods, since each would be in a separate report. For example, a recent single-cell batch-correction benchmark (Tran et al., Genome Biology, 2020) compared more than a dozen methods, and it would be hard to use BatchEval for such a comparison.

      Also, it seems that users are required to conduct the batch correction themselves; BatchEval does not help with the correction except for the example code for Harmony.

      Finally, when comparing the raw and Harmony-adjusted datasets, visual assessments (e.g. PCA) show some improvement, although not a perfect correction. But most of the numerical assessments are still the same. The BatchEvalScore in both cases leads to the conclusion “Need to do batch effect removal”. What’s missing is the difference or improvement that Harmony's correction makes. Maybe this is just because Harmony doesn’t fully remove the batch effects? Or is something not working in the code? It might be good to see another example where batch effect correction improves the BatchEvalScore significantly.

      Additional Files: https://gigabyte-review.rivervalleytechnologies.com/journal/gx/download-files?YXJ0aWNsZT00NDImZmlsZT0xNzEmdHlwZT1nZW5lcmljJnZpZXc9dHJ1ZQ~~

      Re-review:

      I find this paper to be much improved in this version. The authors have clearly worked hard to address my concerns and have addressed them in a satisfactory manner. I fully support the publication of this paper, and I believe their tools are a nice addition to the field.

    1. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors collected a large set of data on root traits and root-associated microbes in the root endosphere and rhizosphere in order to integrate these important organisms in the root economics spectrum. By sampling a relatively large set of species from the subtropics along an elevation gradient, they tested whether microbial functions covary with root traits and root trait axes and if so, aimed to discuss what this could tell us about the (belowground) functioning of trees and forests.

      Strengths:

      The strengths of this study lie mostly in the impressive dataset the authors compiled: they sampled belowground properties of a relatively large number of tree species from an understudied region, i.e., the subtropics, where species-level root data are notoriously scarce. Secondly, their extensive sampling of associated microbes to integrate them in the root economics space is an important quality, because of the strong associations between roots and fungi and bacteria: soil microbes are directly related to root form (e.g., mycorrhizal fungi and root diameter and SRL), and function (e.g., taking up soil nutrients from various sources). Thirdly, the PCA figures (Figures 2 and 3) look very nice and intuitive and the paper is very well written.

      Weaknesses:

      That said, this study also has several methodological weaknesses that make the results, and therefore the impact of this study, difficult to evaluate and interpret.

      (1) Design: The design of this study needs further explanation and justification in the Introduction and Methods sections in order to understand the ecological meaning of the results. Root traits and microbial community composition differ with their environment, and therefore (likely) also with elevation. Elevation is included in the redundancy analysis as a main effect, but without further environmental information, its impact is not ecologically meaningful. What is the rationale for including an elevation gradient in the design and as a main effect in the analyses? Do environmental conditions vary with altitude, and if so, how, and how would this affect the data?

      What is the rationale behind sampling endosphere and rhizosphere microbial communities - why do both? And why also include pathogens - what are their expected roles in the RES? What do we know about this already? The introduction needs a more extensive literature review of these additional variables that are included in the analyses.

      (2) Units of replication and analysis in the model: What are the units of replication and analyses, e.g., how many trees were sampled per species, how many species or trees per elevation, and how many plots per elevation? Were all 11 plots at different elevations and if so, which ones? The level of analysis for the redundancy analyses is not entirely clear: L. 404 mentions that the analyses were done 'across the rhizosphere and root tissue samples', but is that then at the individual-tree level? If so, it seems that these analyses should then also account for dependencies between trees from the same species and phylogeny (as (nested) covariates or random factors). With the information provided, I cannot tell whether there was sufficient replication for statistical interpretations.

      (3) PCA: The results of the parallel analyses are not described: which components were retained? Because the authors aim to integrate microbial functions in a root economics space, I recommend first demonstrating the existence of a root economics space across the 52 subtropical species before running a PCA that includes the microbial traits. The PCA shown in this study does not exactly match the RES and this could be because traits of these species covary differently, but may also simply result from including additional traits to the PCA.
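      For context, Horn's parallel analysis retains principal components whose eigenvalues exceed those expected from random data of the same dimensions. A minimal sketch follows; it assumes a PCA on standardized traits (the correlation matrix), standard-normal random data, and the 95th-percentile criterion. These are common choices but not the only variants (column permutation or the mean random eigenvalue are also used), and the function name is hypothetical.

```python
import numpy as np


def parallel_analysis(X, n_iter=500, quantile=95, seed=0):
    """Number of PCs whose eigenvalue beats the given percentile of random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # eigvalsh returns ascending eigenvalues; reverse to get PC1 first.
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        R = rng.standard_normal((n, p))  # uncorrelated data with the same shape
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, quantile, axis=0)
    keep = observed > threshold
    # Retain components up to the first one that fails to beat its threshold.
    return p if keep.all() else int(np.argmin(keep))
```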

      Also, the PCAs shown are carried out at the individual-tree level. I would recommend, however, including the species-level PCAs in the main text, because the individual-level PCA may not only reflect species-inherent ecological strategies (such as those described by the RES of Bergmann et al. 2020) but also plasticity (Figures 2 and 3 both show an elevation effect that may be partly due to plasticity). While the results here are rather similar, intraspecific differences in root traits may follow different ecological principles and therefore not always be appropriate to compare with an interspecific RES (see for example Weemstra & Valverde-Barrantes, 2022, Annals of Botany).

      I could not deduce whether tree species in the "fungal PCA" (Figure 2) were assigned as AM or EcM based on Table 1, or based on their observed fungal community composition. In the former case, the fungal functional guild gradient (from EcM to saprotrophs and AM) is partially an artificial one, because EcM tree species are not AM species (according to Table 1) and therefore, by definition, constitute a tradeoff or autocorrelation. And, as the authors also discuss, AM tree species may host EcM fungal species. Before I can evaluate the ecological meaning of PC1, and whether or not it really represents a mineral/organic nutrient gradient, information is needed on which data are used here.

      I do not agree with the term 'gradient of bacterial guilds' (i.e., PC1 in Figure 3). All but 1 bacterial 'function' positively loaded on PC1 and 'fermentation' was only weakly negatively correlated with PC1. I do not think this constitutes a 'bacterial gradient'.

      (4) Soil samples: Were they collected from the soil surrounding each tree (L. 341) or from the root zone (L. 110)? The former seems to refer to bulk soil samples, but the latter could be interpreted as rhizosphere soil. It is therefore not entirely clear whether these are the same soil samples and, if so, where exactly they were sampled.

      Aims: The authors aimed to integrate endospheric and rhizospheric microbial and fungal community composition in the root economics space. Owing to statistical concerns (i.e., the missing parallel analysis results and the makeup of the PCs, namely the AM versus EcM classification), I am not sure the authors succeeded in this. Besides that, the interpretation of the axes seems rather oversimplified and needs some consideration.

      Root N is discussed as an important driver of fungal functional composition. Indeed, it was one of the significant variables in the redundancy models predicting microbial community composition, but its contribution to community composition was small (2-3%), and the mechanistic interpretation was rather speculative. Specifically, the role of root N in root (and tree) functioning remains highly uncertain: its link with respiration and exudation is increasingly demonstrated, but its actual meaning for nutrient uptake is not well understood (Freschet et al. 2021, New Phytologist). Whether and how root economics (represented by root N) and the fungal-driven nutrient economy (EcM versus AM, saprotrophs) can indeed be integrated into a unified framework (L. 223-224) seems a relevant question worth pursuing based on this paper, but in my opinion this study does not clearly answer it, because the statistical analyses might need further work (or explanation) and the underlying mechanisms are not well explained or supported by evidence.

      In addition, the root morphology axis was indeed independent of the "fungal gradient", but this is not in itself an interesting finding. What is interesting, but not discussed, is that AM species are generally expected to have thicker roots than EcM tree species (Gu et al. 2014 Tree Physiology; Kong et al. 2014 New Phytologist). I am therefore curious why this is not the case here. Did the few EcM species sampled just happen to have very thick roots? Or is there a phylogenetic effect, not accounted for here, that influences both mycorrhizal type and root thickness (Baylis, 1975; Guo et al., 2008 New Phytologist; Kubisch et al., 2015 Frontiers in Plant Science; Valverde-Barrantes et al., 2015 Functional Ecology; 2016 Plant and Soil)?

      I also do not agree with the conclusion that this integrated framework 'explained' tree distributions along the elevation gradient. First, this is difficult to interpret because the elevation gradient is not well described (e.g., in terms of environmental variation). Second, the framework might coincide with tree distributions along the elevation gradient, but it does not explain them: an environmental gradient probably underlies the elevation gradient and may select for species with certain root traits or mycorrhizal types, but this is neither tested nor clearly demonstrated by the data. It thus remains rather speculative and should be explained more thoroughly based on the observed data. Similarly, I do not understand from this study how root traits like root N can influence the abundance of EcM and pathogenic fungi (L. 242-243). Which data show this causality? It seems a strong statement that is not well supported (or explained).

      Impact: The data collected for this study are timely, valuable, and relevant. Soilborne microbes (fungi and bacteria; symbionts and pathogens) play important roles in root trait expression (e.g., root diameter) and belowground functioning (e.g., resource acquisition). They should therefore not be excluded from studies of the belowground functioning of forests, but they mostly are. This dataset therefore has the potential to improve our understanding of this subject. Making these data publicly available in recently initiated large-scale databases (e.g., FRED) would also allow their use in comparative (with other biomes) or global (across biomes) studies.

      Technically, the methodology seems sound, although I lack the expertise to judge the Molecular Methods (L. 349-397). However, owing to the statistical uncertainties mentioned above (which the authors might well clarify or improve) and the oversimplified discussion, I am hesitant to judge the impact of this work. Statistical improvements and/or a clearer explanation and justification of the statistical choices made could, however, make this manuscript highly interesting and impactful.

      Context: As motivated above, I am not sure to what extent the EcM versus AM/saprotroph axis represents a true ecological tradeoff. However, if it does, this work would fit very well in the context of the mycorrhizal-associated nutrient economy (MANE; Phillips et al. 2013 New Phytologist). This theory postulates that EcM trees generally produce low-quality litter (associated with 'slow' traits) that can be accessed more readily by EcM fungi but not by AM fungi, thereby slowing down nutrient cycling rates to their competitive advantage, and vice versa for AM tree species. This study did not aim to test the MANE, so studying litter quality was beyond its scope, and the numbers of EcM and AM species were unbalanced (8 EcM versus 44 AM species). Nonetheless, the denser roots of EcM species and the higher root N of AM species indicate that the MANE may also apply to this subtropical forest, which may be an interesting impetus for future work on this topic. It might also offer one way to bridge the root economics space and the MANE.

      What I also found interesting is the sparse observations of EcM fungal taxa in the root endosphere of species typically identified as AM hosts (L. 212 - 214). While their functionality remains to be tested (fungal structures in the endosphere were not studied here), this observation might call for renewed attention to classifying species as AM, EcM, or both.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We appreciate the positive and constructive comments of the reviewers on our paper. Below please find our point-by-point response to their comments.

      Reviewer #1:

      Main comments:

      1) The expression levels of many genes, including some major TFs (like CEBPa or HNF4), in isolated primary hepatocytes differ greatly from those in normal liver. This is due to the disruption of cell-cell contacts. For this reason, single-nucleus sequencing is more reliable and is the preferred method. It is not indicated how many biological replicates were used and what level of variability was observed between different preparations.

      We thank the reviewer for pointing out the immediate response of hepatocytes to dissociation, including in expression of CEBPa or HNF4 (this reviewer) and stress-related genes (reviewer 3), which we were aware of.

      Unfortunately, however, no perfect method exists to study hepatocytes alone within the context of the liver, and single-nucleus RNA-seq, which was not available at the start of our study, also has its limitations (e.g. substantial ambient RNA contamination, a lower median number of genes detected, and potential biases and higher doublet rates due to additional amplification steps (PMID: 34515767)).

      Importantly, in our current study, we were interested in exploring gene regulatory networks in hepatocytes by the combination of RNA-seq and ATAC-seq. In our hands, data that we obtained from single cell ATAC-seq was far too shallow and noisy to predict gene regulatory networks. Hence, we needed to rely on pure populations of hepatocytes to perform our studies with bulk ATAC-seq, for which we optimized perfusion and subsequent density gradient centrifugation. While we succeeded in obtaining a very pure hepatocyte population, we agree with the reviewer that due to dissociation-associated changes the results that we obtain might not fully reflect the events happening in hepatocytes in the liver.

      To address this issue, raised by reviewers 1 and 3, we will i) better explain our rationale in the manuscript, as well as the limitations indicated by both reviewers; ii) provide an overview of potential changes induced by the perfusion procedure that we applied, by comparing the hepatocyte RNA-seq transcriptomes that we obtained with in vivo liver RNA-seq, with specific attention to transcription factors and stress-related genes (see reviewer 3, point 1); and iii) better separate in the figures data obtained from hepatocytes versus data obtained from liver (see also point 2 from this reviewer).

      Additionally, we will indicate how many replicates were used, and the level of variability between different preparations (donors).

      2) The regulome studies involved analysis of ENCODE datasets (ChIP-seq), while the RNA-seq data were obtained in the current work. Due to the different sources of the data (e.g. primary hepatocytes used by ENCODE consortium members and in this study), differences are expected. In the present study the cells were FACS-sorted immediately after isolation, while those used to produce the ENCODE datasets were not subjected to sorting and were probably also cultured. This limits the accuracy of comparisons. Furthermore, the authors should indicate exactly which ENCODE datasets were used.

      It is also unusual to observe a broad distribution of ATF3, JUND and EGR1 ChIP-seq reads over the PCK1 gene or the Alb gene (Fig S3). Peaks called by MACS should be indicated. Have the authors verified this distribution, e.g. by ChIP-PCR or other means? It is quite unlikely that binding motifs are present all over the gene bodies. Is it possible that these factors interact with elongating RNA Pol II complexes? What is the situation in other actively transcribed gene bodies?

      In the first paragraph of this comment, the reviewer rightfully points out that we use data from different sources in the first part of our study: scRNA-seq and ATAC-seq from perfusion-obtained hepatocytes (this study), and ENCODE ChIP-seq data, which, in contrast to what the reviewer seems to assume, were obtained from liver tissue (as profiled by ENCODE).

      We deliberately chose to use ChIP-seq data from liver tissue to corroborate our findings in isolated hepatocytes in their tissue of origin (which is largely composed of hepatocytes). Indeed, the near-perfect co-localization of HNF4A and ATF3/EGR1 in liver tissue and the enrichment of the corresponding DNA motifs in our ATAC-seq data strongly suggest an interaction between bZIP family members and hepatocyte-specific transcription factors (including HNF4A), and hence support our conclusion.

      To further address this issue, we will better separate the data obtained from hepatocytes versus data obtained from liver in the figures and include additional data for liver if available (see also point 1 from this reviewer). Additionally, we will indicate exactly which ENCODE datasets were used (see table below). Where relevant, we will explicitly mention the limitations/confounding factors of our analysis.

      Dataset | Source | Accession(s)
      EGR1-liver ChIP-seq | ENCODE Project Consortium | ENCFF389LQC, ENCFF132PDR
      JUND-liver ChIP-seq | ENCODE Project Consortium | ENCFF215GBK, ENCFF978CPC
      ATF3-liver ChIP-seq | ENCODE Project Consortium | ENCFF522PUA, ENCFF094LXX
      HNF4A-liver ChIP-seq | ENCODE Project Consortium | ENCFF302XOK, ENCFF500ZBE
      FOXA1-liver ChIP-seq | ENCODE Project Consortium | ENCFF765EAP, ENCFF945VNK
      CTCF-liver ChIP-seq | ENCODE Project Consortium | ENCFF002EXB
      RAD21-liver ChIP-seq | ENCODE Project Consortium | ENCFF643ZXX, ENCFF171UDL
      EGR1-K562 ChIP-seq | ENCODE Project Consortium | ENCFF000PZK, ENCFF000PZP
      JUND-K562 ChIP-seq | ENCODE Project Consortium | ENCFF000YSC, ENCFF000YSE
      ATF3-K562 ChIP-seq | ENCODE Project Consortium | ENCFF000PWC, ENCFF000PWA

      With respect to the second paragraph: we obtained these liver tissue ChIP-seq profiles from ENCODE, where they have undergone thorough validation procedures. Furthermore, we observe very similar patterns with a complementary but independent approach, ATAC-seq in hepatocytes. Hence, we do not think that further validation by ChIP-qPCR would add much value.

      We will follow the advice of the reviewer by i) indicating MACS peaks in our examples; ii) checking whether ChIP-seq peaks in coding regions are typical for these datasets (if not, we will show better examples; if they are, we will investigate potential motifs present in gene bodies); iii) searching the literature for a possible link between these factors and elongating RNA Pol II complexes; and iv) investigating actively transcribed gene bodies.

      3) The synergism between AP1 and HNF4 is based on RNA and ChIP data in primary hepatocytes. The main evidence for the synergism is the co-binding of the two factors and the regulome profiles in the individual cells. In ICOs, where both factors are expressed at high levels, ChIP-seq data are not available, and the potential binding distribution is estimated from the presence of binding motifs in ATAC-seq-positive regions. Considering the concern described in point 2, it is important to obtain ChIP-seq data in ICOs too.

      We would like to point out that, while we make the central observations on overlapping regulatory modules in perfusion-derived hepatocytes, the ChIP-seq data showing co-binding of AP-1 and other factors with HNF4A (Fig 2c-f; Fig S3c-e) are all based on liver tissue. By showing this in the tissue of origin, we feel we provide sufficient evidence for the (potential) interplay between these factors in the liver, making ChIP-seq in ICOs redundant and beyond the scope of this study.

      In addition, more direct experimental evidence for the synergism is needed. For example, demonstrating the synergism between HNF4 and some AP1 factors in specific genes by co-transfection experiments.

      With regard to the potential synergy between HNF4 and AP1 in adult hepatocytes: previous work has shown an essential role for c-Jun (part of AP1) in normal hepatogenesis, with hepatocytes being rounded and detached in c-Jun KO mice (PMID: 8371760). This clearly shows the critical role of c-Jun in liver development and supports a potential interaction with HNF4 factors.

      Yet, we agree with the reviewers that co-transfection (or knockdown) experiments would be an elegant means to further support our conclusion. Unfortunately, however, PHHs are refractory to transfection, making this experiment nearly impossible. Hence, we will tone down our statements about cooperation between these factors and instead refer to the overlapping regulatory modules and co-binding that we observe.

      4) Transcriptome comparisons between primary hepatocytes and intrahepatic cholangiocyte organoids (ICOs), or ICOs cultured in hepatocyte differentiation medium (DM-ICOs), were performed before (Ref. 6). These cells were derived from the same donor. In the current study, ICOs were obtained from a biobank and thus came from different donors. Differences between the expression patterns of primary cells and EM-ICO and DM-ICO organoid cultures are expected even if they are derived from the same donor. Ref. 6 clearly demonstrates that DM-ICOs closely mimic many, but not all, aspects of the liver phenotype. The present paper therefore provides only incremental new knowledge about the usefulness of organoid cultures in general. On the other hand, the scRNA-seq data from organoid cells point to a lack of zonation, which is important new information not analysed in Ref. 6.

      We agree with the reviewer that EM-ICOs and DM-ICOs have been well characterized in the groundbreaking work of Reference 6. Indeed, Figure 5d of Reference 6 shows that DM-ICOs display an expression profile more comparable to that of hepatocytes than EM-ICOs do. However, there are also clear differences between hepatocytes and DM-ICOs, indicating incomplete differentiation of the latter. In our study, we now make the important observation that the differentiation potential of ICOs depends at least in part on the expression of ELF3 (Figure 3B).

      To address this issue, we will emphasize the findings of Ref. 6 and put our observations in better perspective relative to Ref. 6.

      5) In the Methods section, the description of the ICO culture conditions is very terse. It refers to previously published protocols but also mentions the addition of BMP7 in the first round of culturing without explaining why this was important. It would be useful if the authors described exactly the culture conditions they used. Were the ICOs from the biobank established under the culture conditions described in Ref. 6 or by previous protocols?

      We apologize for this being unclear. We will include this information in the revised manuscript.

      6) The results on ELF3 function are interesting and convincing. This is a novel finding, and it may be worth performing a global transcriptome analysis and some immunostainings with specific markers in siELF3 cells to further strengthen its regulatory role in cholangiocyte-hepatocyte conversion.

      We agree with the reviewer. To follow this up, we will perform RNA-seq during differentiation of ICOs towards hepatocytes, with and without siRNA-mediated ELF3 knockdown. This will further reveal the precise regulatory role of ELF3 during hepatocyte differentiation.

      Reviewer #2:

      Comments:

      1) Hepatocyte nuclear factors do not form a transcription factor (TF) family, they are from different TF families: the nuclear receptor, homeobox, and forkhead TF (super)families.

      We thank the reviewer for pointing out the mistakes in points 1 to 6 with regard to the naming of proteins and protein families in our manuscript, and we apologize for these inaccuracies. We will correct these names and references, and check for any further inconsistencies.

      2) AP-1 is not a TF family either. It is basically a heterodimer of FOS and JUN (sub)family members, which are part of the bZIP (super)family, as are C/EBPs and ATF3, the latter being related to JDP2.

      We will adapt this.

      3) EGR1 is not a bZIP protein, it is a zinc finger protein from the EGR family. Was the motif of EGRs enriched? Only the motif of C/EBPs is shown on Fig. 2D.

      We will adapt this. We will also analyze whether the EGR motif is enriched.

      4) RAD21 is not a TF; it is part of the cohesin ring, which is associated with the insulator-binding CTCF.

      We will adapt this.

      5) EP300 (Fig. 2A) and PPARGC1A (Fig. 3B) are not TFs, they are co-regulators, basically co-activators, which can interact with several TFs. EP300 is otherwise not so specific, its presence in the chromatin is one of the major active enhancer marks.

      We will adapt this.

      6) DNA sequence motifs are typically not specific for a single TF but rather for a TF (sub)family, so it is usually not possible to identify a particular TF based on a motif (Fig. 3F). Are there other nuclear receptors, SOX or ETS proteins that can bind to the identified motifs? (For example, FLI1 and several other ETS proteins can bind to the ELF3/EHF motif, and there are several DR1-binding nuclear receptor dimers, such as HNF4/HNF4 or PPAR/RXR.)

      We agree with the reviewer. We will analyze this and adapt the manuscript according to our findings.

      7) Although the manuscript is easy to follow and understand, it needs to be checked for grammar.

      We have asked a native speaker to proofread and adapt the manuscript.

      Reviewer #3:

      1) It is well known that perfusion of primary hepatic tissue (mouse and human) results in immediate genetic responses, which will be captured right away in the performed RNA-seq analysis. Stress pathways are upregulated and will normalize when the cells are put in culture for a couple of days (not too long, as they then undergo EMT and de-differentiate into non-parenchymal cells). These responses can influence the observed expression profiles.

      We thank the reviewer for this comment. Please see how we will address this concern in our reply to reviewer 1, issue 1, who raised a very similar point.

      2) Why did the organoid cultures not differentiate properly into hepatocytes using different media cocktails (EM versus DM)? They seem to maintain cholangiocyte features, which calls the culture conditions used into question.

      We thank the reviewer for the chance to clarify this important point. We would like to stress that we use the standard published differentiation protocol (which we will also describe in more detail in our Materials and Methods), and it does lead to differentiation towards hepatocyte-like cells (both morphologically and in terms of gene expression). However, what is not highlighted in previous publications, but is broadly observed in the field, is that this differentiation is far from complete and that the extent of proper differentiation varies between organoids from different donors. In our study, we now make the important observation that the differentiation potential of ICOs depends at least in part on the expression of ELF3 (Figure 3B).

      3) The authors found upregulation of AP-1 family proteins such as ATF3 and EGR1, which are known to induce apoptosis/cell death. Hepatic organoids often develop an unintended necrotic core caused by limited oxygen diffusion, and this issue is highly likely to be related to organoid size. It would therefore be advisable to specify the size (i.e., diameter) of the hepatic organoids and to check the necrosis-related genes.

      To follow up on this comment of the reviewer, we will measure the size of our organoids. These organoids are indeed typically hollow inside; hence, we will check the expression of necrosis-related genes and adjust our conclusions accordingly.

      4) The KD approach with ELF3 in the ICOs is a good way forward; however, only a small number of hepatocellular genes are recovered, questioning the central role of ELF3 in driving the hepatocellular program. Functional assays, such as albumin release, bile acid production and CYP450 response, should be coupled with the gene expression analysis.

      In line with our response to reviewer 1 (point 6), we will perform RNA-seq to better characterize ELF3 KD-associated gene expression changes, including genes that are typical of and functionally relevant for hepatocyte function (e.g. albumin release and bile acid secretion).

      5) The manuscript should be supplemented with a statement on the specific reason why a different set of donors was selected for each of the two transcriptomic analyses. The authors used three different donors for scRNA-seq and two other donors for ATAC-seq. It would seem better if all five donors were used for both analyses, to reduce the inconsistent proportion of primary human hepatocytes (PHHs) from each donor. In addition, the selected donors should have identical genetic backgrounds for an in-depth analysis of PHHs. Differing backgrounds, such as age, sex and ethnicity, cause transcriptional and translational heterogeneity. The authors need to explain the criteria for selecting the donors.

      We do agree with the reviewer that ideally all experiments would be performed on the same set of donors. However, PHHs are obtained from surgical margins and hence are a very limited resource, which led to different experiments being performed on different donors. Importantly, the replicates for each experiment type were obtained from multiple donors, enabling us to capture common rather than donor-specific expression/chromatin accessibility signatures.

      In the revised manuscript, we will include a paragraph on the criteria for selecting the donors and on why a different set of donors was selected for each of the two analyses. We will also provide information on the backgrounds of the donors.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We appreciate the time and effort that you and the reviewers have dedicated to providing valuable feedback on our manuscript. These comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have highlighted the changes in yellow within the manuscript.

      *Here is a point-by-point response to the reviewers’ comments and concerns.*

      Comments from Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The provided document, titled "Camel Milk Affects Serum Metabolites by Modulating the Intestinal Microflora," is an extensive research paper. My summary covers the first 44 of its 63 pages. The document begins with a standard Review Commons manuscript notice and provides contact information for the Review Commons office.

      The research focuses on the effects of camel milk on serum metabolites and the intestinal microflora. It starts with a detailed introduction to the topic, outlining the crucial role of gut microbes in human health and the influence of various factors like diet, genetics, and environment on these microbes. The paper emphasizes the nutritional richness of camel milk and its potential as a functional food, particularly its impact on gut microbiota and host metabolism.

      Initial sections of the paper discuss the research methodologies, including the study's keywords, abstract, and introduction. The abstract highlights the study's significant findings, such as the presence of various beneficial bacteria in sour camel milk, the inter- and intra-species transportation of microbiomes, and the impact of camel milk on the gut microflora and serum metabolites of type 2 diabetic rats.

      The introduction further delves into the composition of the human gut microbiota and the shaping factors of the adult gut microbiome. It also examines the role of diet in modulating gut microbiota and the potential health benefits of dairy products, with a particular focus on camel milk.

      Subsequent sections present detailed research findings, including the results of microbial composition and source analysis in camel milk, the composition and changes of rat gut microbiota under camel milk regulation, and the effects of camel milk-regulated gut microbiota on metabolism in rats. The research also explores the interspecies transfer of microbes using camel milk as a vector and analyzes the gut microbiota in people consuming camel milk.

      The paper further discusses the endophytic flora of camel edible desert plants and their possible influence on the camel's gut microbiota. The discussion section integrates the findings, offering insights into the potential health benefits of camel milk and its probiotic qualities. It also compares the effects of camel milk with other dairy products and discusses its role as a vector for beneficial microbes.

      Materials and methods used in the study are detailed towards the end of the summarized portion, describing sample collection and processing, the experimental setup for rats, and data processing and analysis techniques.

      Reviewer #1 (Significance (Required)):

      The paper continues with detailed research findings, including the microbial composition in camel milk, the impact on the gut microflora of rats and humans, and the serum metabolism effects.

      There's a focus on how camel milk, as a vector, can transfer beneficial microbes between species, influencing gut microbiota and host metabolism.

      The paper compares the effects of camel milk with other dairy products, emphasizing its unique health benefits and its role in transferring beneficial microbes.

      It discusses various bacteria found in camel milk and their potential health benefits.

      The research findings extend to understanding how camel milk affects human gut microbiota, with studies on pastoral herders who consume camel or bovine milk.

      Author response: We thank you and the other reviewers for your approval and for your constructive and valuable feedback.

      Comments from Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The authors introduce a study assessing the bacterial flora of sour fermented camel milk and its capability to introduce beneficial species into the consumer's gut. They further tested the potential of its nutrients and species to have beneficial effects on type 2 diabetic (t2d) rats. They claim that t2d rats fed with high-dose camel whey have a microbiota closer to that of healthy rats than to that of other t2d rats not receiving the camel whey treatment. Further, they claim that this effect is due to the presence of Eubacterium limnetica, which was exclusively found in the gut microflora of rats taking camel milk and produces the MtcB protein. They conclude that camel milk may have the potential to be a functional food.

      Overall, I think the approach of looking into camel milk and its microbiota is of broad interest, as it is a food consumed traditionally by many tribes and in several countries. However, to me the presentation of the findings, the data and the analysis is often imprecise and confusing.

      For example, the MtcB protein that they claim, in the abstract, to be the mechanism reducing the risk of t2d is mentioned only once in the whole study, and there only as a finding of another (cited) study. In my understanding, the abstract should contain the main findings of the study, rather than a side finding from another study that happens to match the study results. I assume the authors have plenty of results from their sequencing data and metabolomics that they could mention in the abstract.

      In the text, the authors mention the analysis of the microbial composition and sources of camel milk, the analysis of the gut microbiota of young camels, the composition and changes of rat gut microbiota under the regulation of camel milk, the structure and changes of gut microbiota in people taking camel milk, and the analysis of the endophytic flora of camel-edible desert plants. And this is just quoting the headers of the Results section. Why is none of that represented or mentioned in the abstract? Instead, the authors focus on the t2d rats and the MtcB mechanism, which they fail to present.

      Further, the authors are sloppy when it comes to typos and precision. For example, in the abstract they talk first about sour camel milk, then whey, and then milk again.

      I suggest a major restructuring/rewriting and if necessary partial reanalysing of the results and the conclusions.

      It would be good to have an overview figure combining the work done, also stating the number of samples for each experiment.

      __Author response:__ Thank you very much for your helpful suggestions on our manuscript. We have restructured parts of the manuscript, and the changes are highlighted in yellow.

      Major comments:

      1) Please make sure all raw data (sequences and filtering/assembly results) are deposited in public databases, like NCBI, ENA or else.

      __Author response:__ The corresponding data are available on Mendeley Data (V1, https://doi.org/10.17632/4w8n8n96tc.1). Uploads of some larger datasets failed owing to internet problems; the full version can be provided by other means upon request.

      2) Please state briefly for each dataset analysed, which sequencing method was used, how many samples were collected and how many were pooled for the sequencing runs:

      AmpliSeq, whole metagenome HiSeq, MiSeq?

      __Author response: __Sample and dataset information for sequencing is supplied in Supplementary Tables 9 and 12. Sequencing libraries were prepared following the Illumina library preparation instructions and sequenced on the Illumina MiSeq platform at Majorbio Bio-Pharm Technology Co., Ltd. (Shanghai, China) with paired-end (PE) 150 bp reads.

      3) Page14 line283:

      F082? What is it? A strain, species or a sample?

      Please state clearly in the text.

      Also please avoid using abbreviations where possible and if you have to use them, please define.

      __Author response: __When the diversity analysis was performed at the species level, a species annotated as unclassified_g_norank_f_F082 was found to be abundant in camel feces from Darbancheng.

      4) Page14 line307:

      "These evidenced that camel milk was a vector transferring microbes from the female camel to their cubs."

      Yes, that may be likely, but 16S amplicon-seq cannot provide evidence. Evidence would be strain similarity confirmed by SNPs or the like. So please state that this is speculative or show appropriate evidence.

      __Author response: __Thank you. We completely agree that SNPs would provide better evidence for this point. Microbial diversity analysis was the main part of the initial design, and our limited samples could not meet the needs of both the diversity and the SNP analyses at the same time. There are also reports that used 16S-based methods to trace the source of microbes (Du et al., 2022; El-Mokdad, 2014; Wang et al., 2018).

      5) Page15 line322 ff:

      "Besides, using raw milk was not effective in type 2 diabetic rat model, so we chose camel whey and bovine whey as the diet of type 2 diabetic rats in follow-up experiments"

      Data/evidence? How is it different from whey from a nutrient perspective, as whey was more effective? Any explanation for this difference? And the bovine whey, what species did it contain? Can they be transferred regarding the processing of whey prior to application?

      __Author response: __This is an interesting and valuable question. We prepared both raw milk and whey for the pre-test and then focused directly on validating the function of whey; we may investigate the compositional differences in the future. The whey was prepared using the following protocol. Fresh milk was centrifuged at 5000 r/min for 20 min, the fat and the precipitate were discarded, and the middle layer of skim milk was collected. After 20 min in a 40 ℃ water bath, the pH was adjusted to 4.6 with 10% glacial acetic acid, and the sample was stored at 4 ℃ overnight. The skim milk was then centrifuged at 8000 r/min for 20 min (repeated twice), and the middle whey fraction was collected. The centrifuged whey was poured into a Petri dish, sealed and frozen at -80 ℃ for 12 h; the seal on the Petri dish was then pierced with a sterile toothpick, and the whey was freeze-dried to obtain whey powder. We speculate that the whey preparation process plays an important role in the functional difference. A comprehensive comparison of camel raw milk, camel whey, bovine raw milk and bovine whey would be of interest, and we may investigate it in future work.

      6) Page17 line366ff:

      "Taking the number of microbes involved in this pathway, 8001 species were noted in the high-dose camel whey group, 3447 in the positive drug group, and only 1467 in the diabetics." How many species were present in the rats initially? Was species abundance different in the first place, did species get lost, or did they come from the camel whey?

      __Author response: __The rats were given broad-spectrum antibiotics for 2 weeks, which ensured the same species abundance at the start of the experiment.

      7) Page17 line369 ff:

      "It indicated that these microbes might resist the high glucose environment of the host through the synthesis and metabolism of their amino acids, and the effect of high-dose camel milk was more effective than that of metformin"

      -> How high was the glucose level in the rat gut? Or were there any obvious physiological changes in the t2d model rats that are characteristic for such a high-glucose environment? Please explain.

      __Author response: __This is an interesting and critical question. We did not measure the glucose level in the rat gut directly because we needed to ensure that the other related characterizations could be performed properly. In addition, we considered that camel milk could regulate the microbial community and thereby influence the blood glucose level, which we regarded as the more representative measure. Blood glucose levels are provided in Fig. 4O and Supplementary Table 11.

      8) The resolution/quality of the figures is low and the labelling often small, so not all text is readable.

      __Author response: __We have adjusted the figures in the manuscript and provided additional separate image files. The low resolution seems to have been caused by the PDF merging process; please check the images in the .docx or .png files for details.

      9) Page19 line400 ff:

      What serum metabolites were analysed and why? Please write an intro-sentence to make it easier for the reader.

      Please describe more precisely which methods were used. Maybe I missed it, but I didn't find them in the methods part either (Page40/41).

      __Author response: __Rats fed high-dose camel whey or given metformin showed a similar improvement in serum metabolite imbalance, with profiles closer to normal. Caproylcarnitine, taurodeoxycholic acid, acetylcarnitine, creatinine, linoleic acid and tridecanoic acid were upregulated, whereas 2-deoxyuridine, cyclohexylamine, L-pipecolic acid, LysoPC(18:0), uracil, caprylic acid, cholesterol sulfate, L-citrulline, pelargonic acid and phenol were downregulated. Carnitine supplementation, owing to its key role in lipid metabolism and its antioxidant effects, may effectively manage type 2 diabetes by addressing dysregulated fatty acid metabolism and oxidative stress (Bene, Hadzsiev, & Melegh, 2018). Studies have shown that taurodeoxycholic acid can enhance the effect of insulin and reduce blood sugar levels by regulating endoplasmic reticulum stress, and it has potential in the treatment of diabetes (Xing, Zhou, Wang, & Xu, 2023). Low serum creatinine is associated with the development of T2D (Song, Hong, Sung, & Lee, 2022). Increased linoleic acid consumption has been recommended for the prevention of T2D (Henderson, Crofts, & Schofield, 2018). Uridine is converted by phosphorolysis to uracil, which is converted to 2-deoxyuridine; 2-deoxyuridine is then further converted to thymine by thymidine phosphorylase. The expression of thymidine phosphorylase is lost or considerably reduced in nephropathy, and high concentrations of thymidine cause DNA damage, which is related to diabetes and diabetic nephropathy (Spinazzola et al., 2002; Szabo et al.; Xia, Hu, Liang, Zou, Wang, & Luo, 2010). L-Pipecolic acid is associated with a higher incidence of T2D (Razquin et al., 2019). One study showed that LysoPC(16:0) and LysoPC(18:0) may mediate fast progression of diabetic kidney disease (Yoshioka et al., 2022).
      Cholesterol sulfate is the most abundant known sterol sulfate in human plasma and plays a significant role in the control of glucose metabolism, which contributes to the pathogenesis of insulin resistance and the resulting development of diabetes (Shi et al., 2014; Zhang et al., 2022). L-citrulline supplementation might improve glucose homeostasis, some lipid factors and inflammatory markers in overweight and obese patients with T2D (Azizi, Mahdavi, Mobasseri, Aliasgharzadeh, Abbaszadeh, & Ebrahimi-Mameghani, 2021). Type 2 diabetes mellitus is associated with increased total plasma free fatty acids, and modulating their concentration is a mechanism of action of some fibrate and statin drugs (Sobczak, Blindauer, & Stewart, 2019). Most of these metabolites have been reported as causes of T2D or consequences of its progression, and some have been proposed as therapeutic targets.

      Serum metabolites were analysed using an Agilent 1290 Infinity UHPLC system equipped with a HILIC column. The mobile phase of the optimized method consisted of (A) water with 25 mM ammonium acetate and 25 mM ammonia and (B) acetonitrile (ACN). The following gradient elution was used: 5% A at 0-1 min; 5-35% A at 1-14 min; 35-60% A at 14-16 min; 60% A at 16-18 min; 60-5% A at 18-18.1 min; and 5% A at 18.1-23 min. The flow rate was 0.3 mL/min, the injection volume was 2 μL, and the column temperature was 25 ℃.

      Mass spectrometry was performed on a TripleTOF 5600 mass spectrometer under the following conditions: Ion Source Gas 1, 60; Ion Source Gas 2, 60; Curtain Gas, 30; source temperature, 600 ℃; IonSpray Voltage Floating, ±5500 V; TOF MS scan m/z range, 60-1000 Da; product ion scan m/z range, 25-1000 Da; TOF MS scan accumulation time, 0.20 s/spectrum; product ion scan accumulation time, 0.05 s/spectrum. MS/MS spectra were acquired by information-dependent acquisition (IDA) in high-sensitivity mode, with a declustering potential of ±60 V and a collision energy of 35 ± 15 eV; IDA was set to exclude isotopes within 4 Da, with 6 candidate ions monitored per cycle. The Methods section has been supplemented accordingly.
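      The gradient program above is piecewise linear. As an illustrative sketch only (not part of the authors' methods), the Python snippet below returns the mobile-phase composition at any time point, assuming linear ramps between the stated breakpoints and assuming %B = 100 - %A for this two-solvent system.

```python
from bisect import bisect_right

# (time in min, % mobile phase A) breakpoints transcribed from the gradient
# described above; linear ramps between breakpoints are an assumption.
GRADIENT = [
    (0.0, 5.0),
    (1.0, 5.0),
    (14.0, 35.0),
    (16.0, 60.0),
    (18.0, 60.0),
    (18.1, 5.0),
    (23.0, 5.0),
]


def percent_a(t_min: float) -> float:
    """%A (water with 25 mM ammonium acetate and 25 mM ammonia) at t_min."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return GRADIENT[0][1]
    if t_min >= times[-1]:
        return GRADIENT[-1][1]
    i = bisect_right(times, t_min)  # index of the first breakpoint after t_min
    (t0, a0), (t1, a1) = GRADIENT[i - 1], GRADIENT[i]
    return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)


def percent_b(t_min: float) -> float:
    """%B (acetonitrile), assuming a binary A/B mixture."""
    return 100.0 - percent_a(t_min)


for t in (0.5, 7.5, 15.0, 17.0, 20.0):
    print(f"{t:>5} min: {percent_a(t):5.1f}% A, {percent_b(t):5.1f}% B (ACN)")
```

      For example, halfway through the 1-14 min ramp (7.5 min) the sketch gives 20% A, i.e. 80% ACN.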

      Minor comments:

      1) Page1, line56-58 ff

      Please phrase more clearly:

      "This study specified that the transportation of microbiome happened both intra- and inter-species and played a principal role in the formation of progeny gut microflora."

      While the content is mostly comprehensible, there is a need for rephrasing and correction of language also in the following text.

      __Author response: __As suggested by the reviewer, we have rephrased and modified the abstract part.

      2) Page14 line300 ff:

      There is no need to show the OTU numbers in the text, please provide your results as a table in the supplements and refer to it in the text.

      __Author response: __We have deleted the OTU numbers from the manuscript and added the corresponding table to the supplementary file.

      3) Page15 line328: Please check for typos, it is Shannon index, not Shanno.

      __Author response: __The corresponding correction was applied in the manuscript.

      4) Page16 line334:

      Please mention the number, age and sex of the rats used and how many groups you had in your experiments.

      __Author response: __SPF-grade male rats weighing 180-220 g were used in these experiments. Detailed information is available in the Supplementary Material (Supplementary Tables 11-13).

      5) The headlines should logically structure the paper:

      For example, the authors have two very similar sections in the results part: "Composition and changes of rat gut microbiota under the regulation of camel milk" and "Analysis of the composition of gut microbiota in rats". Those can be combined or stated more concisely.

      Other headlines could also be improved to make the paper easier for the reader to follow.

      __Author response: __We adjusted this part in the manuscript according to the reviewer’s suggestion.

      Reviewer #2 (Significance (Required)):

      I do think the study is of broad interest and relevance. However, the presentation of the analysis and data needs major revision. In particular, it lacks clarity on what was done for which samples and how the authors draw their conclusions. Also, I think that the abstract and main text have a different focus. I would suggest that the authors concentrate on their findings in the abstract and text and state precisely what was done and what they found.

      __Author response: __Thank you very much for your recognition of our manuscript.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The authors introduce a study assessing the bacterial flora of sour fermented camel milk and its capability to introduce beneficial species into the consumer's gut. They further tested the potential of its nutrients and species for beneficial effects on type 2 diabetic (t2d) rats. They claim that t2d rats fed with high-dose camel whey reveal a microbiota closer to that of healthy rats than to that of other t2d rats not receiving the camel whey treatment. Further, they claim that this effect is due to the presence of Eubacterium limnetica, which was found exclusively in the gut microflora of rats taking camel milk and which produces the MtcB protein. They conclude that camel milk may have the potential to be a functional food.

      Overall, I think the approach of looking into camel milk and its microbiota is of broad interest, as it is a food consumed traditionally by many tribes and in several countries. However, to me the presentation of the findings, the data and the analysis is often imprecise and confusing. For example, the MtcB protein that they claim, in the abstract, to be the mechanism reducing the risk of t2d is mentioned only once in the whole study, and there only as a finding of another study (cited). In my understanding, the abstract should contain the main findings of the study, rather than a side finding from other studies that happens to match the study results. I assume the authors have plenty of results from their sequencing data and metabolomics that they could mention in the abstract. In the text the authors mention the analysis of the microbial composition and source analysis of camel milk, the analysis of the gut microbiota of young camels, the composition and changes of rat gut microbiota under the regulation of camel milk, the structure and changes of gut microbiota in people taking camel milk, and the analysis of the endophytic flora of camel edible desert plants. And this is just quoting the headers in the results section. Why is that not represented or mentioned in the abstract? Instead, the authors focus on the t2d rats and on the MtcB mechanism, which they fail to present. Further, the authors are sloppy when it comes to typos and precision. For example, in the abstract they talk first about sour camel milk, then whey and then milk again.

      I suggest a major restructuring/rewriting and if necessary partial reanalysing of the results and the conclusions.

      It would be good to have an overview figure combining the work done, also stating the number of samples for each experiment.

      Major comments:

      1. Please make sure all raw data (sequences and filtering/assembly results) are deposited in public databases, like NCBI, ENA or else.
      2. Please state briefly for each dataset analysed, which sequencing method was used, how many samples were collected and how many were pooled for the sequencing runs: AmpliSeq, whole metagenome HiSeq, MiSeq?
      3. Page14 line283: F082? What is it? A strain, species or a sample? Please state clearly in the text. Also please avoid using abbreviations where possible and if you have to use them, please define.
      4. Page14 line307: "These evidenced that camel milk was a vector transferring microbes from the female camel to their cubs." Yes, that may be likely, but 16S amplicon-seq cannot provide evidence. Evidence would be strain similarity confirmed by SNPs or the like. So please state that this is speculative or show appropriate evidence.
      5. Page15 line322 ff: "Besides, using raw milk was not effective in type 2 diabetic rat model, so we chose camel whey and bovine whey as the diet of type 2 diabetic rats in follow-up experiments" Data/evidence? How is it different from whey from a nutrient perspective, as whey was more effective? Any explanation for this difference? And the bovine whey, what species did it contain? Can they be transferred regarding the processing of whey prior to application?
      6. Page17 line366ff: "Taking the number of microbes involved in this pathway, 8001 species were noted in the high-dose camel whey group, 3447 in the positive drug group, and only 1467 in the diabetics." How many species were present in the rats initially? Was species abundance different in the first place, did species get lost, or did they come from the camel whey?
      7. Page17 line369 ff: "It indicated that these microbes might resist the high glucose environment of the host through the synthesis and metabolism of their amino acids, and the effect of high-dose camel milk was more effective than that of metformin" How high was the glucose level in the rat gut? Or were there any obvious physiological changes in the t2d model rats that are characteristic for such a high-glucose environment? Please explain.
      8. The resolution/quality of the figures is low and the labelling often small, so not all text is readable.
      9. Page19 line400 ff: What serum metabolites were analysed and why? Please write an intro-sentence to make it easier for the reader. Please describe more precisely which methods were used. Maybe I missed it, but I didn't find them in the methods part either (Page40/41).

      Minor comments:

      1. Page1, line56-58 ff: Please phrase more clearly: "This study specified that the transportation of microbiome happened both intra- and inter-species and played a principal role in the formation of progeny gut microflora." While the content is mostly comprehensible, there is a need for rephrasing and correction of language also in the following text.
      2. Page14 line300 ff: There is no need to show the OTU numbers in the text, please provide your results as a table in the supplements and refer to it in the text.
      3. Page15 line328: Please check for typos, it is Shannon index, not Shanno.
      4. Page16 line334: Please mention the number, age and sex of the rats used and how many groups you had in your experiments.
      5. The headlines should logically structure the paper: For example, the authors have two very similar sections in the results part: "Composition and changes of rat gut microbiota under the regulation of camel milk" and "Analysis of the composition of gut microbiota in rats". Those can be combined or stated more concisely. Other headlines could also be improved to make the paper easier for the reader to follow.

      Significance

      I do think the study is of broad interest and relevance. However, the presentation of the analysis and data needs major revision. In particular, it lacks clarity on what was done for which samples and how the authors draw their conclusions. Also, I think that the abstract and main text have a different focus. I would suggest that the authors concentrate on their findings in the abstract and text and state precisely what was done and what they found.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      DBF4 and DRF1 knockout cells were generated and used to separate DBF4- and CDC7-dependent from DRF1- and CDC7-dependent activities. DBF4- and CDC7-dependent activities at replication forks were independent of DRF1. These include the replication timing pattern, replication fork velocity and DNA damage signaling. DBF4 is required to recruit CDC7 to active replication forks. The study is in large part exceptional.

      The inclusion of quantitation for a modest bandshift on CDC7 in figure 2 (30% vs 50% reduced) is not justified given the abundance of the main band and our knowledge of the lack of linearity of western blot quantitation. This should be removed.

      We thank the reviewer for evaluating our manuscript and for the positive feedback.

      In the revised manuscript we have removed the quantification of the bandshift related to CDC7 autophosphorylation in mitotic cells, which was reported in Figure 1E. We recognise that the quantification may not be accurate, although it was performed using semiquantitative near-infrared scanning technology. Importantly, the experiment was performed three times with almost identical results.

      The only significant weakness in the paper is the explanation of the replication timing analyses in Figure 3. I don't understand what the differences between the plots equate to in terms of timing. I understand that the replication of these divergent regions is either early or late, but there were only two fractions of cells - 2N-3N and 3N-4N (the cells are "normal"). If this is the case, isn't the readout binary? A sequence either replicates in S phase between 2N and 3N or in S phase between 3N and 4N. Why are the differences so small? Are they only evident in a small population of cells? If that is the case, then what does the difference really mean? I think the description of these data needs to be precise.

      The replication timing experiments were performed with a well-established and reliable protocol (Ryba et al., 2011, https://doi.org/10.1038/nprot.2011.328). Asynchronous cells are labelled with a short pulse of BrdU and sorted into two fractions, early and late S-phase, as described in Hiratani et al., 2008 (https://doi.org/10.1371/journal.pbio.0060245), Ryba et al., 2010 (https://doi.org/10.1101/gr.099655.109) and Hadjadj et al., 2016 and 2020 (https://doi.org/10.1016/j.gdata.2016.07.003, https://doi.org/10.1093/nargab/lqaa045).

      Unlike the approach of Siefert et al., 2017 (https://doi.org/10.1101/gr.218602.116), which uses the variation in DNA copy number (2N vs 4N) between replicated and non-replicated parts of the genome (S/G1 ratio), this method does not take copy number into account.

      The profiles depict the average replication timing of a population of 20,000,000 cells; thus, the readout is not binary.

      Replication timing profiles display the log ratio between early and late replicated fractions along the chromosome. Early replicated regions show positive log ratios and late replicated regions show negative ratios. The differential analysis performed with the START-R suite allows the comparison of the profiles (Ctrl vs either CDC7i-treated or DBF4-deficient cells). The genomic regions with altered timing are shown in green or in purple below the profiles, showing advanced and delayed regions, respectively.

      Importantly, the differences in replication timing are expressed as log ratios, which explains why the profiles range from -2 (very late replicating regions) to +2 (very early replicating regions). The differences we observed in Figure 3 are representative of two experiments, each composed of two highly reproducible technical replicates.
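      To make the point about the non-binary readout concrete, the Python sketch below computes log2(Early/Late) for a few genomic bins from invented intensities. It then uses a deliberately idealised model (BrdU signal in each sorted fraction proportional to the share of cells that replicate the bin in that fraction, with no normalisation) to show how a population average yields graded values such as +2, 0 and -2. This is an illustration under those assumptions, not the START-R pipeline, which applies its own normalisation and statistics.

```python
import math

# Hypothetical per-bin microarray intensities for the BrdU-labelled early (E)
# and late (L) S-phase fractions; the numbers are invented for illustration.
early = [800.0, 400.0, 150.0, 95.0]
late = [200.0, 390.0, 600.0, 190.0]


def log2_ratio(e_signal: float, l_signal: float) -> float:
    """Replication timing value for one genomic bin: log2(Early/Late)."""
    return math.log2(e_signal / l_signal)


def expected_log_ratio(frac_early: float) -> float:
    """Idealised log2(E/L) for a bin replicated in early S phase by a
    fraction `frac_early` of the sorted cells (0 < frac_early < 1).

    Assumes each fraction's BrdU signal is proportional to the share of cells
    replicating the bin in that fraction, ignoring normalisation effects.
    """
    return math.log2(frac_early / (1.0 - frac_early))


profile = [log2_ratio(e, l) for e, l in zip(early, late)]
for i, v in enumerate(profile):
    label = "early" if v > 0 else "late"  # sign convention from the text
    print(f"bin {i}: log2(E/L) = {v:+.2f} ({label})")

# A population average is graded rather than binary: bins replicated early
# in 80%, 50% and 20% of cells give roughly +2, 0 and -2 respectively.
for p in (0.8, 0.5, 0.2):
    print(f"{p:.0%} of cells early -> {expected_log_ratio(p):+.2f}")
```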

      To better describe the data, we have modified the text in the results section with the words in bold, as below: “These two neo-synthesized DNA fractions were then hybridised on human whole genome microarrays, as previously described. The log ratio between early and late replicated fractions was calculated and visualised for the whole genome.” We also changed the labelling of the replication profiles in Figure 3 and former Figure S3 (now Figure S4) by adding “Log2 (Early/Late)” to the intensity label, and added two new sentences to the legend of Figure 3: “Replication timing profiles display the log ratio between early and late replicated fractions along the chromosome. Positive log ratios correspond to early replicated regions whereas negative ratios correspond to late replicated regions.”

      Reviewer #1 (Significance (Required)):

      I think this paper is a significant advance that should be published. CDC7 is a critical kinase and identifying its co-factor at the replication fork is important both for our understanding of mechanisms of DNA replication and the impact of CDC7 kinase inhibitors in the clinic. I think the majority of the experiments are well designed and the results are unambiguous and precisely described.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      CDC7 is a master cell cycle kinase with essential functions in DNA replication and important roles in the DNA damage response. For its functions, CDC7 relies on a regulatory factor, DBF4, which is essential in many species but not in human cells as a consequence of the presence of a second DBF4-related factor, DRF1. In this work, Göder and colleagues study the relative relevance of these regulatory proteins in CDC7 roles. Their study reveals DBF4 as the major regulatory subunit in DNA replication, the DNA damage checkpoint and fork dynamics. The objective of the study is highly relevant for understanding an essential cell cycle kinase with potential applications in cancer therapies, the experiments are well performed and the conclusions are "in principle" sound.

      We thank this reviewer for the time and attention in evaluating the manuscript, for the positive feedback and for indicating key points for improvement and discussion.

      The major handicap of the study is the absence of western blots showing the elimination of DBF4 and DRF1 in the edited cell lines, due to the lack of specific antibodies. The authors have generated homozygous mutations that lead to premature stop codons behind critical CDC7 domains. However, as they mention, it is not possible to fully exclude some proteins arising from internal start sites or exon skipping events with residual (functional or altered, and not necessarily residual) activity. This is not unexpected, especially for essential proteins. This would not be a major handicap if the study were focused on a single factor, because it would only question the extent of the impact but not the affected function; however, the study aims to compare the relative effects of two defective genes. In this case, it is essential to confirm that both genes are eliminated, at least to the same degree.

      We agree with the reviewer that it would be valuable to confirm the effect of the mutations by immunoblotting.

      Over the years we have made multiple attempts at generating sensitive antibodies against both DBF4 and DRF1, using recombinant proteins and synthetic peptides. We also tested several commercially available anti-DBF4 and anti-DRF1 antibodies. While we were often able to detect overexpressed proteins, detecting endogenous levels has been particularly challenging, especially in non-transformed cells such as MCF10A.

      Nevertheless, with an anti-DBF4 serum we obtained from the Diffley lab, which was generated against a C-terminal fragment of hDBF4, we managed to detect endogenous full-length DBF4 in parental but not in the DBF4-KO cells (this blot is now included as supplementary Fig S1B). Even with this reagent the detection levels are low and multiple non-specific immunoreactive bands are present, making the detection of DBF4 particularly challenging across the experiments. Interestingly, while DBF4 is no longer detectable in DBF4-11, one of the two clones used in this work, we detect a new immunoreactive band of approximately 55 kDa in the other clone, DBF4-30. We reckon that this may be the result of mRNA translation from the next downstream methionine. In this case this aberrant protein would lack the N domain and most of the M domain, involved in CDC7 binding and activation, and thus this fragment is very likely not functional.

      Importantly, most results in this study were obtained using both DBF4-11 and DBF4-30 clones with indistinguishable results. Only the replication timing experiments were done using a single clone DBF4-11, in which DBF4 protein is not detected.

      We had less success with the direct detection of DRF1. As also suggested by reviewer #3, to screen the clones after genome editing, we originally performed IP-western experiments. We used an anti-DRF1 mAb and unrelated IgG for the immunoprecipitations and an anti-CDC7 antibody as a probe in western blotting. We detected an immunoreactive band above the background at the expected molecular weight for CDC7 when the immunoprecipitation was performed with extracts from parental cells (as well as in a clone obtained with a different sgRNA, targeting DRF1 Exon1 and never used in this study) but not when the immunoprecipitation was performed with extracts from the DRF1-5 and DRF1-7 clones used in the study. These original co-IPs are credible, although not particularly pretty, and importantly the result was confirmed in a more convincing experiment in the DRF1-5 clone.

      These new data are now included in the resubmission in Figure S1. So, while the detection of the CDC7 regulatory subunits still remains particularly difficult, we can now provide evidence that their expression is altered in the engineered cell lines used in the study.

      The computational analysis in Figure 1C is consistent with the major conclusion about the primary regulatory role of DBF4 in replication, but it is insufficient to validate the specific phenotypes addressed in the study.

      The figure reports the effects of targeting single genes with multiple sgRNA (4 to 8 according to the library used) on proliferation rate/fitness measured after multiple days in more than 1000 screens across many different human cell types. Loss of fitness can be due either to a direct problem with DNA replication or with other cellular processes.

      We agree with the reviewer that the analysis in Fig 1C is consistent with the phenotypes shown in the study. In particular, it is consistent with the lack of a major defect of DRF1-deficient cells in DNA replication, and it strongly indicates an essential role for CDC7, which was somewhat challenged by Suski and co-workers (see also below).

      Indeed, there is a result that is hard to understand if the edited cell lines are defective in the expression of the regulators, especially DRF1. Figure S2D-E shows no synergistic defect in DNA synthesis when the second regulator is knocked down with specific siRNAs, not even in DRF1-defective cell lines treated with a siDBF4 that reduces its expression 10-fold. Also, it is not clear why the defects, especially in DBF4-defective cell lines, are less severe than in cells treated with an inhibitor that causes a partial inhibition of CDC7. If it is due to the expression of DRF1, a siRNA against DRF1 should cause more severe defects.

      Yes, we did not detect synergy or additive effect on the rate of DNA replication when targeting both DBF4 and DRF1 by multiple approaches. This was also for us an unexpected result, that we examined to the best of our capabilities.

      The lack of the expected synergy in the replication assays could be explained in multiple ways and could be of biological or technical nature such as 1) residual low levels of DBF4/DRF1 proteins remaining in the cells upon either CRISPR/Cas9 or siRNA targeting, 2) alternative mechanisms of kinase activation by a different, yet unidentified protein, 3) minimal residual enzymatic activity of hCdc7 kinase not requiring an activating subunit.

      We performed further computational analysis using the dataset of the DepMap project, assessing if the effect of targeting DBF4 on fitness may be dependent on the levels of DRF1 expression. In several instances, when dealing with paralogues the gene effect of knocking out one of the paralogues directly correlates with the expression levels of the second, a phenomenon known as paralogue buffering (De Kegel et al. 2019 https://doi.org/10.1371/journal.pgen.1008466 ).

      In the case of DBF4 and DRF1, this correlation is minimal (plot below: X and Y axes are DRF1 expression levels and DBF4 gene effect respectively, Pearson's correlation = 0.12), so that there are ~470 other genes whose expression is more correlated with DBF4 essentiality. Furthermore, by stratifying cell lines according to whether DBF4 was essential or not and then looking at DBF4B (DRF1) expression, we failed to see a significant association (graph below).
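      The paralogue-buffering check described above amounts to correlating, across cell lines, the expression of one paralogue with the gene effect of the other, and then asking how many other genes show a stronger correlation. A minimal Python sketch follows. It assumes the per-cell-line vectors have already been extracted, uses invented toy numbers rather than DepMap values, and assumes that "more correlated" means a larger absolute Pearson r, which may differ from the ranking actually used in the analysis. The gene names GENE_A to GENE_C are hypothetical placeholders.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def genes_more_correlated(expr_by_gene, effect, target):
    """Count genes whose expression is more strongly correlated (by |r|)
    with `effect` than the expression of `target` is."""
    r_target = abs(pearson_r(expr_by_gene[target], effect))
    return sum(
        1
        for gene, expr in expr_by_gene.items()
        if gene != target and abs(pearson_r(expr, effect)) > r_target
    )


# Toy values for four hypothetical cell lines (NOT DepMap data).
# dbf4_effect: CRISPR gene effect of DBF4 (more negative = more essential).
dbf4_effect = [-1.6, -1.2, -0.8, -0.4]
expression = {
    "DBF4B": [1.0, 3.0, 2.0, 4.0],   # DBF4B encodes DRF1
    "GENE_A": [1.0, 2.0, 3.0, 4.0],  # hypothetical genes
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [2.0, 1.0, 4.0, 3.0],
}

r_drf1 = pearson_r(expression["DBF4B"], dbf4_effect)
n_above = genes_more_correlated(expression, dbf4_effect, "DBF4B")
print(f"r(DRF1 expression, DBF4 effect) = {r_drf1:.2f}; "
      f"{n_above} other genes correlate more strongly")
```

      Under paralogue buffering one would expect a clearly positive r (higher DRF1 expression, DBF4 loss less deleterious); a weak r with many genes ranking above DRF1 is the pattern the text describes.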

      Thus, this analysis reinforces the idea that if cooperation between DBF4 and DRF1 exists, it is particularly difficult to demonstrate. To date, the interplay between DBF4 and DRF1 is only indicated by the partial impairment of MCM2 phosphorylation and CDC7 autophosphorylation observed in the individual KOs and by the fact that we were unable to obtain viable double KO mutant clones. We recognise that the latter is a negative result and that double KOs may be generated in other cellular models or with different strategies.

      We are happy to include the above computational analysis in a revised manuscript and to expand the discussion on the essentiality of CDC7, DBF4 and DRF1.

      The effects of directly inhibiting CDC7 with 10 microM XL413 (concentration used in this study) are indeed stronger than DBF4 KO / depletion on both DNA synthesis (Fig 2A-B) and MCM2 phosphorylation (Fig 4A and Fig 5A).

      We and others have previously shown that CDC7 inhibition by XL413 causes a dose-dependent decrease in MCM2 phosphorylation and DNA synthesis. Importantly, in experiments in which XL413 was titrated on MCF10A cells from 0.3 microM to 80 microM, we demonstrated that these two parameters are uncoupled: a strong impediment of DNA synthesis requires doses ~20-fold higher than the dose required for full MCM2 dephosphorylation (Rainey et al. 2017 https://doi.org/10.1021/acschembio.7b00117 ).

      DBF4 deficiency only partially affects MCM2 phosphorylation; it is therefore comparable to very low doses of XL413, which we estimate to be in the range of 1 to 2 microM.

      Minor points

      • Title on page 12, "DBF4 mediates the majority of CDC7 functions in the replication stress response". In this section the authors address only the role of CDC7 in checkpoint signalling but not in other processes related to the replication stress response.

      We agree and we have modified the title of this section accordingly.

      • Figure 2. "EdU incorporation in late S-phase/ per cell" is clearer

      We have modified the label of this figure.

      • Right panels in Figures 3A and 3B are duplicated

      We sincerely apologise for the mistake that occurred while assembling the figure. The figure has been corrected and shows that the changes in replication timing with the CDC7i or with DBF4-KO are indeed similar but not identical.

      **Referees cross-commenting**

      I am aware of the difficulty to sort out the detection problem, a major handicap of the work. Immunoprecipitation as suggested by rev. 3 might be an interesting possibility. The results should be published, in any case, as they are well performed and try to answer a relevant question. But, if finally the authors fail to detect the proteins, they should make clear in the paper the limitation of their conclusions by the possibility that the expression of the regulators is not completely eliminated or could be altered. Indeed, the apparent contradiction with Suski's results raised by Rev 3 might be discussed in this context.

      We appreciate the reviewer’s recognition of the technical problems we have encountered. We are glad that we are now in a position to provide evidence of impaired DBF4 and DRF1 expression in the engineered cells (discussed above and reported in new Figure S1 and S2).

      Also, it is important to explain the lack of synergism when combining the edited mutations with siRNAs.

      In a revised manuscript we will explain the potential reasons why synergism either does not exist or is not observed, as discussed above.

      Reviewer #2 (Significance (Required)):

      In summary, the work is relevant and interesting, but the lack of controls for the effect of the editing raises important concerns about the conclusions. It is evident from the acknowledgment section that the authors have tried without success to generate specific antibodies. An alternative possibility would be 1) to get similar results with at least two clones addressing different exons (actually, only one clone was used for DRF1 in most cases) and 2) to show synergistic effects for the more important phenotypes in edited cells transfected with efficient siRNAs. This is particularly important for DRF1-defective cells, which show no phenotypes except for an increase in micronuclei. If DBF4 is not essential because of the complementary activity of DRF1, impairment of DBF4 expression with siRNAs in DRF1-deficient cells should cause synergistic defects, at least in DNA replication and cell viability.

      We hope we have satisfactorily addressed this reviewer’s comments by providing experimental evidence of the impairment of DBF4 and DRF1 expression/function in the engineered cells, and several points for discussion addressing the lack of obvious synergy between DBF4 and DRF1.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary Assembly of the CMG helicase during DNA replication initiation is regulated by the DBF-Dependent Kinase known as CDC7 (or DDK), which also plays roles at DNA replication forks during elongation. In vertebrates, DDK has two regulatory subunits called DBF4 and DRF1. Until now, the division of labour between these two activators of CDC7 was poorly understood in mammalian cells. To address this issue, the authors used CRISPR-Cas9 to edit the DBF4 and DRF1 genes in immortalised human breast cells (MCF10A), thereby truncating key domains of the DBF4 and DRF1 proteins. The DBF4-deficient and DRF1-deficient lines are viable, whereas the double mutant was unobtainable and likely inviable, as reported previously by the authors for knockout of CDC7 in MCF10A cells. The authors compare the DBF4-deficient and DRF1-deficient lines with the CDC7 inhibitor XL413, providing evidence that DBF4 has the major role in supporting CDC7 activity in MCF10A cells compared to DRF1, in terms of DNA replication, origin firing, fork progression, and checkpoint activation. Curiously, DRF1 appears to be more important in preventing the formation of micronuclei - another phenotype seen upon inhibition of CDC7 kinase activity.

      Major comments: The data are of high quality and the key conclusions are convincing, although it is unfortunate that the authors were not able to monitor the level of DBF4 and DRF1 by immunoblotting to validate their edited cell lines. The authors previously reported using immunoprecipitation of CDC7, DBF4 and DRF1 (Tenca et al, 2007, 10.1074/jbc.M604457200) to monitor DDK subunits in HeLa cells, which would presumably have been helpful here in MCF10A cells. Nevertheless, the DNA sequence of the edited clones indicates frameshift mutations that lead to premature STOP codons, and the various phenotypes reported in this manuscript are consistent with loss of DBF4 / DRF1 function as described.

      We thank the reviewer for the time and effort spent carefully assessing the manuscript, and for his/her positive assessment.

      We have now included experimental evidence indicating that DBF4 expression is deficient in the DBF4-KO cells used in this study, and that the interaction between DRF1 and CDC7 is deficient in the DRF1-KO cells, using the same Co-IP strategy previously reported in HeLa cells. Please see also the response to reviewer #2 on the same point.

      Minor comments: 1. The authors should discuss their data in the context of the recent study by Suski et al (https://doi.org/10.1038/s41586-022-04698). The latter study reported that knockout of DBF4 in mouse fibroblasts impairs proliferation but is not lethal, in agreement with the present manuscript, but Suski et al also argue that CDC7 is dispensable for DNA replication in mammalian cells due to redundancy with CDK1.

      The requirement for CDC7 kinase activity for genome duplication in mammalian cells has become a contentious point of debate. CRISPR screens in more than 1000 cell lines indicate that CDC7 is a core essential gene required for proliferation (DepMap.org). Human cells can clearly withstand reduced CDC7 activity, and several proteins contribute both positively and negatively to the effectiveness of CDC7 inhibition in DNA replication and cell proliferation, e.g. RIF1 depletion, ATR inhibition and PTBP1 mutation (Hiraga et al. 2017 https://doi.org/10.15252/embr.201641983 ; Rainey et al. 2020 https://doi.org/10.1016/j.celrep.2020.108096 ; Jones et al. 2021 https://doi.org/10.1016/j.molcel.2021.01.004 ; Göder et al. 2023 https://doi.org/10.1016/j.isci.2023.106951).

      Specifically, CDK1 phosphorylation of RIF1 was shown to disrupt the RIF1/PP1 interaction and PP1’s ability to counteract CDC7-dependent phosphorylation of the MCM complex (Moiseeva et al. 2019 https://doi.org/10.1073/pnas.1903418116 ; Jones et al. 2021 https://doi.org/10.1016/j.molcel.2021.01.004). Thus, increased CDK1 activity can help cells cope with low levels of CDC7 kinase.

      Suski et al argue that CDC7 is dispensable for DNA replication in human cells based on acute degradation of CDC7 or its inhibition using a “Shokat-type” analogue-sensitive CDC7 allele. However, another study using the same approach and the same analogue-sensitive allele showed that DNA replication is not completed (Jones et al. 2021 https://doi.org/10.1016/j.molcel.2021.01.004). In mouse embryonic stem cells, the Masai group had previously shown that Cre-Lox-mediated inactivation of mDBF4 leads to a strong decrease in DNA synthesis and that mDBF4, like mCDC7, is essential for ES cell viability (Kim et al, 2002 https://doi.org/10.1093/emboj/21.9.2168 and Yamashita 2005 https://doi.org/10.1111/j.1365-2443.2005.00857.x ). Intriguingly, mDRF1 has not yet been identified or characterised. In our opinion, the simplest explanation reconciling the different reports is that human and mouse CDC7 are indeed required for DNA replication and cell proliferation, but that the most severe effects of its inhibition require complete loss of kinase function and may be delayed in time. We are happy to add these considerations to the discussion section of the revised manuscript.

      2. Some discussion of the increased frequency of micronuclei in DRF1-deficient cells compared to DBF4-deficient lines would be useful (c.f. Figure 1F-G).

      In the discussion we have suggested that the increase of micronucleated cells in the DRF1-deficient clones “could be consistent with a (DRF1) specific but not yet identified function in chromosome segregation, in the fine-tuning of DNA replication or the DNA repair process”. Of interest, CDC7 kinase was recently implicated in modulating ATR function in cytokinetic abscission, and impairment of this process can lead to an increased frequency of micronucleated cells (Luessing et al. 2023 https://doi.org/10.1016/j.isci.2022.104536 ). It is possible that this new role of CDC7 depends on DRF1, a hypothesis that is at present purely speculative and that we will test in the future. We are happy to add these considerations to the discussion section of the revised manuscript.

      3. It would be helpful to present actual p values in Figure 2, rather than asterisks.

      Asterisks indicate the range into which the p values fall, as currently specified in the legend. These can be replaced with exact values in the figures, and we will comply with the requirements of the journal that accepts the manuscript.

      Reviewer #3 (Significance (Required)):

      The main strength of this manuscript is the exploration of the division of labour between DBF4 and DRF1 in human cells, regarding the roles of CDC7 kinase during DNA replication initiation, fork progression and checkpoint control. A limitation would be the failure to monitor the level of DBF4 and DRF1 in the CRISPR-edited cell lines, whilst it is also possible that the relative roles of DBF4 and DRF1 might vary in different cell types.

      Previous studies of DNA replication in Xenopus egg extracts (e.g. Takahashi et al, 2005: doi: 10.1101/gad.1339805) indicated that DRF1 is the dominant activator of CDC7. In contrast, past work from the current authors (Tenca et al, 2007, 10.1074/jbc.M604457200) indicated that DBF4 is the major partner of CDC7 in human HeLa cells, at least at the level of promoting MCM2 phosphorylation (the only parameter monitored in the previous study, whereas the present manuscript goes much deeper into the various roles of CDC7 in DNA replication control and focusses on the role of CDC7 at replication forks and in checkpoint control).

      This study should be of interest to those studying chromosome replication, checkpoints and genome integrity. It should also interest those with a more clinical perspective, due to the potential importance of CDC7 kinase inhibitors as anti-cancer agents.

      My own expertise is in the field of chromosome replication.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study assesses anatomical, behavioral, physiological, and neurochemical effects of early-life seizures in rats, describing a striking astrogliosis and deficits in cognition and electrophysiological parameters. The convincing aspects of the paper are the wide range of convergent techniques used to understand the effects of early-life seizures on behavior as well as hippocampal prefrontal cortical dynamics. While reviewers thought that the scope was impressive, there was criticism of the statistical robustness and number of animals used per study arm, as well as the lack of causal manipulations to determine cause-and-effect relationships. This paper will be of interest to neurobiologists, epileptologists, and behavioral scientists.

      We thank Joseph Gleeson as the Reviewing Editor and Laura Colgin as the Senior Editor for considering this revision of our manuscript for publication in eLife. We appreciate the positive acknowledgment of the study and the critical points raised by the reviewers. We have addressed all the excellent comments of the two reviewers, providing a detailed response for each comment. We believe that these revisions have significantly improved the quality and rigor of our study.

      We want to assure you that our experimental design was meticulously crafted, incorporated adequate control groups, and is grounded in prominent studies in systems neurophysiology focusing on the effects of early-life seizures, especially for capturing mild effects. We conducted statistical tests adhering to established norms and recommendations, ensuring a thorough and transparent description of the employed statistical methods. We welcome any specific suggestions to further improve this aspect.

      In fact, the concerns raised by the reviewers regarding statistical robustness may stem from a misunderstanding of the rat cohorts used in each experiment. Criticism was directed at the use of only 5 animals without a control group for acute electrophysiological recording. It is essential to clarify that this group served the sole purpose of confirming that the injection of lithium-pilocarpine would induce both behavioral and electrographic seizures. Importantly, this was a descriptive result, and no statistical test or further analysis was conducted with these data. In the revised manuscript, we have made adjustments to this description, aiming to eliminate any ambiguity, particularly addressing the issue of sample size in each experiment.

      Regarding the lack of causal manipulations, we fully agree that this approach would provide a deeper mechanistic understanding of our findings and is an essential next step. Still, developmental brain disturbances are linked to manifold intricate outcomes, so an initial observational exploration offers insights into specific and nuanced relationships that can guide subsequent studies aimed at targeted interventions. In this context, our objective was to provide a comprehensive characterization of ELS effects to serve as a foundation for future research. While we recognize the relevance of causal manipulations, only more sophisticated data analyses could reveal complex aspects such as specific multivariate associations and non-linear relationships, which would not have emerged from causally perturbing one factor or another at the outset. In the revised manuscript, we emphasized the limitation of lacking causal manipulations as well as the advantages of our approach. We also mentioned possible targets for subsequent perturbational investigations based on our findings.

      For a more detailed discussion on these matters, we invite you to review our response to reviewers.

      Reviewer 1

      In this paper, Ruggiero, Leite, and colleagues assess the effects of early-life seizures on a large number of anatomical, physiological, behavioral, and neurochemical measures. They find that prolonged early-life seizures do not lead to obvious cell loss, but lead to astrogliosis, working memory deficits on the radial arm maze, increased startle response, decreased paired pulse inhibition, and increased hippocampal-PFC LTP. There was a U-shaped relationship between LTP and cognitive deficits. There is increased theta power during the awake state in ELS animals but reduced PFC theta-gamma coupling and reduced theta HPC-PFC coherence. Theta coherence seems to be similar in ACT and REM states in ELS animals, while it decreases in active states relative to REM in controls.

      Strengths:

      The main strength of the paper is the number of convergent techniques used to understand how hippocampal PFC neural dynamics and behavior change after early-life seizures. The sheer scale, breadth, and reach of the experiments are praiseworthy. It is clear that the paper is a major contribution to the field as far as understanding the impact of early-life seizures. The LTP findings are robust and provide an important avenue for future study. The experiments are performed carefully and the analysis is appropriate. The paper is well-written and the figures are clear.

      We express our gratitude to Reviewer #1 for conducting a thoughtful and comprehensive review of our manuscript. We sincerely value both the constructive criticisms provided and your acknowledgment of the manuscript's strengths.

      Weaknesses:

      The main weakness of the paper is the lack of causal manipulations to determine whether prevention or augmentation of any of the findings has any impact on behavior or cognition. Alternatively, if other manipulations would enhance working memory in ELS animals, it would be interesting to see the effects on any of these parameters measured in the paper.

      We sincerely appreciate the insightful comments from Reviewer #1 regarding the potential benefits of including causal manipulations in our study. We wholeheartedly agree that such manipulations can provide a deeper understanding of the mechanistic underpinnings of the observed relationships and represent a crucial next step in our research trajectory.

      Our primary objective in this study was to establish a comprehensive framework through observational examinations, exploring intricate relationships across various neurobiological and behavioral variables in the aftermath of early-life seizures (ELS). By identifying these associations, our work aims to provide a foundation for future investigations that can delve into targeted interventions.

      While we acknowledge the importance of causal manipulations, we would like to underscore the advantages of our initial multivariate correlational study. Importantly, developmental brain disturbances have lasting impacts on multiple biological outcomes that may be intricately related to one another. Although some neurobiological variables stood out in comparisons of group means, these comparisons did not reveal certain nuanced relationships within the data. The complexity of the relationships we uncovered, involving behavior, cognition, immunohistochemistry, plasticity, neurochemistry, and network dynamics, required a more elaborate analytical approach. Only through sophisticated data analysis techniques were we able to dissect important features, such as the robust multivariate association between brain-wide astrogliosis and sensorimotor impairments, and non-linear relationships, such as the inverted-U relationship between plasticity and working memory. These nuances might not have been fully revealed through causal manipulations: because several variables are strongly interrelated, perturbing one can affect several outcomes, which may lead to a false conclusion of direct causality.

      Nevertheless, we acknowledge that the limitation of lacking causal manipulations was understated in our manuscript. To address this, we have included a dedicated section in the discussion highlighting this limitation. We emphasize the advantages of this exploratory phase, supported by a review of the literature on cause-and-effect studies that align with our findings. Additionally, we speculate on promising targets for future cause-and-effect studies based on our findings. For instance, we hypothesize that enhancing plasticity may improve working memory in control subjects, while attenuating plasticity might have a similar effect in ELS subjects. Furthermore, we propose that reactive astrogliosis and concurrent neuroinflammatory processes likely underlie sensorimotor changes in the ELS group. Lastly, we suggest that dopaminergic antagonism in the ELS group could normalize behavioral deficits, prevent the exaggerated LTP induction of the HPC-PFC pathway, reestablish the state-dependent network dynamics, and desensitize the dopaminergic response.

      [...]Also, I find the sections where correlations and dimensionality reduction techniques are used to compare all possible variables to each other less compelling than the rest of the paper (with the exception of the findings of U-shaped relationship of cognition to LTP). In fact, I think these sections take away from the impact of the actual findings.

      We appreciate the reviewer's feedback and would like to emphasize the significance of the multivariate analysis conducted in our study. Multivariate analysis extends beyond bivariate correlations and is the only type of analysis able to capture relationships among the data in a multidimensional way, offering a comprehensive approach to understanding complex relationships among multiple variables. By employing techniques such as principal component analysis (PCA), generalized linear models (GLM), and canonical correlation analysis (CCA), we aimed to unravel intricate patterns of covariance, exploring how different variables collectively contribute to the observed outcomes and assessing the impact of each independent variable (predictor) on the dependent variable (the variable to be predicted or explained). Importantly, this approach allows us to control for potential confounding factors by holding all other variables constant.
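      As a minimal illustration of what PCA extracts from a table of covarying measurements, the sketch below finds the first principal component (the direction of greatest shared variance) by power iteration on the sample covariance matrix. It is not the analysis code used in the study; the function names are hypothetical, and it assumes that variables measured in different units have already been z-scored, as is usual before PCA.

```python
import math
from typing import List, Sequence, Tuple


def covariance_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix (n - 1 denominator) of an n x p data table."""
    n, p = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two observations")
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(
    rows: Sequence[Sequence[float]], iters: int = 1000, tol: float = 1e-12
) -> Tuple[List[float], float]:
    """Leading eigenvector of the covariance matrix via power iteration.

    Returns (loadings, fraction of total variance explained). Power iteration
    assumes a single dominant eigenvalue; the sign of the loadings is arbitrary.
    """
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # uniform starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("zero-variance data")
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient gives the eigenvalue (variance along the component).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance
```

      A loading vector with large weights on several variables indicates that they vary together across animals, which is the kind of latent multivariate pattern that group-mean comparisons do not expose.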

      While we acknowledge that these sections may appear intricate, their inclusion is indispensable for a comprehensive understanding of the diverse variables associated with SE outcomes. We believe that these analyses offer valuable insights into the intricate dynamics of our study, providing a more holistic perspective on the altered spectrum induced by early-life seizures (ELS).

      Regarding the reviewer's observations about the impact of the U-shaped relationship between cognition and LTP, we have made graphical and textual adjustments to emphasize the significance of these findings, aiming to enhance their clarity and impact within the broader context of our research. We trust that these modifications contribute to a more compelling presentation of our results.
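      One simple way to check for an inverted-U relationship between two variables (for instance, HPC-PFC LTP magnitude and a working-memory score) is to fit a quadratic by least squares and look for negative curvature with the peak lying inside the observed range. The sketch below is illustrative only: it is not necessarily how the relationship was assessed in the manuscript, the function names are hypothetical, and a negative quadratic term is consistent with, but does not by itself establish, an inverted U.

```python
from typing import List, Sequence, Tuple


def fit_quadratic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    if len(x) != len(y) or len(set(x)) < 3:
        raise ValueError("need paired data with at least 3 distinct x values")
    # Normal equations (X^T X) beta = X^T y for design columns [1, x, x**2].
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a: List[List[float]] = [
        [float(s[i + j]) for j in range(3)] + [float(t[i])] for i in range(3)
    ]
    # Gaussian elimination with partial pivoting on the 3 x 4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        beta[r] = (a[r][3] - sum(a[r][c] * beta[c] for c in range(r + 1, 3))) / a[r][r]
    return beta[0], beta[1], beta[2]


def quadratic_vertex(b1: float, b2: float) -> float:
    """x position of the parabola's extremum."""
    return -b1 / (2.0 * b2)


def is_inverted_u(x: Sequence[float], y: Sequence[float]) -> bool:
    """Negative curvature with the peak strictly inside the observed x range."""
    _, b1, b2 = fit_quadratic(x, y)
    if b2 >= 0:
        return False
    return min(x) < quadratic_vertex(b1, b2) < max(x)
```

      Requiring the vertex to fall inside the data range guards against calling a merely saturating or monotonic trend an inverted U.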

      […]Finally, the apomorphine section seemed to hang separately from the rest of the paper and did not seem to fit well.

      We appreciate Reviewer #1's feedback on the apomorphine section. To address this point, we carefully rewrote our rationale before the results to clarify our hypothesis and chosen methodology. In our work, we performed the apomorphine experiment as a logical next step from previous data. We showed that ELS rats display REM-like oscillatory dynamics during active behavior, similar to genetically and pharmacologically hyperdopaminergic mice (Dzirasa et al., 2006). Furthermore, other results also indicated possible alterations in dopamine neurotransmission, such as working memory deficits, hyperlocomotion, PPI deficits, aberrant HPC-PFC LTP, and abnormal PFC gamma coordination. Therefore, we hypothesized that ELS animals would present a hyperdopaminergic state. Among the possible methodologies to investigate this state, we chose the apomorphine sensitivity test, which is classically used and induces unambiguous behavioral and neurochemical alterations in hyperdopaminergic rodents (Duval, 2023; Ellenbroek & Cools, 2002).

      Reviewer 1 (Recommendations For The Authors):

      (1) It would be useful to stain for other GABAergic interneuron markers such as somatostatin, VIP, CCK.

      (2) The authors refer to neuroinflammation but they are really referring to reactive astrogliosis. I would also suggest staining for microglial markers.

      (3) The duration of chronic electrographic seizures in ELS animals should also be calculated and presented.

      (4) Word usage: the authors frequently use the word "presents" when "demonstrates" would be more appropriate

      (1) We appreciate your insight into staining for other GABAergic interneuron markers such as somatostatin, VIP, CCK. While investigating additional interneuron types is indeed relevant, it was not the primary focus of this study for several reasons: 1) The overall neuron density, assessed through NeuN immunostaining, revealed no differences between controls and early life seizure (ELS) groups, even in brain regions susceptible to neuron death after SE (i.e., CA1). Therefore, differences in interneurons, which are more resistant to death in SE and constitute approximately 20% of the cells, are unlikely. 2) Among all interneuron subtypes, Parvalbumin-positive (PV+) interneurons represent a substantial population and are susceptible to various stressors. In the hippocampus, 24% of GABAergic neurons are PV+, whereas 14% are SST+, 10% are CCK+, and VIP+ are less than 10% (Freund and Buzsaki, 1996). Consequently, we considered PV+ interneurons to be a more sensitive subpopulation for evaluating the effects of SE. As they showed no significant difference, we do not believe that assessing smaller subtypes, such as VIP+ or CCK+ cells, would yield significant differences.

      (2) While we often see activated microglia in hippocampal sclerosis, these cells are only slightly increased in cases without hippocampal sclerosis (which are similar to our animals), as we previously published (Peixoto-Santos et al., 2012). Astrocytes are a better marker for the epileptogenic zone, as they are increased in epileptogenic zones without neuron loss and are also important for controlling neuronal activity through neurotransmitter recycling and ion buffering. In fact, our present model closely resembles mesial temporal lobe epilepsy patients with gliosis only, who are characterized by increased reactive astrogliosis in the hippocampus without cell loss and who also show changes in the innate inflammatory response related to the presence of reactive astrocytes (Grote et al., 2023).

      (3) We have performed these calculations and added this information to the revised manuscript.

      (4) We thank the reviewer for the word usage recommendation. Indeed, we frequently used “present” throughout the manuscript to describe the observations and patterns the groups “exhibited” or “showed”. However, we believe this is truly not the most appropriate usage in the Discussion when we describe the multivariate latent factors, as we did not “present” them, but rather, we “demonstrated” their existence and significance through our analysis. We rewrote these sentences and hope this is the point the reviewer was referring to.

      References:

      Duval F. Systematic review of the apomorphine challenge test in the assessment of dopaminergic activity in schizophrenia. Healthcare. 2023 11 (1487): 1-11. doi: 10.3390/healthcare11101487.

      Dzirasa K, Ribeiro S, Costa R, Santos LM, Lin SC, Grosmark A, Sotnikova TD, Gainetdinov RR, Caron MG, Nicolelis MAL. Dopaminergic control of sleep-wake states. Journal of Neuroscience. 2006 26:10577–10589. doi:10.1523/JNEUROSCI.1767-06.2006.

      Freund TF, Buzsáki G. Interneurons of the hippocampus. Hippocampus. 1996;6(4):347-470. doi: 10.1002/(SICI)1098-1063(1996)6:4<347::AID-HIPO1>3.0.CO;2-I. PMID: 8915675.

      Ellenbroek BA & Cools AR. Apomorphine susceptibility and animal models for psychopathology: genes and environment. Behavior Genetics. 2002 32 (5): 349-361. doi: 10.1023/a:1020214322065.

      Grote A, Heiland DH, Taube J, Helmstaedter C, Ravi VM, Will P, Hattingen E, Schüre JR, Witt JA, Reimers A, Elger C, Schramm J, Becker AJ, Delev D. 'Hippocampal innate inflammatory gliosis only' in pharmacoresistant temporal lobe epilepsy. Brain. 2023 Feb 13;146(2):549-560. doi: 10.1093/brain/awac293. PMID: 35978480; PMCID: PMC9924906.

      Peixoto-Santos JE, Galvis-Alonso OY, Velasco TR, Kandratavicius L, Assirati JA, Carlotti CG, Scandiuzzi RC, Serafini LN, Leite JP. Increased metallothionein I/II expression in patients with temporal lobe epilepsy. PLoS One. 2012;7(9):e44709. doi: 10.1371/journal.pone.0044709. Epub 2012 Sep 18. Erratum in: PLoS One. 2016;11(7):e0159122. PMID: 23028585; PMCID: PMC3445538.

      Reviewer 2

      In this manuscript, the authors employ a multilevel approach to investigate the relationship between the hippocampal-prefrontal (HPC-PFC) network and long-term phenotypes resulting from early-life seizures (ELS). Their research begins by establishing an ELS rat model and conducting behavioral and neuropathological studies in adulthood. Subsequently, the manuscript delves into testing hypotheses concerning HPC-PFC network dysfunction. While the results are intriguing, my enthusiasm is tempered by concerns related to the logical flow.

      We thank the reviewer for bringing attention to the logical flow of the manuscript. Given the diverse array of behavioral and neurobiological variables examined in our study, obtained through various methods and measures, we fully recognize the importance of a clear and coherent logical flow for a comprehensive understanding of the overall narrative.

      Our goal was to articulate the neurobiological findings in a manner that underscores their mechanistic convergence, revealing a cohesive relationship between early-life seizures, cognitive deficits, sensorimotor impairments, abnormal network dynamics, aberrant plasticity, neuroinflammation and dysfunctional dopaminergic transmission.

      Briefly, an outline of our narrative could be summarized in the highlights:

      (1) ELS induces sensorimotor alterations and working memory deficits.

      (2) ELS does not induce neuronal loss, so neurobiological underpinnings may be molecular and functional.

      (3) ELS induces brain-wide astrogliosis and exaggerated HPC-PFC long-term plasticity.

      (4) Sensorimotor alterations are more correlated with astrogliosis, while cognitive deficits are more correlated with altered HPC-PFC plasticity.

      (5) ELS-induced functional alterations may also be observable in freely moving subjects. ELS induces state-dependent alterations in the HPC-PFC network dynamics, such as increased hippocampal theta and abnormal PFC gamma coordination during behavioral activity.

      (6) ELS leads to REM-ACT similarity, previously reported in hyperdopaminergic mice, indicating dopaminergic dysfunction.

      (7) ELS rats exhibit altered dopaminergic transmission and behavioral sensitivity that mirror the initial sensorimotor findings.

      (8) The literature establishes an inverted-U relationship between dopamine and both cognition and PFC plasticity, which may explain our finding of an inverted-U relationship between working memory and HPC-PFC LTP across CTRL and ELS rats.

      To address this concern, we have revised the manuscript to enhance its logical flow, ensuring smoother transitions between the different sections of the Results by presenting clearer links between observations and subsequent investigations. We hope these changes make the rationale more straightforward and the presentation of our hypotheses and results easier to follow.

      Focus on Correlations: The manuscript primarily highlights correlations as the most significant findings. For instance, it demonstrates that ELS induces cognitive and sensorimotor impairments. However, it falls short of elucidating why these deficits are specifically linked to HPC-PFC synaptic plasticity/network. Furthermore, the manuscript mentions the involvement of other brain regions like the thalamus in the long-term outcomes of ELS based on immunohistochemistry data.

      Thank you for your insightful comments, which allowed us to provide further clarification on our study's focus and findings. Our primary goal was to delve into the electrophysiological alterations within the HPC-PFC pathway. The rationale behind this choice lies in the hypothesis that, even in the absence of significant neuronal loss, functional changes in circuits closely linked to the cognitive and behavioral aspects under investigation could be identified.

      While we concentrated our electrophysiological investigation on the HPC-PFC pathway due to its well-established functional correlates in the existing literature, it is essential to highlight that our data reveal broader alterations in neural circuitry. Notably, we observed an increase in GFAP in the entorhinal cortex and thalamic reticular nucleus, along with changes in dopamine release within the VTA-NAc pathway. These findings suggest that the impact of early-life seizures extends beyond the HPC-PFC circuit.

      While we recognize that other brain circuits contribute to the outcomes of ELS, we argue for a specific role of the HPC-PFC circuit. We detail the supporting evidence and arguments that specifically link HPC-PFC function to our ELS-related observations in our response to the comment regarding the "overinterpretation" of the HPC-PFC role. To better convey these important nuances, we have made specific modifications to the Results and Discussion sections to underscore the broader implications of our findings, providing a more comprehensive understanding of the study's scope and outcomes.

      […] This raises questions about the subjective nature and persuasiveness of the statistical studies presented.

      All statistical analyses were carefully chosen based on the literature, following well-established principles and precautions. Specifically, we constructed the experimental design for univariate inferential statistics for the data related to behavioral tests, synaptic plasticity, immunohistochemistry, oscillatory activity, and dopaminergic sensitization. In addition, we submitted our data to multivariate statistical analysis, which is recommended for large datasets, to investigate possible hidden effects. In this situation, multivariate analyses are inherently exploratory because multiple measurements can be used for each phenomenon investigated. Nevertheless, their application is not subjective and follows the same statistical rigor as univariate analyses. We firmly believe that refraining from exploring these data would prevent us from realizing the full potential of this analytical method for dissecting the multidimensional associations within our dataset. To eliminate any doubt regarding the objectivity of our choice and application of statistics, we carefully rewrote the Methods, highlighting the details of our statistical rigor even more.

      Sample Size Concerns: The manuscript raises concerns about the adequacy of sample sizes in the study. The initial cohort for acute electrophysiology during ELS induction comprised only 5 rats, without a control group. Moreover, the behavioral tests involved 11 control and 14 ELS rats, but these same cohorts were used for over four different experiments. Subsequent electrophysiology and immunohistochemistry experiments used varying numbers of rats (7 to 11). Clarification is needed regarding whether these experiments utilized the same cohort and why the sample sizes differed. A power analysis should have been performed to justify sample sizes, especially given the complexity of the statistical analyses conducted.

      We appreciate the reviewer's thoroughness and considerations regarding the sample sizes used in our study. The concerns raised about statistical robustness seem to stem from a lack of clarity in delineating the rat cohorts used in each experiment. Notably, several studies in neurophysiology that employ similar analyses use sample sizes comparable to ours. Our choice of sample size was based on a thorough analysis of the existing literature, considering the specific experimental demands, the complexity of the techniques employed, and the need to achieve statistically robust results. In response to these concerns and to clarify the sample sizes, we have made several modifications (highlighted in red) in the text. Below, we provide details for each animal cohort:

      Cohort 1 - Acute Electrophysiology

      The decision to use only 5 animals without a control group for acute electrophysiological recording aimed specifically to confirm that the injection of lithium-pilocarpine would induce both behavioral and electrographic seizures. It is crucial to note that this was a descriptive result and a methodological control of the ELS model; accordingly, no statistical tests or further analyses were conducted on these data. We maintain that a group of 5 animals is sufficient to demonstrate that the protocol induces electrographic seizures, and we considered it unnecessary to include a control group merely to show that saline injection does not induce electrographic seizures.

      Cohort 2 - Behavior, LTP Recording, and Immunohistochemistry

      Initially, 14 (ELS) and 11 (CTRL) rats were used for behavioral assessment. The reduction in sample size for the LTP and immunohistochemistry experiments resulted from practical challenges, including mortality during LTP surgery and immunohistochemical staining issues that prevented proper analysis for some animals.

      Cohort 3 - Chronic Freely-Moving Electrophysiology

      A new cohort of animals (n=6 and 9 for CTRL and ELS, respectively) was used specifically for freely-moving electrophysiological data.

      Cohort 4 - Behavioral Sensitization to Psychostimulants

      A fourth cohort was used to assess behavioral sensitization to psychostimulants (CTRL n=15 and ELS n=14). The reduced sample size for neurotransmitter analysis (CTRL n=8 and ELS n=9) reflects the deliberate selection of a subsample large enough for quantification while maintaining statistical validity.

      Overinterpretation of HPC-PFC Network Dysfunction: The manuscript potentially overinterprets the role of HPC-PFC network dysfunction based on the results.

      We appreciate the insight from Reviewer #2 regarding the potential overinterpretation of the role of the hippocampal-prefrontal cortex (HPC-PFC) network dysfunction in the various alterations observed after ELS.

      The significance of HPC-PFC plasticity and network function has been extensively documented for cognitive, affective, and sensorimotor functions, as well as in models of neuropsychiatric diseases. Our recent review (Ruggiero et al., 2021) compiles these findings. Specifically, the HPC-PFC network has been linked to spatial working memory through a series of causal and correlational studies conducted by Floresco et al. and Gordon et al. These findings make the HPC-PFC pathway a plausible candidate for underlying alterations in working memory, consistent with our observation that exaggerated HPC-PFC LTP is associated with poorer performance in the ELS group. Regarding the immunohistochemical observations, we concur with Reviewer #2 that these findings suggest broader-scale brain alterations related to sensorimotor dysfunction beyond the HPC-PFC circuitry. We also acknowledge that these large-scale alterations may underlie brain-wide functional network changes.

      In our network dynamics study arm, we investigated HPC-PFC oscillatory activity, which allowed us to discuss potential relationships between abnormal plasticity (verified in the second study arm) and network dynamics. It is important to note that, although the LFPs recorded in the HPC and PFC have some anatomical specificity, these activities may represent larger-scale limbic-cortical dynamics. The intermediate HPC is strongly influenced by both the dorsal and ventral HPC, and the prelimbic PFC is intricately related to both hippocampal and thalamic oscillations, exhibiting demand- and state-dependent synchrony. Additionally, the state maps used in our study were originally described to distinguish states at the level of the global forebrain network. In our past studies, we described HPC-PFC patterns of network activity (Marques et al., 2022a) that were later found to be part of a brain-wide synchrony pattern (Marques et al., 2022b). However, most of our findings on oscillatory dynamics centered on theta oscillations, a well-established brain-wide activity that originates in and spreads from the hippocampus and is present in the HPC-PFC circuit during activity.

      In conclusion, we believe the correlations between HPC-PFC LTP and working memory, as well as the specific alterations in theta-coordinated activity, support a particular role for HPC-PFC network dysfunction in the effects of ELS. However, the brain-wide immunohistochemical alterations are plausible indications of larger-scale dysfunctional networks. To address this issue, we emphasized in the discussion of the network findings that the immunohistochemical and neurochemical results underscore the need to investigate ELS effects on larger networks.

      Notably, cognitive deficits are described as subtle, with no evidence of learning deficits and only faint working memory impairments. However, sensorimotor deficits show promise. Consequently, it's essential to justify the emphasis on the HPC-PFC network as the primary mechanism underlying ELS-associated outcomes, especially when enhanced LTP is observed. Additionally, the manuscript seems to sideline neuropathological changes in the thalamus and the thalamus-to-PFC connection. The analysis lacks a direct assessment of the causal relationship between HPC-PFC dysfunction and ELS-associated outcomes, leaving a multitude of multilevel analyses yielding potential correlations without easily interpretable results.

      We thank Reviewer #2 for the thorough review and insightful comments. To better grasp the context, it is crucial to consider this characterization within the scope of our experimental design and expected outcomes. Unlike epilepsy models involving adult animals or interventions causing pronounced neuronal loss and structural modifications, our study was intentionally designed to explore moderate behavioral alterations. In fact, the mild behavioral alterations observed in ELS models and the lack of neuronal loss guided our focus on investigating changes in HPC-PFC communication.

      While our observed cognitive deficits may be milder than those in certain models, it is imperative to underscore their robustness and clinical relevance. These findings have been consistently replicated by groups worldwide across various experimental models, including ELS induced by hyperthermia (Chang et al., 2003; Kloc et al., 2022), kainic acid (Stafstrom et al., 1993), flurothyl (Karnam et al., 2009a; 2009b), and hypoxia (Najafian et al., 2021; Hajipour et al., 2023). Mild cognitive deficits were also reported by other research groups using the pilocarpine model in P12 rats (Mikulecká et al., 2019; Kubová et al., 2013; Kubová et al., 2002). Furthermore, our group replicated the working memory deficits using an alternative paradigm (the T-maze) and a different rat strain (Sprague Dawley), enhancing the reliability of our observations (D’Agosta et al., 2023).

      The clinical perspective is also important, considering that the cognitive effects of ELS may be less severe than those in patients with long-term epilepsy. In fact, mild cognitive impairment is the most common grade of severity among patients with childhood epilepsy, occurring at more than twice the rate of severe cognitive impairment (Sorg et al., 2022). Investigating the mechanisms underlying these mild cognitive changes is crucial for shedding light on neurobiological aspects that are not fully understood, thereby expanding our comprehension of the consequences of ELS.

      We recognize the challenges associated with conducting causal experiments in neuroscience, especially for long-term and chronic alterations such as those seen in our model. Isolating modifications of specific activities is indeed intricate. However, it is essential to acknowledge that progress in neuroscience has not relied solely on causal experiments but has also advanced significantly through correlational observations. Our findings serve as a foundational step in understanding the repercussions of ELS, proposing mechanisms and circuits that warrant further in-depth dissection in the future. We have integrated these considerations into the Discussion section of the manuscript to enhance clarity.

      Overall, while the manuscript presents intriguing findings related to the HPC-PFC network and ELS outcomes, it requires a more rigorous experimental design[…]

      We thank the reviewer for acknowledging our intriguing findings. Regarding the experimental design, we are confident that the hypotheses, design, and execution of all experiments in the manuscript were rigorously grounded in the literature and carried out with all necessary controls. As stated earlier, we constructed the experimental design for univariate inferential statistics and explored associations between variables using multivariate statistics. Specifically, we achieved a rigorous experimental design by following a series of guidelines. First, the sample size of each experiment and its respective controls was planned based on the mild effects reported in the ELS literature. As previously indicated, the only single-group experiment was the description of the behavioral effects and electrographic seizures after acute injection of lithium-pilocarpine. Given the extensive replication of these data in the ELS literature, this result was presented descriptively as a methodological control. Second, statistics were described in detail in both the Methods and Results, always reporting both positive and negative results. Notably, the experimental designs used in this work are neither novel nor unconventional; they strictly follow the literature of the field. Nevertheless, additional details and references regarding experimental rigor were added to the manuscript to resolve any doubts about objectivity.

      References:

      Chang YC, Huang AM, Kuo YM, Wang ST, Chang YY, Huang CC. Febrile seizures impair memory and cAMP response-element binding protein activation. Ann Neurol. 2003 Dec;54(6):706-18. doi: 10.1002/ana.10789. PMID: 14681880.

      D'Agosta R, Prizon T, Zacharias LR, Marques DB, Leite JP, Ruggiero RN. Alterations in hippocampal-prefrontal cortex connectivity are associated with working memory impairments in rats subjected to early-life status epilepticus. In: NEWROSCIENCE INTERNATIONAL SYMPOSIUM, 2023, Ribeirão Preto. Poster.

      Hajipour S, Khombi Shooshtari M, Farbood Y, Ali Mard S, Sarkaki A, Moradi Chameh H, Sistani Karampour N, Ghafouri S. Fingolimod Administration Following Hypoxia Induced Neonatal Seizure Can Restore Impaired Long-term Potentiation and Memory Performance in Adult Rats. Neuroscience. 2023 May 21;519:107-119. doi: 10.1016/j.neuroscience.2023.03.023. Epub 2023 Mar 28. PMID: 36990271.

      Karnam HB, Zhou JL, Huang LT, Zhao Q, Shatskikh T, Holmes GL. Early life seizures cause long-standing impairment of the hippocampal map. Exp Neurol. 2009 Jun;217(2):378-87. doi: 10.1016/j.expneurol.2009.03.028. Epub 2009 Apr 2. PMID: 19345685; PMCID: PMC2791529.

      Karnam HB, Zhao Q, Shatskikh T, Holmes GL. Effect of age on cognitive sequelae following early life seizures in rats. Epilepsy Res. 2009 Aug;85(2-3):221-30. doi: 10.1016/j.eplepsyres.2009.03.008. Epub 2009 Apr 22. PMID: 19395239; PMCID: PMC2795326.

      Kubová H, Mareš P. Are morphologic and functional consequences of status epilepticus in infant rats progressive? Neuroscience. 2013 Apr 3;235:232-49. doi: 10.1016/j.neuroscience.2012.12.055. Epub 2013 Jan 7. PMID: 23305765.

      Kloc ML, Marchand DH, Holmes GL, Pressman RD, Barry JM. Cognitive impairment following experimental febrile seizures is determined by sex and seizure duration. Epilepsy Behav. 2022 Jan;126:108430. doi: 10.1016/j.yebeh.2021.108430. Epub 2021 Dec 10. PMID: 34902661; PMCID: PMC8748413.

      Kubová H, Mares P, Suchomelová L, Brozek G, Druga R, Pitkänen A. Status epilepticus in immature rats leads to behavioural and cognitive impairment and epileptogenesis. Eur J Neurosci. 2004 Jun;19(12):3255-65. doi: 10.1111/j.0953-816X.2004.03410.x. PMID: 15217382.

      Marques DB, Ruggiero RN, Bueno-Junior LS, Rossignoli MT, and Leite JP. Prediction of Learned Resistance or Helplessness by Hippocampal-Prefrontal Cortical Network Activity during Stress. The Journal of Neuroscience. 2022a 42 (1): 81-96. https://doi.org/10.1523/jneurosci.0128-21.2021.

      Marques DB, Rossignoli MT, Mesquita BDA, Prizon T, Zacharias LR, Ruggiero RN and Leite JP. Decoding fear or safety and approach or avoidance by brain-wide network dynamics. bioRxiv. 2022b https://doi.org/10.1101/2022.10.13.511989.

      Mikulecká A, Druga R, Stuchlík A, Mareš P, Kubová H. Comorbidities of early-onset temporal epilepsy: Cognitive, social, emotional, and morphologic dimensions. Exp Neurol. 2019 Oct;320:113005. doi: 10.1016/j.expneurol.2019.113005. Epub 2019 Jul 3. PMID: 31278943.

      Najafian SA, Farbood Y, Sarkaki A, Ghafouri S. FTY720 administration following hypoxia-induced neonatal seizure reverse cognitive impairments and severity of seizures in male and female adult rats: The role of inflammation. Neurosci Lett. 2021 Mar 23;748:135675. doi: 10.1016/j.neulet.2021.135675. Epub 2021 Jan 28. PMID: 33516800.

      Ruggiero RN, Rossignoli MT, Marques DB, de Sousa BM, Romcy-Pereira RN, Lopes-Aguiar C and Leite JP. Neuromodulation of Hippocampal-Prefrontal Cortical Synaptic Plasticity and Functional Connectivity: Implications for Neuropsychiatric Disorders. Frontiers in Cellular Neuroscience. 2021 15 (October): 1–23. https://doi.org/10.3389/fncel.2021.732360.

      Sorg AL, von Kries R, Borggraefe I. Cognitive disorders in childhood epilepsy: a comparative longitudinal study using administrative healthcare data. J Neurol. 2022 Jul;269(7):3789-3799. doi: 10.1007/s00415-022-11008-y. Epub 2022 Feb 15. PMID: 35166927; PMCID: PMC9217877.

      Stafstrom CE, Chronopoulos A, Thurber S, Thompson JL, Holmes GL. Age-dependent cognitive and behavioral deficits after kainic acid seizures. Epilepsia. 1993 May-Jun;34(3):420-32. doi: 10.1111/j.1528-1157.1993.tb02582.x. PMID: 8504777.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      This is a short but important study. Basically, the authors show that α-synuclein overexpression's negative impact on synaptic vesicle recycling is mediated by its interaction with E-domain containing synapsins. This finding is highly relevant for synuclein function as well as for the pathophysiology of synucleinopathies. While the data is clear, functional analysis is somewhat incomplete.

      (1) The authors should present a clearer dissociation of endocytosis and exocytosis under the various conditions they study. They should quantify the rate of rise and decay of pHluorin signals. (2) In addition, I strongly recommend a few additional experiments with and without a vATPase inhibitor such as bafilomycin to estimate the relative effects on exo- vs. endocytosis. As the authors are aware, bafilomycin will mask the re-acidification/endocytosis component, thus revealing pure exocytosis and thus enabling quantification of endocytosis with minimal contamination from exocytosis.

      In the revised version, we analyzed and quantified exocytosis and endocytosis separately using bafilomycin experiments, as the reviewer suggested (new data, Fig. 1-Fig. Supp. 1A-B). Overexpression of human alpha-synuclein attenuated exocytosis only in neurons that also expressed synapsins (WT neurons and synapsin TKO neurons transduced with synapsin Ia). In parallel, we examined endocytosis by calculating the time constant of the decay in sypHy fluorescence during the endocytic phase (Fig. 1-Fig. Supp. 1C-E). Previous studies have shown that after brief stimulus trains, like those used in our study (20 Hz/300 AP), most endocytosis occurs after stimulation ceases (1). Expression of human alpha-synuclein did not alter the endocytosis time constant in any of our experiments. In summary, the interaction of alpha-synuclein with the synapsin E-domain was required for alpha-synuclein-induced attenuation of exocytosis, but not of endocytosis.
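      The decay-fitting step mentioned above can be illustrated with a minimal Python sketch. The response does not specify the fitting procedure, so this assumes a single-exponential decay F(t) = F0 * exp(-t / tau) of a baseline-subtracted sypHy trace that starts at the end of stimulation, and estimates tau by linear regression on the log-transformed values; `decay_time_constant` is a hypothetical name. A nonlinear least-squares fit (for example `scipy.optimize.curve_fit`) is often preferred for noisy traces, because the log transform distorts the noise.

```python
import numpy as np


def decay_time_constant(t, fluorescence):
    """Estimate tau for F(t) = F0 * exp(-t / tau) with a log-linear fit.

    Assumes the trace begins at the end of stimulation, has been
    baseline-subtracted so that it decays toward zero, and contains
    only strictly positive values (required for the log transform).
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    if np.any(f <= 0):
        raise ValueError("log-linear fit requires strictly positive values")
    # log F = log F0 - t / tau, so the fitted slope equals -1 / tau.
    slope, _intercept = np.polyfit(t, np.log(f), deg=1)
    return -1.0 / slope
```

      Comparing the tau returned for traces with and without alpha-synuclein overexpression would then indicate whether endocytosis kinetics differ between conditions.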

      Reviewer #2

      ...The paper will be improved significantly if additional experiments are added to expand and provide a more mechanistic understanding of the effect of α-syn and the intricate interplay between synapsin, α-syn, and the SV. For an enthusiastic reader, the manuscript as it looks now with only 3 figures, ends prematurely. Some of the experiments above or others could complement, expand and strengthen the current manuscript, moving it from a short communication describing the phenomenon to a coherent textbook topic. Nevertheless, this work provides new and exciting evidence for the regulation of neurotransmitter release and its regulation by synapsin and α-syn.

      (1) Did the authors try to attach E-domain for example to synapsin Ib and restore α-syn inhibition with synapsin Ib-E?

      This is an interesting idea, but in previous studies we found that synapsin Ib does not associate with synaptic vesicles (2), so it would not be present at the right location to restore alpha-synuclein-induced synaptic attenuation. We have also seen that this mislocalization alters synaptic properties (unpublished).

      (2) Was the expression level of Synapsin-IaScrE examined and compared to WT Synapsin-Ia in Fig 3?

      Yes, this data is now shown in Fig. 3-Fig. Supp. 1.

      (3) Were SVs dispersed in α-syn overexpression as predicted?

      We interpret the reviewer’s question and reasoning as follows. If alpha-synuclein binds to the E-domain of synapsin, a prediction in the alpha-synuclein over-expression scenario is that the overabundance of alpha-synuclein molecules would bind to and sequester the E-domain synapsins away from synaptic vesicles. In the absence of E-domain synapsins, the synaptic-vesicle clustering effects of synapsins would be lost, and synaptic vesicles would disperse. We tested this prediction, which is now shown in an additional figure (new data, Fig. 4). Indeed, AAV-mediated over-expression of alpha-synuclein leads to a dispersion of synaptic vesicles, and this dispersion is dependent on synapsins Ia and Ib, but not IIa and IIb (please see Fig. 4D-E in the revised manuscript). We have also added text, beginning with “Previous studies have shown that loss of all synapsins...”, that presents and interprets these data.

      (4) How does this study coincide with the effects of α-syn on fusion pore and endocytosis? This should be at least discussed. It is also possible that the effects of α-syn on endocytosis might affect the results as if endocytosis is affected, SVs number and distribution will be also affected.

      It is difficult to reconcile our data with the idea that alpha-synuclein facilitates fusion-pore opening, as proposed by the Edwards lab (3). In fact, it is difficult to reconcile this concept with their own previous data showing that alpha-synuclein over-expression attenuates SV recycling (4). As mentioned above, modulation of endocytosis does not seem to be a major factor in our experiments, though this does not rule out a physiologic role for alpha-synuclein in endocytosis, since all our experiments are based on over-expression paradigms. Future experiments examining phenotypes after acute alpha-synuclein knockdown may provide more clarity. In any case, alpha-synuclein has many purported roles, and this is now mentioned in the last paragraph (starting with “Additionally, α-syn has been implicated…”).

      (5) What happened after stimulation when synapsin is detached from SV, does α-syn continues to be linked to it?

      The fate of alpha-synuclein after stimulation is unclear in our experiments. Previous experiments suggest that while both synapsin and alpha-synuclein detach from the SV cluster during stimulation, synapsin returns to synapses while alpha-synuclein does not (5). However, our more recent (unpublished) experiments suggest that the activity-induced dispersion of alpha-synuclein might be phosphorylation-dependent, and that over-expression of alpha-synuclein may not be the best setting to evaluate protein dispersion. We hope to answer this question more rigorously using alpha-synuclein knock-in constructs.

      (6) The experiment with E-domain fused to syPhy assumes that α-syn will still be bound to the SV. So how does α-syn inhibit ST?

      The goal of this experiment was to force the synapsin E-domain to be in a location where it would normally be present, i.e. the surface of the synaptic vesicle, by tagging it to sypHy (sypHy-E), and to ask whether this forced retention would be sufficient to reinstate the alpha-synuclein-mediated attenuation of SV recycling (as shown in Fig. 3F, it does). Please note that sypHy-E does target to synapses in these experiments (new data, Fig. 3-Fig. Supp. 2D). In this context, we are not sure what the reviewer means by “So how does α-syn inhibit synaptic transmission?” We do not think that alpha-synuclein needs to unbind from SVs in order to inhibit synaptic transmission. Overall, we think that alpha-synuclein needs to cooperate with synapsins to perform its function, but as mentioned above and in the manuscript, the precise role of alpha-synuclein in this process is still unclear.

      (7) An interesting experiment will be the expression of the isolated E-domain and examining blockage of α-syn inhibition and disruption of synapsin- α-syn interaction. Have the authors examined it as was done in other models?

      We did perform the experiment in which we over-expressed only the isolated synapsin E-domain in neurons. We reasoned that the E-domain might have a dominant-negative effect on SV clustering, as it did in the lamprey and other model systems, where the E-peptide was injected directly into the axon. However, we found that in cultured hippocampal neurons, the over-expressed E-domain behaves like a soluble protein and is not enriched in synapses (new data, Fig. 3-Fig. Supp. 2B). Also, the over-expressed E-domain cannot reinstate the synaptic attenuation induced by alpha-synuclein (new data, Fig. 3-Fig. Supp. 2C), likely because it does not target to synapses. In fact, this is why we performed the sypHy-E experiment in the first place: to ensure that the E-domain was in the right location to have an effect.

      (8) A schematic model/scheme providing a mechanistic view of the interplay between the proteins is essential and can improve the paper.

      The only model we can confidently propose right now would be a stick figure showing the site where the alpha-synuclein C-terminus binds to synapsin, which is obviously not very insightful. As noted above (and in the revised version), several different functions have been attributed to alpha-synuclein, and the precise role of alpha-synuclein/synapsin interactions in regulating the SV cycle is unclear. We hope to build a better model once we and our colleagues working on this challenging problem obtain more data.

      References

      (1) Kononenko NL & Haucke V. (2015) Molecular mechanisms of presynaptic membrane retrieval and synaptic vesicle reformation. Neuron 85, 484-496.

      (2) Gitler D, Xu Y, Kao H-T, Lin D, Lim S, Feng J, Greengard P & Augustine GJ. (2004) Molecular Determinants of Synapsin Targeting to Presynaptic Terminals. J. Neurosci. 24, 3711-3720.

      (3) Logan T, Bendor J, Toupin C, Thorn K & Edwards RH. (2017) α-Synuclein promotes dilation of the exocytotic fusion pore. Nat Neurosci 20, 681-689.

      (4) Nemani VM, Lu W, Berge V, Nakamura K, Onoa B, Lee MK, Chaudhry FA, Nicoll RA & Edwards RH. (2010) Increased expression of alpha-synuclein reduces neurotransmitter release by inhibiting synaptic vesicle reclustering after endocytosis. Neuron 65, 66-79.

      (5) Fortin DL, Nemani VM, Voglmaier SM, Anthony MD, Ryan TA & Edwards RH. (2005) Neural activity controls the synaptic accumulation of alpha-synuclein. J Neurosci 25, 10913-10921.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important work identifies a previously uncharacterized capacity for songbirds to recover vocal targets even without sensory experience. While the evidence supporting this claim is solid, with innovative experiments exploring vocal plasticity in deafened birds, additional behavioral controls and analyses are necessary to shore up the main claims. If improved, this work has the potential for broad relevance to the fields of vocal and motor learning.

      We were able to address the requests for additional behavioral controls about the balancing of the groups (reviewer 1) and the few individual birds that showed a different behavior (reviewer 2) without collecting any further data. See our detailed replies below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zai et al test if songbirds can recover the capacity to sing auditory targets without singing experience or sensory feedback. Past work showed that after the pitch of targeted song syllables is driven outside of birds' preferred target range with external reinforcement, birds revert to baseline (i.e. restore their song to their target). Here the authors tested the extent to which this restoration occurs in muted or deafened birds. If these birds can restore, this would suggest an internal model that allows for sensory-to-motor mapping. If they cannot, this would suggest that learning relies entirely on feedback-dependent mechanisms, e.g. reinforcement learning (RL). The authors find that deafened birds exhibit moderate but significant restoration, consistent with the existence of a previously under-appreciated internal model in songbirds.

      Strengths:

      The experimental approach of studying vocal plasticity in deafened or muted birds is innovative, technically difficult, and perfectly suited for the question of feedback-independent learning. The finding in Figure 4 that deafened birds exhibit subtle but significant plasticity toward restoration of their pre-deafening target is surprising and important for the songbird and vocal learning fields, in general.

      Weaknesses:

      The evidence and analyses related to the directed plasticity in deafened birds are confusing, and the magnitude of the plasticity is far less than the plasticity observed in control birds with intact feedback. The authors acknowledge this difference in a two-system model of vocal plasticity, but one wonders why the feedback-independent model, which could powerfully enhance learning speed, is weak in this songbird system.

      We fully agree with the reviewer. This surprising weakness reflects a limitation of the birds rather than of our approach for characterizing it.

      There remains some confusion about the precise pitch-change methods used to study the deafened birds, including the possibility that a critical cohort of birds was not suitably balanced in a way where deafened birds were tested on their ability to implement both pitch increases and decreases toward target restoration.

      Both deaf groups (WNd and dLO) were balanced: half of the birds (5/10 WNd and 4/8 dLO) shifted their pitch up (thus target restoration corresponded to decreasing pitch) and the other half (5/10 WNd and 4/8 dLO) shifted their pitch down (thus target restoration corresponded to increasing pitch); see Methods.

      To clarify the pitch-change method used, we added to the Methods an explanation of why we used the sensitivity index d′ in Fig. 4:

      We used the sensitivity index d′ relative to the last 2 h of WN/LO instead of NRP because we wanted to detect a pitch change, which is the realm of detection theory, i.e. of d′. Furthermore, by measuring local changes in pitch relative to the last 2 h of WN/LO reinforcement, our measurements are only minimally affected by any reinforcement learning that might have occurred during this 2 h time window; choosing an earlier or longer window would have blended reinforced pitch changes into our estimates. Last but not least, changing the way in which we normalized d′ values (dividing by S_B) or using the NRP relative to the last 2 h of WN/LO did not qualitatively change the results shown in Fig. 4D.
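As an illustration only, the sketch below computes a direction-signed d′ between an analysis window and the last 2 h of WN/LO. It assumes the common pooled-standard-deviation form of d′ and a sign flip by each bird's reinforcement direction, so that negative values indicate reversion towards baseline; the exact normalization used in the manuscript (e.g. dividing by S_B) may differ, and the function name `signed_dprime` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def signed_dprime(test_pitch, ref_pitch, shift_direction):
    """Direction-signed sensitivity index d' of a pitch change (illustrative sketch).

    test_pitch: pitch values (Hz) of renditions in the analysis window (e.g. early or late)
    ref_pitch: pitch values (Hz) in the reference window (assumed: last 2 h of WN/LO)
    shift_direction: +1 if reinforcement drove pitch up, -1 if it drove pitch down
    Positive values continue in the reinforced direction; negative values revert towards baseline.
    """
    # Assumed normalization: pooled SD as the square root of the mean of the two sample variances
    pooled_sd = sqrt((variance(test_pitch) + variance(ref_pitch)) / 2)
    return shift_direction * (mean(test_pitch) - mean(ref_pitch)) / pooled_sd


# Toy example: an up-shifted bird whose pitch drops by 5 Hz relative to the reference window
ref = [600, 602, 598, 604, 596]
test = [595, 597, 593, 599, 591]
print(signed_dprime(test, ref, +1))  # negative: change towards baseline
```

Signing by direction lets up- and down-shifted birds be pooled on one axis, which is what a balanced design relies on.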

      Reviewer #2 (Public Review):

      Summary:

      This paper investigates the role of motor practice and sensory feedback when a motor action returns to a learned or established baseline. Adult male zebra finches perform a stereotyped, learned vocalization (song). It is possible to shift the pitch of particular syllables away from the learned baseline pitch using contingent white noise reinforcement. When the reinforcement is stopped, birds will return to their baseline over time. During the return, they often sing hundreds of renditions of the song. However, whether motor action, sensory feedback, or both during singing is necessary to return to baseline is unknown.

      Previous work has shown that there is covert learning of the pitch shift. If the output of a song plasticity pathway is blocked during learning, there is no change in pitch during the training. However, as soon as the pathway is unblocked, the pitch immediately shifts to the target location, implying that there is learning of the shift even without performance. Here, they ask whether the return to baseline from such a pitch shift also involves covert or overt learning processes. They perform a series of studies to address these questions, using muting and deafening of birds at different time points.

      Strengths:

      The overall premise is interesting and the use of muting and deafening to manipulate different aspects of motor practice vs. sensory feedback is a solid approach.

      Weaknesses:

      One of the main conclusions, which stems primarily from birds deafened after being pitch-shifted using white noise (WNd) in comparison to birds deafened before being pitch-shifted with light as a reinforcer (LOd), is that recent auditory experience can drive motor plasticity even when an individual is deprived of such experience. While the lack of shift back to baseline pitch in the LOd birds is convincing, the main conclusion hinges on the responses of just a few WNd individuals who are closer to baseline in the early period. Moreover, only 2 WNd individuals reached baseline in the late period, though neither of these were individuals who were closer to baseline in the early phase. Most individuals remain at or return toward the reinforced pitch. These data highlight that while it may be possible for previous auditory experience during reinforcement to drive motor plasticity, the effect is very limited. Importantly, it's not clear if there are other explanations for the changes in these birds, for example, whether there are differences in the number of renditions performed or changes to other aspects of syllable structure that could influence measurements of pitch.

      We thank the reviewer for these detailed observations. We examined the reviewer's concern that our main conclusion of revertive pitch changes in deaf birds with target mismatch experience hinges on only a few WNd birds in the early period.

      When we remove the three birds that were close to baseline (NRP = 0) in the early period, we still find the same trend of revertive changes towards baseline in WNd birds: early d′ = −0.13, p = 0.24, tstat = −0.74, df = 6, N = 7 birds; late d′ = −1.26, p = 0.08, tstat = −1.63, df = 6, N = 7 birds (one-sided t-tests of H0: d′ = 0). Furthermore, even without these three birds, bootstrapping the difference between WNd and dC birds shows the same trend in the early period (p = 0.22) and a significant reversion in the late period (p < 0.001). Thus, reversion towards baseline in the late period is robustly observed at the population level, even when excluding the three individual birds that the reviewer suspected were responsible for the effect.
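The one-sided t-tests of H0: d′ = 0 above can be sketched as follows. This is a minimal illustration, assuming one d′ value per bird; it computes only the t statistic and degrees of freedom (df = N − 1). The one-sided p-value for H1: mean < 0 is then the lower-tail probability of a t distribution with that df (for example via `scipy.stats.t.cdf`). The function name `one_sample_t` and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for H0: population mean == mu0 (illustrative sketch)."""
    n = len(values)
    # Standard error of the mean uses the sample standard deviation (n - 1 denominator)
    t = (mean(values) - mu0) / (stdev(values) / sqrt(n))
    return t, n - 1


# Toy per-bird d' values (hypothetical, not data from the manuscript)
t, df = one_sample_t([-1, -2, 0, -3, -4])
print(t, df)  # negative t: mean d' below zero, i.e. in the revertive direction
```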

      Moreover, note that three (not two) WNd individuals reached baseline in the late period (see Figure 2C, D). One of them was already close to baseline in the early period, and another was relatively close.

      Also, the considerable variability among birds is not surprising: variability across deaf birds is expected to be large because their ongoing song degradation may cause pitch to drift over time after deafening.

      Last but not least, see also our multivariate model (below).

      With regard to the “differences in the number of renditions” that could explain pitch changes: deaf birds sing less after deafening than hearing birds. During the first 2 hours (early), they sang 87±59 renditions (WNd) and 410±330 renditions (dLO), compared to 616±272 renditions (control birds). Also, WNd birds sang only 4300±2300 motif renditions between the early and late periods, compared to the average of 11000±3400 renditions that hearing control birds produced in the same time period. However, despite these differences, when we gave WNd birds more time to recover, namely 9 days after the early period, during which they sang on average 12000±6000 renditions, their NRP was still significantly different from zero (NRP = 0.37, p = 0.007, tstat = 3.47, df = 9). Thus, even after producing more practice songs, deaf birds do not recover baseline pitch, so the number of songs alone cannot explain why deaf birds do not fully recover pitch. We conclude that auditory experience seems to be necessary to recover song.

      We added this information to the Results.

      In this context, note that the interesting part of our work is not that deaf birds do not fully recover, but that they recover anything at all (“main conclusion”, Fig. 4). The number of songs does not explain why deaf birds with mismatch experience (WNd; singing the least, and significantly less than control birds, p = 2.3×10⁻⁶, two-tailed t-test) partially revert their song towards baseline, unlike deaf birds without mismatch experience (dLO; singing significantly more than WNd birds, p = 0.008, and indistinguishable from control birds, p = 0.1). We added this information to the Results section.

      With regard to ‘other aspects of syllable structure’: we did not look into this. Regardless of the outcome of such a hypothetical analysis, whether other syllable features change is irrelevant to our finding that deaf birds do not recover their target song. Nevertheless, note that in Zai et al. 2020 (supplementary Figure 1), we analyzed features other than pitch in deaf birds. Absolute change in entropy variance was larger in deaf birds than in hearing birds, consistent with the literature on song degradation after deafening (Lombardino and Nottebohm, 2000; Nordeen and Nordeen, 2010; and many others). In that paper, we found that only pitch changed consistently along the LO direction. None of the other features that we looked at (duration, AM, FM and entropy) changed consistently with the LO contingency. We expect that a similar result would apply to the changes across the recovery period in WNd and dLO birds, i.e., that song degradation can be seen in many features and that pitch is the only feature that changes consistently with the reinforcement (LO/WN) direction.

      While there are examples where the authors perform direct comparisons between particular manipulations and the controls, many of the statistical analyses test whether each group is above or below a threshold (e.g. baseline) separately and then make qualitative comparisons between those groups. Given the variation within the manipulated groups, it seems especially important to determine not just whether these are different from the threshold, but how they compare to the controls. In particular, a full model with time (early, late), treatment (deafened, muted, etc), and individual ID (random variable) would substantially strengthen the analysis.

      We fitted a full model of the NRP as the reviewer suggested, and it supports our conclusions: neither muting, deafening, nor time without practice between R and E windows has a significant effect on pitch in the E window, but the interaction between deafening and time (late, L) results in a significant pitch change (fixed effect 0.67, p = 2×10⁻⁶), demonstrating that deaf birds are significantly further away from baseline (NRP = 0) than hearing birds in late windows, thereby confirming that birds require auditory feedback to recover a distant pitch target. Importantly, we find a significant fixed effect of mismatch experience on pitch in the direction of the target (fixed effect -0.37, p = 0.006), supporting our finding that limited vocal plasticity towards a target is possible even without auditory feedback.
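One way to write a linear mixed-effects model of the kind the reviewer suggested is sketched below. This is a hedged illustration, not the fitted specification: the indicator names are hypothetical, and the manuscript's exact coding of covariates (for example, how time without practice between R and E windows enters) may differ. Here i indexes birds and j analysis windows; M_i = 1 for muted birds, D_i = 1 for deafened birds, L_ij = 1 for late windows, X_i = 1 for birds with mismatch experience, and u_i is a per-bird random intercept.

$$
\mathrm{NRP}_{ij} = \beta_0 + \beta_M M_i + \beta_D D_i + \beta_L L_{ij} + \beta_{DL}\, D_i L_{ij} + \beta_X X_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Under this coding, the reported deafening-by-late interaction would correspond to β_DL and the mismatch-experience effect to β_X.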

      We included this model as an additional analysis in our manuscript.

      The muted birds seem to take longer to return to baseline than controls even after they are unmuted. Presumably, there is some time required to recover from surgery, however, it's unclear whether muting has longer-term effects on syrinx function or the ability to pass air. In particular, it's possible that the birds still haven't recovered by 4 days after unmuting as a consequence of the muting and unmuting procedure or that the lack of recovery is indicative of an additional effect that muting has on pitch recovery. For example, the methods state that muted birds perform some quiet vocalizations. However, if birds also attempt to sing, but just do so silently, perhaps the aberrant somatosensory or other input from singing while muted has additional effects on the ability to regain pitch. It would also be useful to know if there is a relationship between how long they are muted and how quickly they return to baseline.

      We agree that muting might have some longer-term effects that could explain why WNm birds did not recover pitch 4 days after unmuting. However, if such an effect exists, it is only weak. Arguing against the idea that longer muting requires longer recovery, the difference in NRP between early and late windows was not correlated with (1) the duration for which the birds were muted (correlation coefficient = -0.50, p = 0.20), (2) the number of renditions the birds sang between early and late (correlation coefficient = 0.03, p = 0.95), or (3) the time since they last sang the target song (last rendition of baseline; correlation coefficient = -0.43, p = 0.29). Nor did we find a correlation between the early NRP and the time since the muting surgery (correlation coefficient = 0.26, p = 0.53), suggesting that the lack of pitch recovery while muted was not due to a lingering burden of the muting surgery. We added these results to the Results section.
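The correlation checks above can be sketched as follows. The test behind the reported p-values is not named here, so this sketch substitutes a two-sided permutation test for a Pearson correlation; it is a swapped-in technique and not necessarily the one used for the reported values. The function names `pearson_r` and `permutation_p` and the toy data are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for H0: no association between x and y."""
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson_r(x, y_perm)) >= r_obs:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)


# Toy example: per-bird muting duration (days) vs. change in NRP (hypothetical values)
duration = [1, 2, 3, 4, 5, 6]
delta_nrp = [0.2, -0.1, 0.3, 0.0, 0.1, -0.2]
print(pearson_r(duration, delta_nrp), permutation_p(duration, delta_nrp))
```

With only a handful of birds, a permutation test avoids relying on normality, at the cost of coarse p-value resolution.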

      In summary, we used the WNm group to assess whether birds can recover their target pitch in the absence of practice, i.e. whether they recovered pitch in the early time period. Whether or not some long-term effect of the muting/unmuting procedure affects later recovery, it does not undermine the main finding we obtained from WNm birds in Figure 1 (that birds do not recover without practice).

      Reviewer #3 (Public Review):

      Summary:

      Zai et al. test whether birds can modify their vocal behavior in a manner consistent with planning. They point out that while some animals are known to be capable of volitional control of vocalizations, it has been unclear if animals are capable of planning vocalizations, that is, modifying vocalizations towards a desired target without the need to learn this modification by practicing and comparing sensory feedback of practiced behavior to the behavioral target. They study zebra finches that have been trained to shift the pitch of song syllables away from their baseline values. It is known that once this training ends, zebra finches have a drive to modify pitch so that it is restored back to its baseline value. They take advantage of this drive to ask whether birds can implement this targeted pitch modification in a manner that looks like planning, by comparing the time course and magnitude of pitch modification in separate groups of birds who have undergone different manipulations of sensory and motor capabilities. A key finding is that birds who are deafened immediately before the onset of this pitch restoration paradigm, but after they have been shifted away from baseline, are able to shift pitch partially back towards their baseline target. In other words, this targeted pitch shift occurs even when birds don't have access to auditory feedback, which argues that this shift is not due to reinforcement-learning-guided practice, but is instead planned based on the difference between an internal representation of the target (baseline pitch) and current behavior (pitch the bird was singing immediately before deafening).

      The authors present additional behavioral studies arguing that this pitch shift requires auditory experience of the song in its state after it has been shifted away from baseline (birds deafened early on, before the initial pitch shift away from baseline, do not exhibit any shift back towards baseline), and that a full shift back to baseline requires auditory feedback. The authors synthesize these results to argue that different mechanisms operate for small shifts (planning, does not need auditory feedback) and large shifts (reinforcement learning, requires auditory feedback).

      We thank the reviewer for this concise summary of our paper. To clarify, we want to point out that we do not make any statement about the learning mechanism birds use to make large shifts to recover their target pitch, i.e. we do not say that large shifts are learned by reinforcement learning requiring auditory feedback. We only show that large shifts require auditory feedback.

      The authors also make a distinction between two kinds of planning: covert (not requiring any motor practice) and overt (requiring motor practice but without access to auditory experience from which target mismatch could be computed). They argue that birds plan overtly, based on these deafening experiments as well as an analogous experiment involving temporary muting, which suggests that indeed motor practice is required for pitch shifts.

      Strengths:

      The primary finding (that partially restorative pitch shift occurs even after deafening) rests on strong behavioral evidence. It is less clear to what extent this shift requires practice, since their analysis of pitch after deafening takes the average over the first two hours of singing. If this shift is already evident in the first few renditions, then this would be evidence for covert planning. This analysis might not be feasible without a larger dataset. Similarly, the authors could test whether the first few renditions after recovery from muting already exhibit a shift back toward baseline.

      This work will be a valuable addition to others studying birdsong learning and its neural mechanisms. It documents features of birdsong plasticity that are unexpected in standard models of birdsong learning based on reinforcement and are consistent with an additional, perhaps more cognitive, mechanism involving planning. As the authors point out, perhaps this framework offers a reinterpretation of the neural mechanisms underlying a prior finding of covert pitch learning in songbirds (Charlesworth et al., 2012).

      A strength of this work is the variety and detail in its behavioral studies, combined with sensory and motor manipulations, which on their own form a rich set of observations that are useful behavioral constraints on future studies.

      Weaknesses:

      The argument that pitch modification in deafened birds requires some experience hearing their song in its shifted state prior to deafening (Fig. 4) is solid but has an important caveat. Their argument rests on comparing two experimental conditions: one with and one without auditory experience of shifted pitch. However, these conditions also differ in the pitch training paradigm: the "with experience" condition was performed using white noise training, while the "without experience" condition used "lights off" training (Fig. 4A). It is possible that the differences in the ability for these two groups to restore pitch to baseline reflect the training paradigm, not whether subjects had auditory experience of the pitch shift. Ideally, a control study would use one of the training paradigms for both conditions, which would be "lights off" or electrical stimulation (McGregor et al. 2022), since WN training cannot be performed in deafened birds. This is difficult, in part because the authors previously showed that "lights off" training has different valences for deafened vs. hearing birds (Zai et al. 2020). Realistically, this would be a point to add to the discussion rather than a new experiment.

      We added the following statement to our manuscript:

      It is unlikely that dLO birds’ inability to recover baseline pitch is somehow due to our use of a reinforcer of a non-auditory (visual) modality, since somatosensory stimuli do not prevent reliable target pitch recovery in hearing birds (McGregor et al 2022).

      A minor caveat, perhaps worth noting in the discussion, is that this partial pitch shift after deafening could potentially be attributed to the birds "gaining access to some pitch information via somatosensory stretch and vibration receptors and/or air pressure sensing", as the authors acknowledge earlier in the paper. This does not strongly detract from their findings as it does not explain why they found a difference between the "mismatch experience" and "no mismatch experience" groups (Fig. 4).

      We added the following statement: Our insights were gained in deaf birds, and we cannot rule out that deaf birds could gain access to pitch information via somatosensory-proprioceptive sensory modalities. However, such information, even if available, cannot explain the difference between the "mismatch experience" (WNd) and the "no mismatch experience" (dLO) groups, which strengthens our claim that the pitch reversion we observe is a planned change and not merely a rigid motor response (as in simple use-dependent forgetting).

      More broadly, it is not clear to me what kind of planning these birds are doing, or even whether the "overt planning" here is consistent with "planning" as usually implied in the literature, which in many cases really means covert planning. The idea of using internal models to compute motor output indeed is planning, but why would this not occur immediately (or in a few renditions), instead of taking tens to hundreds of renditions?

      Indeed, what we call ‘covert planning’ refers to what is usually called ‘planning’ in the literature. Also, there currently seems to be no evidence for spontaneous overt planning in songbirds (which we elicited with deafening). Replay of song-like syringeal muscle activity can be induced by auditory stimuli during sleep (Bush, Doppler, Goller and Mindlin, 2018), but to our knowledge there are no reports of similar replay in awake, non-singing birds, which would constitute evidence for overt planning.

      We cannot ascertain how fast birds can plan their song changes, but our findings are compatible with fast planning. The smallest analysis window we chose is 2 h, which sets a lower bound on the time frame within which we can measure pitch changes. Our approach is probably not ideally suited for determining the minimal planning time, because the deafening and muting procedures increase song variability, which calls for larger pitch sample sizes for statistical testing, and the surgeries themselves cause a prolonged period without singing during which we have no access to the birds’ planned motor output. Note that fast planning is demonstrated by the recent finding of instant imitation in nightingales (Costalunga et al., 2023) and by fast re-pitching upon context changes in Bengalese finches (Veit, Tian, Monroy Hernandez and Brainard, 2021).

      To resolve confusion, it would be useful to discuss and add references relating "overt" planning to the broader literature on planning, including in the introduction when the concept is introduced.

      Overt and covert planning are terms used in the literature on child development and on adult learning; see Zajic et al. (2020), Overt planning behaviors during writing in school-age children with autism spectrum disorder and attention-deficit/hyperactivity disorder, and Zare-ee (2014), Researching Aptitude in a Process-Based Approach to Foreign Language Writing Instruction, Advances in Language and Literary Studies, and references therein.

      Indeed, muddying the interpretation of this behavior as planning is that there are other explanations for the findings, such as use-dependent forgetting, which the authors acknowledge in the introduction, but don't clearly revisit as a possible explanation of their results. Perhaps this is because the authors equate use-dependent forgetting and overt planning, in which case this could be stated more clearly in the introduction or discussion.

      We do not mean to strictly equate use-dependent forgetting and overt planning, although they can be related, namely when ‘use’ refers to ‘altered use’ as is the case when something about the behavior is missing (e.g. auditory feedback in our study), and the dependence is not just on ‘use’ but also on ‘experience’.

      We added the following sentence to the discussion: We cannot distinguish the overt planning we find from more complex use-and-experience dependent forgetting, since we only probed for recovery of pitch and did not attempt to push birds into planning pitch shifts further away from baseline.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The single main issue with this paper is in the section related to Figure 4, and the Figure itself - this is the most important part of the paper essential to buttress the claim of covert learning. However, there are several sources of confusion in the text, analyses, and figures. The key result is in Figure 4B, C - and, in the context of Figs 1-3, the data are significant but subtle. That is, as the authors state, the birds are mostly dependent on slow sensory feedback-dependent (possibly RL) mechanisms but there is a small component of target matching that evidences an internal model. One wonders why this capacity is so small - if they had a good internal model they'd be much faster and better at recovering target pitches after distortion-driven deviations even without sensory feedback.

      (1a) The analysis of the WNd and DLO reversions of pitch (related to Fig. 4) uses a d' analysis which is a pivot from the NRP analysis used in the rest of the paper. It is not clear why different analyses are being used here to compute essentially the same measure, i.e. how much did the pitch revert. It's also odd that different results are now obtained - Fig. 4 has a small but significant reversion of pitch in WNd birds but Fig. 2 shows no significant return to baseline.

      We did not test for reversion towards baseline in Fig. 2 and made no statement about whether there is a significant reversion or not. But when we do such a test, we find a significant reversion for WNd birds in the ‘late’ window (NRP=0.5, p=0.02, N=10, tstat=-1.77, two-tailed t-test), which agrees with Figure 4. In the ‘early’ window in Fig. 2, we find only a trend but no reversion (NRP = 0.76, p=0.11, n=10, tstat=-1.76), which contrasts with our findings in Figure 4. However, the discrepancy can be simply explained by the difference in time alignment that we detail in the Materials and Methods. Namely, in Figure 2, we measure pitch relative to the pitch in the morning on the day before, which is not a good measure of ‘reversion’ (since pitch had been reinforced further away during the day), which is why we do not present this analysis in the paper and dedicate a separate analysis in Figure 4 to reversion.

      (1b) Also in Fig. 4 is it the case that, as in the schematic of 4a, ALL birds in these experiments had their pitch pushed up - so that the return to baseline was all down? If this is the case the analysis may be contaminated by a pitch-down bias in deafened birds. This would ideally be tested with a balance of pitch-up and pitch-down birds in the pre-deafening period, and/or analysis of non-targeted harmonic stacks to examine their pitch changes. If non-targeted stacks exhibit pitch-down changes after deafening, then the reversion that forms the key discovery of this paper will be undermined. Please address.

      Both groups in Figure 4 were balanced (the same number of birds shifted their pitch up and down); see our response to the public review and the Methods.

      (1c) After multiple re-reads and consultations with the Methods section I still do not understand the motivation or result for Figure 4E. Please provide clarification of the hypothesis/control being assessed and the outcome.

      Figure 4E does not add a new result but strengthens our previous findings, because we obtain the same result with a different method. The pitch of deaf birds tends to drift after deafening. To account for this drift and for the effect of time elapsed since deafening, we bootstrapped the magnitude of the pitch change in WNd and dLO birds by comparing them to dC birds in matched time windows. We modified the sentence in the Results section to clarify this point:

      To account for the effect of time elapsed since deafening and quantify the change in pitch specifically due to reinforcement, we bootstrapped the difference in d′ between dLO/WNd birds and a new group of dC birds that were deafened but experienced no prior reinforcement (see Methods).
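A minimal sketch of bootstrapping a between-group difference in d′ is shown below. It assumes that birds are resampled with replacement within each group and that the one-sided p-value is the fraction of bootstrap differences at or above zero (i.e. not in the revertive direction); it ignores the matched time windows used for dC birds. The function name `bootstrap_diff_p` and the toy values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_diff_p(group_a, group_b, n_boot=10000, seed=0):
    """Bootstrap the difference of group means (a - b) by resampling birds.

    Returns the observed difference and a one-sided p-value for H1: mean(a) < mean(b),
    estimated as the fraction of bootstrap differences that are >= 0.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    count = 0
    for _ in range(n_boot):
        # Resample birds with replacement within each group
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        if mean(a) - mean(b) >= 0:
            count += 1
    return observed, count / n_boot


# Toy per-bird d' values: a reinforced deaf group vs. deaf controls (hypothetical)
wnd = [-1.5, -1.2, -0.9, -1.1, -1.3]
dc = [0.1, -0.1, 0.2, 0.0, -0.2]
print(bootstrap_diff_p(wnd, dc))
```

Resampling whole birds, rather than individual renditions, keeps the bird as the unit of inference.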

      (1d) Line 215. It's not clear in the text here how the WNd birds experience a pitch mismatch. Please clarify in the text that this mismatch was experienced before deafening. This is a critical paragraph to set up the main claims of the paper. Also, it's not clear what is meant by 'fuel their plan'. I can imagine this would simply be a DA-dependent plasticity process in Area X that does not fuel a plan but rather re-wires an HVC timestep to medium spiny neurons whose outputs drive pitch changes, i.e. not a fueled plan but simply an RL-dependent re-mapping in the motor system. Alternatively, a change could result in plasticity in pallial circuits (e.g. auditory to HVC mappings) that are RL independent and invoke an inverse model along the lines of the authors' past work (e.g. Ganguli and Hahnloser). This issue is taken up in the discussion but the setup here in the results is very confusing about the possible outcomes. This paragraph is vague with respect to the key hypotheses. It's possible that the WNd and DLO groups enable dissection of the two hypotheses above (because the DLO groups would presumably have RL signals but without recovery), but there remains a real lack of clarity over exactly how the authors are interpreting Fig 4 at the mechanistic level.

      WNd birds experience a pitch mismatch because, while singing before deafening, they hear that their pitch differs from baseline pitch; the same is not true for dLO birds. We simply tested whether this experience makes a difference for reversion, and it does. We added ‘before deafening’ to the paragraph and changed the wording of our hypothesis to make it clearer (we reworded ‘fuel their plan’). We left mechanistic interpretations in the discussion. Without going into details, all we are saying is that birds can only plan to revert motor changes they are aware of in the first place.

      Minor issues

      The songs of deafened birds degrade at a rate that depends on the bird's age. Younger crystallized birds degrade much faster, presumably because of lower testosterone levels that are associated with increased plasticity and LMAN function. Some background is needed on deafened birds to set up the WNd experiments.

      Despite deafening leading to the degradation of song (Lombardino and Nottebohm, 2000), syllable detection and pitch calculation were still possible in all deaf birds (up to 13-50 days after deafening surgery, age range 90-300 dph, n=44 birds).

      Since pitch shifting was balanced in both deaf bird groups (the same number of birds were up- and down-shifted), systematic changes in pitch post deafening (Lombardino and Nottebohm, 2000) will average out and so would not affect our findings.
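The balancing argument can be checked with a toy calculation. The sketch below (hypothetical function and numbers) signs each bird's pitch change by its reinforcement direction (+1 up, −1 down). Each bird reverts by 10 Hz towards baseline and all birds share a common post-deafening drift of −5 Hz. With two up-shifted and two down-shifted birds, the drift cancels; an unbalanced 3:1 design leaves a bias.

```python
from statistics import mean


def direction_signed_mean(changes, directions):
    """Mean pitch change signed by each bird's reinforcement direction (+1 up, -1 down).

    Negative values indicate reversion towards baseline.
    """
    return mean(d * c for c, d in zip(changes, directions))


# Up-shifted birds: -10 Hz reversion + -5 Hz drift = -15; down-shifted: +10 - 5 = +5
balanced = direction_signed_mean([-15, -15, 5, 5], [1, 1, -1, -1])
unbalanced = direction_signed_mean([-15, -15, -15, 5], [1, 1, 1, -1])
print(balanced)    # -10: equals the pure reversion; the drift cancels
print(unbalanced)  # -12.5: the drift biases the estimate
```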

      Lines 97-103. The paragraph is unclear and perhaps a call to a SupFig to show the lack of recovery would help. If I understand correctly, the first two birds did not exhibit the normal recovery to baseline if they did not have an opportunity to hear themselves sing without the WN. I am failing to understand this.

      In the early window (first 2 hours after unmuting) birds have not changed their pitch compared to their pitch in the corresponding window at the end of reinforcement (with matching time-of-day). We added ‘immediately after unmuting (early)’ to clarify this statement.

      Lines 68-69. What is the difference between (2) and (3)? Both require sensory representation/target to be mapped to vocal motor output. Please clarify or fuse these concepts.

      We fused the concepts and changed the figure and explanation accordingly.

      Line 100. Please name the figure to support the claim.

      We marked the two birds in Fig. 1H and added a reference in the text.

      Line 109. Is there a way to confirm / test if muted birds attempted to sing?

      Unfortunately, we do not have video recordings to check if there are any signs of singing attempts in muted birds.

      Line 296: Why 'hierarchically 'lower'?

      Lower because without it there is nothing to consolidate, i.e. the higher process can only be effective after the lower but not before. We clarified this point in the text.

      Past work on temporal CAF (tCAF) by the Olveczky group showed that syllable durations and gaps could be reinforced in a way that does not depend on Area X and, therefore, related to the authors' discussion on the possible mechanisms of sensory-feedback independent recovery, may rely on the same neural substrates that the Fig. 4 WNd group uses to recover. Yet the authors find in this paper that tCAF birds did not recover. There seems to be an oddity here: if covert recovery relies on circuits outside the basal ganglia and RL mechanisms, wouldn't tCAF birds be more likely to recover? This is not a major issue but is a source of confusion related to the authors' interpretations that could be fleshed out.

      This is a good point. We reinvestigated the tCAF birds in the context of Fig. 4, where we looked for pitch reversions towards baseline. tCAF birds also revert towards baseline. We added this information to the supplement. We cannot say anything about the mechanistic reasons for the lack of recovery, especially given that we did not look at brain-level mechanisms.

      Reviewer #2 (Recommendations For The Authors):

      The data presentation could be improved. It is difficult to distinguish between the early and late symbols and to distinguish between the colors for the individual lines on the plots or to match them with the points on the group data plots. In addition, because presumably, the points in plots like 2D are for the same individuals, lines connecting those points would be useful rather than trying to figure out which points are the same color.

      We added lines in Fig. 2D connecting the birds in early and late.

      The model illustrations (Fig 1A, Fig 5) are not intuitive and do not help to clarify the different hypotheses or ideas. I think these need to be reworked.

      We revised the model illustrations and hope they now better clarify the different hypotheses.

      Some of the phrasing is confusing. Especially lines 157-158 and 256-257.

      Lines 157-158: we removed an instance of ‘WNd’, which was out of place.

      Lines 256-257: we rephrased to ‘showing that prior experience of a target mismatch is necessary for pitch reversion independently of auditory feedback’

      Reviewer #3 (Recommendations For The Authors):

      For Fig. 1, the conclusion in the text "Overall, these findings suggest that either motor practice, sensory feedback, or both, are necessary for the recovery of baseline song" is not aligned with the figure header "Recovery of pitch target requires practice".

      We rephrased the conclusion to: Overall, these findings rule out covert planning in muted birds and suggest that motor practice is necessary for recovery of baseline song.

      The use of the term "song experience" can be confusing as to whether it means motor or auditory experience. Perhaps replace it with "singing experience" or "auditory experience" where appropriate.

      We made the requested changes.

      Fig. 1A, and related text, reads as three hypotheses that the authors will test in the paper, but I don't think this turns out to be the main goal (and if it is, it is not clear their results differentiate between hypotheses 1, 2, and 3). Perhaps reframe these as discussion points and make this panel less prominent at the start, just to avoid this confusion.

      We modified the illustration in Fig 1A and simplified it. We now only show the 2 hypotheses that we test in the paper.

      Line 275-276, "preceding few hours necessitates auditory feedback, which sets a limit to zebra finches' covert planning ability". Did the authors mean "overt", not covert? Since their study focuses on overt planning.

      Our study focuses on covert planning in figure 1 and overt planning in subsequent figures.

      The purpose of the paragraph starting on line 278 could be more clear. Is the goal to say that overt planning and what has previously been described as use-dependent forgetting are actually the same thing? If not, what is the relationship between overt planning and forgetting? In other words, why should I care about prior work on use-dependent forgetting?

      We moved the paragraph further down where it does not interrupt the narrative. See also our reply to reviewer 3 on use-dependent forgetting.

      Line 294, "...a dependent process enabled by experience of the former...", was not clear what "former" is referring to. In general, this paragraph was difficult to understand. Line 296: Which is the "lower" process?

      We added explanatory parentheses in the text to clarify. We rephrased the sentence to ‘the hierarchically lower process of acquisition or planning as we find is independent of immediate sensory experience.’

      Line 295, the reference to "acquisition" vs. "retention". It is not clear how these two concepts relate to the behavior in this study, and/or the hierarchical processes referenced in the previous sentence. Overall, it is not clear how consolidation is related to the paper's findings.

      We added explanatory parentheses in the text and changed figure 5 to better explain the links.

      Line 305, add a reference to Warren et al. 2011, which I believe was the first study (or one of them) that showed that AFP bias is required for restoring pitch to baseline.

      We are citing Warren et al. 2011 in the sentence:

      Such separation also applies to songbirds. Both reinforcement learning of pitch and recovery of the original pitch baseline depend on the anterior forebrain pathway and its output, the lateral magnocellular nucleus of the anterior nidopallium (LMAN)(1).

      Line 310, "Because LMAN seems capable of executing a motor plan without sensory feedback", is this inferred from this paper (in which case this is an overreach) or is this referencing prior work (if so, which one, and please cite)?

      We changed the wording to ‘It remains to be seen whether LMAN is capable of executing a motor plan without sensory feedback’.

      Line 326, "which makes them well suited for planning song in a manner congruent with experience." I don't fully understand the logic. Can this sentence be clarified?

      We rephrased the sentence and added an explanation as follows: …which makes them well suited for executing song plans within the range of recent experience (i.e., if the song is outside recent experience, it elicits no LMAN response and so does not gain access to planning circuits).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors conducted two tasks separated by 300 days. First, a social perception task, where Ps responded whether a pictured person either deserved or needed help. Second, an altruism task, where Ps are offered monetary allocations for themselves and a partner. Ps decide whether to accept the offered allocation or a default allocation of 20 dollars each. The partners differed in perceived merit, such that they were highly deserving, undeserving, or unknown. This categorisation was decided on the basis of a prisoner's dilemma game the partner played beforehand. "Need" was also manipulated, by altering the probability that the partner would have to keep their hand in cold water at the end of the experiment; the partner could use the money to buy themselves out. These two tasks were conducted to assess the perception of need/merit in the first instance, and how this relates to social behaviour in the second. fMRI data were collected alongside behavioural data.

      The authors present many analyses of behaviour (including DDM results) and fMRI. E.g., they demonstrate that they could decode across the mentalising network whether someone was making a need or deserving judgement vs control judgement but couldn't decode need vs deserving. And that brain responses during merit inferences (merit - control) systematically covaried with participants' merit sensitivity scores in the rTPJ. They also found relationships between behaviour and rTPJ in the altruism task. And that merit sensitivity in the perception task predicted the influence of merit on social behaviour in the altruism task.

      Strengths:

      This manuscript represents a sensible model to predict social perceptions and behaviours, and a tidy study design with interesting findings. The introduction introduced the field especially brilliantly for a general audience.

      Response: We are pleased that the reviewer found the model sensible and the findings interesting! Below, we respond to each of the reviewer’s comments/critiques.

      Weaknesses: (1) The authors do acknowledge right at the end that these are small samples. This is especially the case for the correlational questions. While the limitation is acknowledged at the end, it is not truly acknowledged in the way that the data are interpreted. I.e. much is concluded from absent relationships, where the likelihood of Type II error is high in this scenario. I suggest that throughout the manuscript, authors play down their conclusions about absence of effects.

      Response: We agree with the reviewer that the limitation of small samples should be adequately reflected in the interpretation of the data. We have therefore added cautionary language to the interpretation of the correlational effects in several places of the revised manuscript. For example, we now state: “However, this absence of effects for need ought to be interpreted with caution, given the comparatively small sample size.” (pg. 33) and “As mentioned above, we cannot rule out the possibility that null findings may be due to the comparatively small sample size and should be interpreted cautiously (also see discussion)” (pg. 34-35).

      (2) I found the results section quite a marathon, and due to its length I started to lose the thread concerning the overarching aims - which had been established so neatly in the introduction. I am unsure whether all of these analyses were necessary for addressing the key questions or whether some were more exploratory. E.g. it's unclear to me what one would have predicted upfront about the decoding analyses.

      Response: We acknowledge and share the reviewer’s concern about the length of the results section and potential loss of clarity. Regarding the decoding analyses, we want to clarify that they were conducted as a sanity check to compare against the results of the univariate analysis. We did not have a priori hypotheses regarding these supplemental decoding analyses. We have clarified this issue in the revised version of the manuscript and moved the decoding analyses fully to the supplemental material to streamline the main text. The remaining results reported in the manuscript are indeed all based on a priori, key questions (unless specified otherwise, for example, supplemental analyses for other regions of interest for the sake of completeness). The only exception is the final set of results (Neural markers of merit sensitivity predict merit-related behavioral changes during altruistic choice), which represent post hoc tests to clarify the role of activation in the right temporoparietal junction (rTPJ) in merit-related changes in other-regard in altruistic decisions. While we acknowledge that this is a complex paper, after careful consideration we could not identify any other parts of the results section to remove or report in the supplemental material.

      (3) More specifically, the decoding analyses were intriguing to me. If I understand the authors, they are decoding need vs merit, and need+merit vs control, not the content of these inferences. Do they consider that there is a distributed representation of merit that does not relate to its content but is an abstracted version that applies to all merit judgements? I certainly would not have predicted this and think the analyses raise many questions.

      Response: We thank the reviewer for sharing their thoughts on the decoding analyses and agree that this set of analyses is intriguing, yet raises additional questions, such as the neural computations required to assess content. However, we wish to clarify that the way we view our current results is very much analogous to results obtained from studies of perception in other fields. For example, in the face perception literature, it is often observed that the fusiform face area (FFA) is uniformly more active, not only when a face (as opposed to an object) is on the screen, but also when a compound stimulus consisting of features of a face and other features (e.g., of objects) is on the screen and participants are instructed to attend to and identify solely the face. Moreover, multivariate activity in the FFA (but not univariate activity) is sufficient to decode the identity of the face. We view the results we report in the manuscript as more akin to the former type of analysis, where any region that is involved in the computation is uniformly more active when attention is directed to judgment-specific features. Unfortunately, the present data are not sufficient to properly answer the latter questions, about which areas enable decoding of the specific intensity or identity of merit-related content. Follow-up experiments with a more optimized design are needed. Although these analyses are interesting, we refrain from discussing them further in the manuscript to avoid distracting from the main findings based on the univariate comparison of brain responses observed while participants make merit or need inferences in the social perception task.

      Reviewer #2 (Public Review):

      When people help others is an important psychological and neuroscientific question. It has received much attention from the psychological side, but comparatively less from neuroscience. The paper translates some ideas from the social psychology domain to neuroscience using a neuroeconomically oriented computational approach. In particular, the paper is concerned with the idea that people help others based on perceptions of merit/deservingness, but also because they require/need help. To this end, the authors conduct two experiments with an overlapping participant pool:

      (1) A social perception task in which people see images of people that have previously been rated on merit and need scales by other participants. In a blockwise fashion, people decide whether the depicted person a) deserves help, b) needs help, and c) whether the person uses both hands (== control condition).

      (2) In an altruism task, people make costly helping decisions by deciding between giving a certain amount of money to themselves or another person. How much the other person needs and deserves the money is manipulated.

      The authors use a sound and robust computational modelling approach for both tasks using evidence accumulation models. They analyse behavioural data for both tasks, showing that the behaviour is indeed influenced, as expected, by the deservingness and the need of the shown people. Neurally, the authors use a block-wise analysis approach to find differences in activity levels across conditions of the social perception task (there is no fMRI data for the other task). The authors do find large activation clusters in areas related to theory of mind. Interestingly, they also find that activity in TPJ that relates to the deservingness condition correlates with people's deservingness ratings while they do the task, but also with computational parameters related to helping others in the second task, the one that was conducted many months later. Also, some behavioural parameters correlate across the two tasks, suggesting that how deserving of help others are perceived to be reflects a relatively stable feature that translates into concrete helping decisions later on.

      The conclusions of the paper are overall well supported by the data.

      Response: We thank the reviewer for the positive evaluation of our study and the comprehensive summary of our main findings. We would like to clarify, though, that we did originally collect fMRI data for the independent altruism task. Unfortunately, due to COVID-19-related interruptions, only 25 participants from the sample that performed the social perception task also completed the fMRI altruism task (see pg. 18). Given the limited sample size and noise level of fMRI data, we moved anything related to the neuroimaging data of the altruism task to the supplemental material (see Note S7) and decided to focus solely on the behavior of the altruism task to address our research objectives. We apologize for any confusion.

      (1) I found that the modelling was done very thoroughly for both tasks. Overall, I had the impression that the methods are very solid with many supplementary analyses. The computational modelling is done very well.

      Response: We are pleased that the reviewer found the computational model sensible.

      (2) A slight caveat, however, regarding this aspect, is that, in my view, the tasks are relatively simplistic, so even the complex computational models do not do as much as they can in the case of more complex paradigms. For example, the bias term in the model seems to correspond to the mean response rate in a very direct way (please correct me if I am wrong).

      Response: We agree that the Bias term relates to mean responding (although it is not the sole possibility: thresholds and starting default biases can also produce changes in mean levels of responding that, without the computational model, are not possible to dissociate). However, we think that the primary value of this parameter comes not from the analysis of the social judgment task (where the reviewer is correct that the bias relates in a quite straightforward way to the mean response rate), but from the relationship of this parameter to the uncontextualized generosity response in the altruism task. Here, we find that this general bias term relates not to overall generosity, but rather to the overall weight given to others’ outcomes, a finding that makes sense if the tendency to perceive others as deserving overall yields an increase in overall attention to, and valuation of, their outcomes. Thus, a simple finding in one task relates to a more nuanced finding in another. However, we agree it is important to acknowledge the point raised by the reviewer, and now do so on pg. 20: “It is worth noting that the Bias parameters are strongly associated with (though not the sole determinant of) the mean response rate.”

      (3) Related to the simple tasks: The fMRI data is analysed in a simple block-fashion. This is in my view not appropriate to discern the more subtle neural substrates of merit/need-based decision-making or person perception. Correspondingly, the neural activation patterns (merit > control, need > control) are relatively broad and unspecific. They do not seem to differ in the classic theory of mind regions, which are the focus of the analyses.

      Response: The social perception task is modified from a well-established social inference task (Spunt & Adolphs, 2014; 2015) designed to reliably localize the mentalizing network in the brain. As such, we acknowledge that it is not optimally designed to discern the intrinsic complexities of social perception, or the specific appraisals or computations that yield more or less perception (of need or merit) in a given context. Instead, it was designed to highlight regions that are more generally recruited for performing these social perceptions/inferences.

      We heartily agree with the reviewer that it would be interesting and informative to analyze this task in a trial-wise way, with parametric variation in evidence for each image predicting parametric variation in brain activity. Unfortunately, the timing of this task is not optimal for this kind of analysis, since trials were presented in rapid and blocked fashion. We were also limited in the amount of time we could devote to this task, since it was collected in conjunction with a number of other tasks as part of a larger effort to detail the neural correlates of social inference (reported elsewhere). Thus, we were not able to introduce the kind of jittered spacing between trials that would have enabled such an analysis, despite our own wish to do so. We hope that this work will motivate future work designed more specifically to address this interesting question, and now include a statement to this effect on pgs. 22-23: “Future research may reveal additional distinctions between merit and need appraisals in trial-wise (compared to our block-wise) fMRI designs.”

      References:

      Spunt, R. P. & Adolphs, R. Validating the Why/How contrast for functional MRI studies of Theory of Mind. Neuroimage 99, 301-311, doi:10.1016/j.neuroimage.2014.05.023 (2014).

      Spunt, R. P. & Adolphs, R. Folk explanations of behavior: a specialized use of a domain-general mechanism. Psychological Science 26, 724-736, doi:10.1177/0956797615569002 (2015).

      (4) However, the relationship between neural signal and behavioural merit sensitivity in TPJ is noteworthy.

      Response: We agree with this assessment and thank the reviewer for their positive assessment; we feel that linking individual differences in merit sensitivity with variance in TPJ activity during merit judgments is one of the key findings of the study.

      (5) The latter is even more the case, as the neural signal and aspects of the behaviour are correlated across subjects with the second task that is conducted much later. Such a correlation is very impressive and suggests that the tasks are sensitive for important individual differences in helping perception/behaviour.

      Response: Again, we share the reviewer’s impression that this finding is more noteworthy for appearing in tasks separated both by considerable conceptual/paradigmatic differences, and by such a long temporal distance. These findings make us particularly excited to follow up on these results in future research.

      (6) That being said, the number of participants in the latter analyses are at the lower end of the number of participants that are these days used for across-participant correlations.

      Response: We fully agree with this assessment. Unfortunately, COVID-related disruptions in data collection, as well as the expiration of grant funds due to the delay, severely limited our ability to complete assessments in a larger sample. Future research needs to replicate these results in a larger sample. We comment on this issue in the discussion on pg. 40. If the editor or reviewer has suggestions for other ways in which we could more fully acknowledge this, we would be happy to include them.

      Reviewer #3 (Public Review):

      Summary:

      The paper aims to provide a neurocomputational account of how social perception translates into prosocial behaviors. Participants first completed a novel social perception task during fMRI scanning, in which they were asked to judge the merit or need of people depicted in different situations. Secondly, a separate altruistic choice task was used to examine how the perception of merit and need influences the weights people place on themselves, others, and fairness when deciding to provide help. Finally, a link between perception and action was drawn in those participants who completed both tasks.

      Strengths:

      The paper is overall very well written and presented, leaving the reader at ease when describing complex methods and results. The approach used by the authors is very compelling, as it combines computational modeling of behavior and neuroimaging data analyses. Despite not being able to comment on the computational model, I find the approach used (to disentangle sensitivity and biases, for merit and need) very well described and derived from previous theoretical work. Results are also clearly described and interpreted.

      Response: We thank the reviewer for their positive comments regarding presentation, approach, and content.

      Weaknesses:

      My main concern relates to the selection of the social perception task, which to me is the weakest point. This weakness has also been addressed by the authors in the limitations section, and relates to the fact that merit and need are evaluated by means of very different cues that rely on different cognitive processes (more abstract thinking for merit than for need). I wonder whether and how such a difference can bias the overall computational model and the interpretation of the results (e.g., ideally one would vary merit and need while leaving all other aspects invariant).

      Response: We agree with the reviewer on the importance of future research to more fully unpack the differences in this task, and develop better ways to manipulate need and merit in more comparable fashion. However, we point out that the issue of differences in abstractness of cues for need and merit does not actually seem to have a strong influence on the parameters retrieved by the computational model. Participants seem to be equally sensitive to BOTH merit and need information, despite that information deriving from different sources, as evidenced by the fact that the magnitude of the sensitivity parameters for need and merit in the social judgment task were nearly identical, and not statistically distinguishable. Nor were other parameters related to non-decision time or threshold statistically different (see Supplemental Table S2). If our results were driven purely by differences in the difficulty or abstractness of these judgments, we would have expected to see some evidence of this in the computational model, in the form of longer non-decision times, higher thresholds, or both. We do not. Likewise, the neural underpinnings evoked by both need and merit perceptions in this task (in the mentalizing brain network) were comparable. This is not to say that there aren’t real differences in the cues that might signal these quantities in our social perception task - just that there is little direct evidence for this difference in computational parameters or evoked brain responses, and thus it is unlikely that our results (which rely on an analysis of computational parameters) are driven solely by computational model biases, or the inability of the model to adequately assess participant sensitivity to need as opposed to merit.

      A second weakness is related to the sample size, which is quite small for study 2. I wonder, given that study 2 fMRI data are not analyzed, whether it is possible to recover some of the participants' behavioral results, at least those of participants excluded because of bad MR image quality.

      Response: We fully agree with the reviewer that increasing the sample size for the cross-task correlations would be desirable. Unfortunately, the current sample size already presents the maximum of ‘usable’ data; the approach suggested by the reviewer won’t affect the sample size. We used all participants whose behavioral data in the altruism task suggested they were performing the task in good faith and conscientiously.

      Finally, on a theoretical note, I would elaborate more on the distinction of merit and need. These concepts tap into very specific aspects of morality, which I suspect have been widely explored. At the moment I am missing a more elaborate account of this.

      Response: Need and merit are predominantly studied in separate lines of research (Molouki & Bartels, 2020) so there is relatively little theoretical research on the distinction between the two. Consequently, Siemoneit (2023) states that the relation between the concepts of need and merit in allocative distributions remains diffuse. To emphasize the distinct concepts of morality in the introduction we have now added to pg. 3: “Need and deservingness (merit) are two distinct principles of morality. The need principle involves distributing resources to those who require them, irrespective of whether they have earned them, while the "merit principle" focuses on allocating resources based on individuals' deservingness, regardless of their actual need (Wilson, 2003).”

      One of the added values of our paper to the research literature is in adding to the clarification of computational and neural underpinnings of broad concepts like merit and need. To highlight the latter point, we have added the following statement on pg. 5 to the manuscript: “Examining need and merit concurrently in this task will also help clarify the computational and neural underpinnings of related, but distinct concepts, distinguishing between them more effectively.”

      References:

      Molouki, S., & Bartels, D. M. (2020). Are future selves treated like others? Comparing determinants and levels of intrapersonal and interpersonal allocations. Cognition, 196, 104150.

      Siemoneit, A. (2023). Merit first, need and equality second: hierarchies of justice. International Review of Economics, 70(4), 537-567.

      Wilson, C. (2003). The role of a merit principle in distributive justice. The Journal of ethics, 7, 277-314.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I acknowledge the difficulty with respect to recruitment, especially in the age of covid, but is it possible for the authors to collect larger samples for their behavioural questions via online testing? Admittedly, I'm sure they don't want to wait 300 days to have the complete dataset, but I would be in favour of collecting a sample in the hundreds on these behavioural tasks, completed at a much shorter separation (if any). I believe this would strengthen the authors' conclusions considerably if they could both replicate the effects they have and check these null effects in a sample where they could draw conclusions from them. Indeed, Bayesian stats to provide evidence for the null would also help here.

      Response: We share the reviewer’s desire to see these results replicated (ideally in a sample of hundreds of participants). We have seriously considered the possibility of trying to replicate our results online, even before submitting the first version of the paper. However, it is difficult to fully replicate this paradigm online, given the elaborate story and context we engaged in to convince participants that they were playing with real others, as well as the usage of physical pain (Cold Pressor Task) for the need manipulation in the altruism task. Moreover, given comments by this reviewer that the results are already a little long, adding a new, behavioral replication would likely only add to the memory burden for the reader. We have thus opted not to include a replication study in the current work. However, we are actively working on a replication that can be completed online, using a modified experimental paradigm and different ways to manipulate need and merit. Because of the differences between that paradigm and the one described here, which would require considerable additional exposition, we have opted not to include the results of this work in the current paper. We hope to be able to publish this work as a separate, replication attempt in the future.

      Given the difficulty of wading through the results section while keeping track of the key question being answered, I would suggest moving any analyses that are less central to the supplementary. And perhaps adding some more guiding sentences at the start and end of each section to remind the reader how each informs the core question.

      Response: We deliberated for quite some time about what results could be removed, but in the end felt that nearly all results that we already described need to be included in the paper, since each piece of the puzzle contributes to the central finding (relating parameters and behavior to neural and choice data across two separate tasks). However, we did move the decoding analysis results to the supplemental material (see point below). We also take the reviewer's point that the results can be made clearer. We have thus worked to include some guiding sentences at the start and end of sections to remind readers how each analysis informs the core questions.

      I think it needs unpacking more for the reader what they should conclude from the significant need+merit vs control decoding analyses, and what they would have expected in terms of cortical representation from the decoding analyses in general.

      Response: We agree with the reviewer that, given the decoding results' position in the main manuscript, they would need unpacking. After considering the reviewer's prior suggestion, we have reevaluated the placement of these supplemental results. Consequently, we have relocated them to the supplemental materials, as they were deemed less relevant to directly addressing the core research questions in the main manuscript. On pg. 23, the main manuscript now only states “We also employed supplemental multivariate decoding analyses (searchlight analysis 85-87), as commonly used in social perception and neuroscience research 7,58,82,88,89, corroborating our univariate findings (see Supplemental Note S6, Supplemental Table S10).”

      Reviewer #2 (Recommendations For The Authors):

      (1) I would suggest moving information on how the computational models were fitted to the main text.

      Response: The computational models are a key element of the paper, and we deliberated about giving the description of how the models were fitted a more central place in the main manuscript. However, we are concerned about the complexity and length of the article, which already requires readers to keep quite a lot in mind (as also noted by Reviewer 1). Readers who are particularly interested in the details of model fitting can still find an extensive description of the procedures we followed in the supplement. We have thus opted to retain the streamlined presentation in the main manuscript. However, if the editor feels that including the full description of model fitting in the main paper would significantly improve the flow and exposition of ideas, we are happy to do so.

      (2) For the fMRI analyses: Could it be worth analysing the choices in the different conditions? They could be modelled as a binary regressor (yes/no) and this one might be different across conditions (merit/need/hands). Maybe this won't work because of the tight trial timeline, but it could be another avenue to discern differences across fMRI conditions.

      Response: We thank the reviewer for this interesting suggestion! Unfortunately, the block design and rapid presentation of stimuli within each condition make it challenging to distinguish the different choices (within or across conditions). While we see the merit in the suggested analytical approach (in fact, we discussed it before the initial submission of the article), it would require some modifications of the task structure (e.g., longer inter-trial-intervals between individual stimuli) and an independent replication fMRI study. We were not able to have such a long inter-trial interval in the original design due to practical constraints on the inclusion of this paradigm in a larger effort to examine a wide variety of social judgment and inference tasks. We hope to investigate this kind of question in greater detail in future fMRI work.

      (3) The merit effects seem to be more stable across time than the need conditions. Would it be worthwhile to test if the tasks entailed a similar amount of merit and need variation? Maybe one variable varied more than the other in the task design, and that is why one type of effect might be stronger than the other?

      Response: We thank the reviewer for drawing attention to this important point. We used extensive pilot testing to select the stimuli for the social perception task, ensuring an overall similar amount of need and merit variation. For example, the social perception ratings of the independent, normative sample suggest that the social perception task entails a similar amount of need and merit variation (normative participant-specific percentage of yes responses for merit (mean ± standard deviation: 53.95 ± 13.87) and need (45.65 ± 11.07)). The results of a supplemental paired t-test (p = 0.122) indicate comparable SDs for need and merit judgments. Moreover, regarding the actual fMRI participant sample, Figure S3 illustrates comparable levels of variation in need and merit perceptions (participant-specific percentage of yes responses for merit (56.70 ± 11.91) and need (48.69 ± 10.81) in the social perception task). Consistent with the results for the normative sample, a paired t-test (p = 0.705) suggests no significant difference in variation between need and merit judgments. For the altruism task, we manipulated the levels of merit and need externally (high vs. low).

      Reviewer #3 (Recommendations For The Authors):

      (1) It would be good to provide the demographics of each remaining sample.

      Response: We appreciate the attention to detail and agree with the reviewer’s suggestion. We have now added the demographics for each remaining sample to the revised manuscript.

      (2) The time range from study 1 to study 2 is quite diverse. Did you use it as a regressor of no interest?

      Response: We thank the reviewer for this interesting suggestion. We have examined this in detail in the context of our cross-task analyses (i.e., via regressions and partial correlations). Interestingly, variance in the temporal delay between both tasks does not account for any meaningful variation, and results don’t qualitatively change controlling for this factor.

      For example, when we controlled for the delay between the two tasks (partial correlation analysis), we confirmed that variance in merit sensitivity (social perception task) still reflected merit-induced changes in overall generosity (altruism task; p = 0.020). Moreover, we confirmed that variance in merit sensitivity reflected individuals’ other-regard (p = 0.035) and self-regard (p = 0.040), but not fairness considerations (p = 0.764) guiding altruistic choices. Regarding people’s general tendency to perceive others as deserving, we found that the link between merit bias (social perception task) and overall other-regard (p = 0.008) and fairness consideration (p = 0.014) (altruism task) holds when controlling for the time range (no significant relationship between merit bias and self-regard, p = 0.191, matching results of the main paper).

      We refer to these supplemental analyses in the revised manuscript on pp. 33 and 35: “Results were qualitatively similar when statistically controlling for the delay between both tasks (partial correlations).”

      (3) Why was a dichotomous answer used in Study 1? Would a continuous variable (VAS) not have been better (also for modeling)?

      Response: We appreciate the reviewer's thoughtful feedback. In Study 1, opting for a dichotomous response format in the social perception task (Figure 1a) was a deliberate methodological choice. This decision, driven by the requirements of our model, aligns with the common use of computational models that treat the two alternatives of a forced choice ("yes" and "no") as decision boundaries. While drift-diffusion models for multiple-alternative forced-choice designs exist, our study's novel research questions could be addressed effectively without their added complexity. Finally, our model cannot accept continuous response variables as input unless they are transformed into categorical variables.

      (4) In the fMRI analyses, when you assess changes in brain activity as a function of merit, I would control for need (and the other way round), to see whether such association is specific.

      Response: Regarding the reviewer’s suggestion to control for need when assessing changes in brain activity as a function of merit, and vice versa, we would like to clarify the nature of our fMRI analyses in the social perception task. Our focus is on block-wise comparisons (need vs. control, merit vs. control, and need vs. merit blocks), following the fMRI task design from which our social perception task was adapted. We do not assess changes in brain activity as a function of the level of perceived merit or need (i.e., “yes” vs. “no” trials within or across task blocks). Blocks are clearly defined by the task instruction given to participants prior to each block (i.e., need, merit, or control judgments). Unfortunately, given the short inter-stimulus intervals within each block, the task design is not well suited to the suggested approach.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity): Summary:

      This research article describes genetic identification and expression analyses of six Ephrin type-B receptor 4 (EPHB4) variants identified in patients with dilated cardiomyopathy (DCM). Variants were identified in a cohort of 573 patients enrolled through the multicenter DZHK-TORCH (TranslatiOnal Registry for CardiomyopatHies) study and the Institute for Cardiomyopathies Heidelberg registry. Expression of downstream molecules, CAV1 and CD36, was assessed in human cardiac tissues by immunohistochemistry. EPHB4 cardiac expression was assessed using recently published single-cell/nucleus RNA sequencing data (Nicin et al. 2022) incorporating snRNA-seq data from two other studies (healthy cardiac tissue, Litvinukova et al. 2020; hypertrophic/aortic stenosis, Nicin et al. 2020).

      We thank the reviewer for the recommendations that have improved our manuscript.

      Major Comments:

      1. Details of identified truncating RBM20 and TTN variants must be provided. These should be integrated into Table 1 alongside each co-occurring EPHB4 variant. List whether the TTN truncating variant is located in the A-band and whether these variants would be adjudicated as pathogenic/likely pathogenic, variant of uncertain significance by ACMG and/or similarly refined DCM criteria (Morales et al. 2020, Circ-Genom Precis Med).

      Details of the truncating RBM20 and TTN variants have been provided in the new Supplementary Table 1. As indicated in the table, both mutations are pathogenic and thus most probably the cause of the disease in these patients. The TTN variant is truncating and is located in the M-band in exon 358, which is annotated with a PSI in DCM of 100% in cardiodb.org. The fact that these mutations are most probably the cause of DCM in these patients has been included in the discussion section, which reads as follows:

      Although it is most probable that, in the patients carrying TTN and RBM20 variants, these variants are the cause of the disease, this study further supports the importance of EPHB4 in regulating CD36 caveolar trafficking to the membrane, whether in endothelial cells or cardiomyocytes, for maintaining cardiac homeostasis in humans, and its implication in DCM.

      1. Discuss co-occurrence of multiple EPHB4 variants in two patients (DCM1, DCM3) and identification of 2 EPHB4 variants in more than one proband.

      As shown in Figure 1A, the detected variants are found in multiple domains of the protein; hence, no clear hotspot is detected. We have not yet investigated the exact mechanisms of action; however, when we compare the two patients with multiple EPHB4 variants, their average LVEF (echo) is 17.5% compared to 38.67% for the remaining 4 patients with only one EPHB4 variant and 35.17% for the six non-EPHB4 variant carriers. Although the sample size only allows for a semi-quantitative analysis, it still hints at a possible EPHB4-variant effect, which certainly needs verification in a larger cohort.

      Since we do not postulate the detected variants to be independently disease-causing, and we also did not explicitly filter for very rare variants, it is not surprising that we find two variants in multiple patients. As stated above, we did not investigate this further, but evidence is growing that compound heterozygosity plays a role in heritable diseases. It will be interesting to analyze, e.g., phasing (Hofmeister et al., Nature Genetics, 2023) or additive (biallelic) effects, which have recently come to attention in cardiomyopathies as well (Lipov et al., Nature Cardiovascular Research, 2023).

      This point has now been included in the manuscript, in both the results and the discussion. It reads as follows:

      Interestingly, two of the analysed patients present more than one EPHB4 variant, and we identified the same variant in more than one patient (Table 1).

      (…)

      Nevertheless, two of the patients carrying one benign or likely benign variant also carry another variant classified as likely pathogenic or of uncertain significance (Table 1). Interestingly, the average LVEF of the two patients with multiple EPHB4 variants is 17.5% compared to 38.67% for the remaining 4 patients with only one EPHB4 variant and 35.17% for the six non-EPHB4 variant carriers. Although the sample size only allows for a semi-quantitative analysis, it still hints at a possible EPHB4-variant effect, which certainly needs verification in a larger cohort.

      1. Three of the six variants (p.Lys635Asn, Val113Ile, Glu890Asp) are classified as Clinvar Benign/Likely Benign. Additionally, p.Glu890Asp has been identified in 50 homozygotes in gnomAD non-Finnish European population. These data cast doubt on the pathogenicity of these variants. These classifications, as well as VUS classification of p.Pro79Leu, should be listed in Table 1. The authors should reconcile the benign/likely benign Clinvar classifications with their presented evidence for pathogenicity in the discussion.

      We have now included the ACMG classification in Table 1. As with the ClinVar classification, some of the variants are classified as benign or likely benign. Nevertheless, several observations support our hypothesis that the Eph-ephrin signalling pathway plays a role in the development of DCM: the patients who carry these variants also carry another variant; the histological findings are similar among patients carrying an EPHB4 variant and differ from those of patients who do not; and EPHB4 variants are enriched in the DCM population.

      Nevertheless, we agree with the reviewer that the classification of some of these variants as benign, and the presence of mutations in other genes already linked to DCM, such as TTN or RBM20, may suggest that the EPHB4 variants are not the sole cause of the disease but rather have an additive effect. Consequently, we have toned down our conclusions, and the discussion now reads as follows:

      Finally, although EPHB4 may not be the main cause of the disease, this study not only supports the role of EPHB4 in the heart but also corroborates the importance of CD36 and CAV1 for cardiac health, and it has the potential to improve diagnosis and risk stratification tools for DCM. In addition, as other genes crucial for fatty acid transport may be involved in cardiac disease, this study may help identify new diagnostic or therapeutic targets.

      1. CD36 and CAV1 expression are not quantified. Qualitatively, it is difficult to confirm CD36 reduction in DCM and disruption in EPHB4 variant samples as imaging parameters are not specified and do not appear to be standardized across treatments. Clearly state (either in the figure legend or in the methods) whether identical imaging parameters were used across panels 1C-1E. Note any differences in these parameters.

      We have now quantified both immunohistochemical stainings. Total CD36 is clearly and significantly reduced in both groups compared with the healthy donor (Figure for the reviewer 1A). For CAV1, the difference is less evident: although the signal appears reduced, the reduction is not significant (Figure for the reviewer 1B). These new data have been included in the manuscript figure.

      Figure for the reviewer 1. Quantification of (A) CD36 and (B) CAV1 in the immunohistochemistry analysis of patient biopsies. Data shown as mean ± SEM. (A) P value was calculated using a one-sample Wilcoxon test for DCM and a one-sample t test for DCM EPHB4. Both cohorts were compared to the mean of HD. P value < 0.05 was considered significant. (B) P value was calculated with a one-sample t test for both cohorts. P value < 0.05 was considered significant.

      All images were taken under the same conditions. The observed difference in background is due to the disease state of the DCM samples. Furthermore, the apparently reduced number of capillaries in the DCM patients is caused by cardiomyocyte hypertrophy in the diseased state: the cardiomyocytes are larger, and thus fewer cells and capillaries appear per image. The imaging parameters have been included in the methods and read as follows:

      Immunohistochemistry was imaged on a Leica Stellaris confocal microscope. All images were obtained at 63x magnification with the same laser and gain intensities. Images were acquired using LAS X software (Leica, version 4.4) and quantified using Volocity software (Quorum Technologies, version 6.5.1).

      1. Why was EPHB4 membrane localization not assessed or reported?

      We agree with the reviewer that this would be a very interesting point. Unfortunately, we had a very limited amount of material and did not have a properly working antibody.

      1. A key finding of the manuscript is that all six variants produce similar histological impacts on CAV1 and CD36 expression, denoting downstream impacts of EPHB4 genetic disruption. There is no granular data presented to support this claim. Additional discussion is also required to address how the authors anticipate variants in functionally distinct domains on either side of the plasma membrane to similarly impact downstream expression of CAV1/CD36. Mapping to available crystal structures in the Protein Data Bank (PDB) may be insightful to determine which variants may be most likely to have an impact on heterotetramer formation or to exert dominant negative effects on receptor function.

      As an appendix to this revision, we have included a figure with representative images of all biopsies analysed to support our claim.

      The full protein structure has not been solved, and only some individual domains are available in the Protein Data Bank, making it difficult to analyse the effect on the tetramer without crystallising the whole protein.

      1. Study limitations are not discussed and are significant. 5 of the 6 samples were from male patients, there are limitations to analyses of non-diverse patient ancestry, there is uncertainty regarding pathogenic contributions of variants in established DCM genes in 2/6 patients, data is limited to expression-only analyses highlighting need for additional functional modeling in cell or animal based systems.

      We have now included a limitations section that addresses all the points raised by the reviewer. It reads as follows:

      Although this study offers valuable insights into the potential involvement of the Eph-ephrin signalling pathway in the development of DCM, it has some limitations that need to be discussed. Despite the increased presence of EPHB4 variants in the DCM population compared to the healthy population, analysis of the identified variants using different classifications (CADD and ACMG) did not always predict pathogenicity for these variants. For this reason, further experiments should be performed to determine the effect of each variant.

      It is also important to note that, given the low number of patients analysed, these are not age- and gender-matched. The DCM patients carrying EPHB4 variants were younger than the DCM patients carrying a wild-type EPHB4 sequence and mainly male. Finally, neither biomaterial nor genetic testing from family members is available.

      1. Language used in conclusions overstates study findings ["our results confirm a crucial role of the Eph-ephrin signaling pathway in DCM" (page 3), "this study not only confirms the crucial role of EPHB4 in the heart..." (page 8)]. Change to "suggest" or "support".

      We have revised our discussion in line with the limitations described in the previous response, and this wording has been corrected.

      Major Methods Comments:

      1. DCM diagnostic criteria (clinical and imaging) for inclusion in the DZHK-TORCH study and the Institute for Cardiomyopathies Heidelberg registry should be stated or referenced. Likewise, describe and/or reference DCM exclusion criteria. State any relevant differences in DCM enrolment criteria for the two registries.

      We have now included our inclusion criteria in the methods and include two references to support this. The paragraph reads as follows:

      The inclusion criterion was a reduced left ventricular ejection fraction (LVEF) <50%, validated either with two independent imaging techniques or at two different time points with the same imaging technique. Furthermore, patients had to present left ventricular dilation (LVEDD >117% of the value predicted for age and body surface area according to the Henry formula, LVEDD = 45.3 × BSA^(1/3) - 0.03 × Age - 7.2). In both cases, the hearts were analysed by either echocardiography or magnetic resonance tomography (MRT).
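      To make the dilation criterion concrete, the Python sketch below evaluates the Henry formula exactly as it is written in the response (exponent 1/3 on BSA) and applies the 117% threshold together with the LVEF <50% criterion. The units (LVEDD in mm, BSA in m², age in years) and the function names are assumptions for illustration, and the sketch does not model the requirement that LVEF be confirmed by two techniques or at two time points.

```python
def predicted_lvedd_mm(bsa_m2: float, age_years: float) -> float:
    """Predicted LVEDD from the Henry formula as quoted in the response.

    Assumed units: BSA in m^2, age in years, result in mm.
    """
    return 45.3 * bsa_m2 ** (1 / 3) - 0.03 * age_years - 7.2


def meets_dcm_inclusion(lvef_percent: float, lvedd_mm: float,
                        bsa_m2: float, age_years: float) -> bool:
    """Hypothetical check of the two stated criteria: LVEF < 50% and
    LVEDD > 117% of the predicted value. Confirmation of LVEF by a second
    technique or time point is not modelled here."""
    reduced_ef = lvef_percent < 50
    dilated = lvedd_mm > 1.17 * predicted_lvedd_mm(bsa_m2, age_years)
    return reduced_ef and dilated
```

      For example, with BSA = 1.0 m² and an age of 50 years, the predicted LVEDD is 36.6 mm, so the dilation threshold is about 42.8 mm.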

      1. Describe how the final cohort of 573 DCM patients was reached. (All patients with DCM in the DZHK-TORCH study/Heidelberg registry? All patients with available exome data meeting QC standards and having available cardiac tissue?).

      Of the 573 DCM patients, 100 were recruited as part of the DZHK-TORCH registry and underwent genome sequencing. A further 62 genomes and 411 exomes were sequenced from patients of the cohort of the Institute for Cardiomyopathies (ICH) at Heidelberg University Hospital.

      From this cohort, we selected 6 patients with and 6 without an EPHB4 variant and received heart tissue slides from the pathology department.

      1. State whether any family/segregation data is available for these patients.

      DCM4 has a mother and an aunt (mother’s sister) who are also affected by CMP. In the case of DCM6, the mother was also diagnosed with CMP. Unfortunately, no further biomaterial or genetic testing of these individuals is available. This has been included in the new limitations section as described above.

      1. Description of genetic testing methods are inadequate. Describe how genetic analyses were completed for each study/registry and how results were filtered/quality controlled. If sequencing methods were different across registries, state which patients were tested by which methods. If any testing was gene-targeted rather than whole exome/genome, list the specific DCM genes tested.

      All data were sequenced using Illumina paired-end technology with either 2x100 bp or 2x150 bp reads. Exome enrichment was achieved using SureSelect Human All Exon V6 Target Enrichment (Agilent Genomics). The bioinformatics analysis pipeline was based on the “Best Practices” guidelines of the Genome Analysis Toolkit (GATK) (https://gatk.broadinstitute.org/hc/en-us). Besides EPHB4, we assessed further genes associated with cardiomyopathies (ACTC1, ACTN2, ALPK3, BAG3, CRYAB, CSRP3, DES, DMD, DSC2, DSG2, DSP, FLNC, GLA, HCN4, HRAS, JPH2, JUP, KRAS, LAMP2, LDB3, LMNA, MIB1, MYBPC3, MYH7, MYL2, MYL3, MYPN, NEXN, PKP2, PLN, PRDM16, PRKAG2, PTPN11, RAF1, RBM20, RYR2, SCN5A, SHOC2, TAZ, TMEM43, TNNC1, TNNI3, TNNT2, TPM1, TTN, TTR, VCL).

      This information has been included in the methods section.

      1. Provide additional detail for human cardiac biopsies. Was the same chamber/tissue biopsied in all samples? Is an endomyocardial biopsy available for all 573 patients included in this study? If not, were additional EPHB4 variants identified in patients without biopsy samples?

      All biopsies investigated are from left-ventricular tissue, accessed during cardiac catheterization.

      We did find additional, mainly non-coding variants in the cohort. However, as the focus of the study was on the histological analysis of CD36 and CAV1 expression, we restricted our analysis to the selected samples as described in the response to comment 2.

      1. Describe the source of the healthy control biopsy, alongside brief clinical detail establishing suitability as a control. Did DCM controls carry variants in known DCM genes (including truncating variants in RBM20 or TTN)? How were DCM controls selected?

      The healthy control biopsy was kindly donated by Prof. Dettmeyer from the University of Gießen. It is a postmortem sample from an individual with an unrelated cause of death. The cardiac biopsy was examined to exclude any pathological alterations. The sample originates from a 27-year-old female and is thus well suited as a healthy control. This information has been included in the methods.

      1. List statistical analyses and associated experiments. (Page 5).

      Statistical tests have been included in the figure legend of each experiment. This reads as follows:

      (B) EPHB4 variant allele frequency analysis. Each variant is compared pairwise between the two populations. P value was calculated with a paired one-tailed Student’s t test comparing the frequencies of the different variants in the two populations.

      And

      (F) Quantification of CD36 and CAV1 expression in the immunohistochemistry analysis of patient biopsies. Data shown as mean ± SEM. For CD36, P value was calculated using a one-sample Wilcoxon test for DCM and a one-sample t test for DCM EPHB4. For CAV1, P value was calculated with a one-sample t test for both cohorts. In both cases, P value < 0.05 was considered significant, and the cohorts were compared to the mean of HD.

      1. List microscopes/equipment and software used to complete immunohistochemistry experiments. Describe imaging parameters to facilitate comparisons between treatments in Figure 1C-E.

      Immunohistochemistry was imaged on a Leica Stellaris confocal microscope. All images were obtained at 63x magnification with the same laser and gain intensities. Images were acquired using LAS X software (Leica, version 4.4) and quantified using Volocity software (Quorum Technologies, version 6.5.1).

      This paragraph has now been included in the methods section.

      1. Please reword the following passage, which is almost verbatim to the same passage in Nicin et al. 2022.

      Page 4

      **"In brief, a combination of two human snRNA-seq datasets was used. Data from healthy cardiac tissue from the septum of 14 individuals in the Litvinukova et al. study and data from location-matched hypertrophic cardiac tissues from five patients with aortic stenosis."

      Nicin et al. 2022 (https://doi.org/10.1038/s44161-022-00019-7)**

      "Two human snRNA-seq datasets were used: data from healthy cardiac tissue from the septum of 14 individuals in the Litvinukova et al. study and data from location-matched hypertrophic cardiac tissues from five patients with aortic stenosis."

      We have reworded the paragraph in the methods sections. Now it reads as follows:

      Healthy cardiac tissue data were derived from the cardiac septum of 14 individuals 15. These data were subsequently integrated with data from the septum of hypertrophic cardiac tissue from 5 patients with aortic stenosis 16.

      Minor Comments:

      1. Results: List source for Non-Finnish European Control cohort (gnomAD) (Page 5).

      The Non-Finnish European Control cohort (gnomAD) was obtained from https://gnomad.broadinstitute.org/. This information has been included in the methods section.

      1. Discussion: "all DCM patients" (page 6) requires clarification.

      We have made clear that this refers to the patients analysed in this study. The new sentence reads as follows:

      Furthermore, our results stress the importance of the endothelial CD36 in the onset of cardiac disease as all DCM patients analysed by immunohistochemistry show a downregulation of CD36 in the endothelium and warrant a more detailed assessment of genes involved in vascular function20

      1. Discussion: Define acronyms. CSF, IL4, LPS (Page 7)

      We have defined the acronyms in the discussion. The new sentence reads as follows:

      CD36 expression is upregulated by the nuclear hormone transcription factor Peroxisome Proliferator-Activated Receptor-Gamma (PPAR-ɣ), colony-stimulating factor (CSF) cytokines and interleukin-4 (IL4). On the other hand, lipopolysaccharides (LPS) and dexamethasone downregulate its expression. In microvascular endothelial cells, CD36 is downregulated by lysophosphatidic acid.

      1. Table 1. The table is confusingly arranged. It would make more sense to organize the table by cDNA/AA change to better correspond to Figure 1A. List the impacted protein domain for each variant in a separate column. It is also unclear how DCM allele frequencies were calculated, as the reported number of patients (DCM1-6) carrying each variant does not universally correspond to the listed allele frequencies (see AFs of 0.0052 and 0.0208). Clarification should be added to the legend so that it is clear to the reader how these frequencies were determined.

      Regarding the EPHB4 variant table, we agree with the reviewer, and to make the table more understandable we have removed the first three columns, which are identical for all variants. This information has been included in the table legend. It has also been kept in the new Supplementary Table 1, which contains the variants in the other DCM-causing genes.

      We calculated the allele frequency by dividing the number of variant alleles found in the population by the total number of alleles in the population. This information has been included in the methods.

      We note that we made a mistake in the original table: we had calculated the frequencies by dividing the number of alleles by the number of individuals in the population. We have now corrected both Table 1 and Figure 1B.
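      The correction amounts to using the number of alleles (two per individual at an autosomal locus) rather than the number of individuals as the denominator; dividing by individuals overstates the frequency by a factor of two. A minimal Python sketch, assuming diploid autosomal genotypes; the function names and example counts are hypothetical and are not taken from Table 1:

```python
def variant_allele_count(n_heterozygous: int, n_homozygous: int) -> int:
    """Each heterozygous carrier contributes one variant allele,
    each homozygous carrier two."""
    return n_heterozygous + 2 * n_homozygous


def allele_frequency(n_variant_alleles: int, n_individuals: int,
                     alleles_per_individual: int = 2) -> float:
    """Variant alleles divided by the total number of alleles in the cohort
    (assumes a diploid autosomal locus by default)."""
    return n_variant_alleles / (alleles_per_individual * n_individuals)
```

      For instance, a single heterozygous carrier among the 573 patients gives 1 / 1146 ≈ 0.00087, whereas dividing by individuals would give 1 / 573 ≈ 0.0017.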

      1. Figure 1B. Add variant labels. Indicate relevant p-values for each variant. It is unclear to which comparison the p = 0.024 belongs. State in legend that 2 variants were omitted (presumably due to absence from gnomAD)

      No variants were omitted from Figure 1B. Some of them have the same allele frequency in the DCM population, and thus the individual data points overlap. Variants that were not detected in the gnomAD population were assigned a frequency of 0 for both the representation and the analysis.

      For the comparison between the two groups with P = 0.024 (now corrected to 0.0011), we performed a one-tailed paired t test comparing the frequencies in both populations. The information regarding the test has been included in the figure legend and in the methods section, as indicated above.
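      A paired t test on per-variant frequencies operates on the difference between the two populations for each variant. The sketch below computes only the t statistic and degrees of freedom with the Python standard library; the function name is hypothetical, and the inputs in any real use would be the per-variant allele frequencies in the DCM cohort and in gnomAD. A one-tailed p-value additionally needs the t distribution, for example via SciPy's `scipy.stats.ttest_rel(a, b, alternative="greater")`.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic for matched samples a and b (e.g. the frequency of
    each variant in the DCM cohort and in gnomAD). Returns (t, degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]  # per-variant differences
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # mean difference / standard error
    return t, n - 1
```

      With alternative="greater", the one-tailed test asks whether the frequencies in the first sample (here, DCM) exceed those in the second.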

      1. Figure 1E. Add label to indicate which EPHB4 variant is depicted.

      The DCM sample from which the images originate is now indicated in Figure 1E.

      Referees cross-commenting

      As is, this manuscript is not ready for publication. Our comments are in complete alignment. Like the other reviewer, I also emphasize the need for the other DCM genes tested to be listed. I also reiterate that any passages worded similarly to other published material must be corrected.

      Reviewer #1 (Significance): This study presents genetic and expression data on a novel DCM gene candidate (EPHB4) from a European cohort of 573 DCM patients. This work is of interest as much of genetic DCM remains unexplained and identification of novel genes and pathways will be critical to advance understanding of the disease and to develop novel treatments. Reported data will be of greatest interest to cardiovascular practitioners and translational/basic researchers working with genetic heart disease/DCM. The fact that cardiac tissue was available for histological analyses for all six patients is an asset. There are considerable weaknesses to the paper, as written. There is a lack of detail in the included genetic methods and results. While the premise of the study is intriguing, additional detail is required for identified TTN and RBM20 truncating variants and additional discussion is needed to resolve confusion regarding reported allele frequencies and benign/likely benign Clinvar classifications. Because study design is restricted to genetic and expression analyses, reported data do not address possible pathogenic mechanisms. Overall, there is insufficient data presented to confirm a role for EPHB4 in causing DCM. Manuscript-specific (as-opposed to study specific) weaknesses include insufficient methods detail, a lack of clarity in the presented genetic and expression data (particularly Figure 1), insufficiently described study limitations, and overstated study conclusions. These scientific and manuscript issues will need to be addressed for the manuscript to be suitable for publication.

      Reviewer fields of expertise: cardiovascular genetics, DCM.

      Insufficient expertise to evaluate statistical methods.

      Reviewer #2 (Evidence, reproducibility and clarity): I reviewed a paper by Luxan et al. proposing EPHB4 as a novel disease gene for dilated cardiomyopathy (DCM).

      The short report is interesting, however, not enough evidence is given to convince me EPHB4 is indeed a novel disease gene for DCM. More work is needed before this can be published.

      Major points:

      1. Genetics: two individuals have EPHB4 variants together with DCM-causing TTN truncating variants (TTNtv) or RBM20 variants. Which other DCM genes were excluded for the remaining four cases? The gnomAD MAF of 0.008748404 is suspiciously high.

      Overall, the small number of cases makes it hard to judge whether these are truly pathogenic variants. Could the authors attempt co-segregation of DCM with the EPHB4 variants in families?

      Unfortunately, we do not have family information for these patients. We have acknowledged this in the new limitations section of the Discussion, which reads as follows:

      Although this study offers valuable insights into the potential involvement of the Eph-ephrin signalling pathway in the development of DCM, it has some limitations that need to be discussed. Despite the increased presence of EPHB4 variants in the DCM population compared with the healthy population, analysis of the identified variants using different classification tools (CADD and ACMG) did not always predict pathogenicity for these variants. For this reason, further experiments should be performed to determine the effect of each variant.

      It is also important to note that, given the low number of patients analysed, the groups are not age- and gender-matched. The EPHB4 variant-carrying DCM patients were younger than the DCM patients carrying a wild-type EPHB4 sequence and were mainly male. Finally, neither biomaterial nor genetic testing from relatives is available.

      1. Only the CADD tool was used to predict pathogenicity; several tools should be used. Has the protein structure been solved? Structural predictions of the consequences of the variants should be made.

      We have now included the ACMG classification in Table 1. As discussed above in our response to Reviewer 1, some of the variants are classified as benign or likely benign. For this reason, we have toned down our conclusion and now suggest that EPHB4 variants may not be sufficient to trigger DCM but may act as modifiers. This is supported by the histological analysis, which revealed that the patients carrying EPHB4 variants are similar to one another and different from the other patients. Our hypothesis is further supported by the fact that two of the patients carrying benign or likely benign variants also carry another variant and have an even lower LVEF. The new classification has been included in the Results and Discussion sections and reads as follows:

      Nevertheless, the classification of the variants according to the American College of Medical Genetics (ACMG) [25] suggests that two of the variants are benign, two likely benign, one likely pathogenic and one a variant of uncertain significance (Table 1). However, two of the patients carrying a benign or likely benign variant also carry another variant classified as likely pathogenic or of uncertain significance (Table 1), and, interestingly, the average LVEF of the two patients with multiple EPHB4 variants is 17.5%, compared with 38.67% for the remaining four patients with only one EPHB4 variant and 35.17% for the six non-EPHB4 variant carriers. Although the sample size only allows a semi-quantitative analysis, it still hints at a possible EPHB4-variant effect, which certainly needs verification in a larger cohort.

      And

      Our analysis identified several EPHB4 variants enriched in a cohort of DCM patients. According to the CADD score prediction, all of these variants have deleterious potential. Nevertheless, the ACMG classification rated some of the variants as benign or likely benign. In addition, the fact that one variant was identified in two unrelated patients suggests that this variant may be benign for the protein. However, two of the patients carrying a benign or likely benign variant also carried another variant classified as likely pathogenic or of uncertain significance. During Eph-ephrin signalling, ligand binding induces Eph receptor heterotetramers to initiate signalling via Eph–Eph cis interactions [30]. Variant EPHB4 molecules could therefore exert a dominant-negative effect on these heterotetramers, reducing their function without necessarily abolishing it. This could explain why a single variant copy would be sufficient to reduce the activity of the Eph-ephrin signalling pathway in the DCM patients of our cohort. Although some of the variants may indeed not be the sole cause of DCM, these findings indicate that the Eph-ephrin signalling pathway, and EPHB4 in particular, may be important for the development of DCM.

      Only parts of the protein structure have been resolved and deposited in the Protein Data Bank (PDB).

      1. The microscopy in Figure 1C-E is not convincing. Only one sample is shown, although six were available and investigated. I would not be comfortable identifying cardiomyocytes or endothelial cells from these sections.

      As an appendix to this document, we have included figures with images obtained from all the analysed patients. These were not included in the original figure for space reasons.

      These sections are well suited to identifying cardiomyocytes and endothelial cells in cardiac tissue. First, endothelial cells, which form the microvasculature, are labelled with ULEX (Ulex europaeus agglutinin I), a well-known marker of endothelial cells. Second, cardiomyocytes are very large cells that are easy to identify by their size and their location between the capillaries of the heart. Other cells present in the heart, such as fibroblasts, macrophages or pericytes, would also be located in the spaces between cardiomyocytes but would need to be labelled for visualization. We believe that our interpretation of the immunohistochemistry images is correct.

      1. Functional work is needed to understand the interplay between EPHB4, CAV1 and CD36, for example transfecting mutant EPHB4 into cells and probing for altered localisation/attachment of binding partners, most likely in endothelial-cardiomyocyte co-culture systems.

      Our study builds on our previous murine study, in which we showed that deletion of EphB4 or its ligand ephrinB2 induces a phenotype similar to DCM in mice. At the molecular level, defects in EphB4 are linked to compromised caveolar function and reduced CAV1 phosphorylation, which involves the kinase Src, a known mediator of Eph receptor signalling. In the healthy heart, caveolar transport is required for the membrane translocation and correct function of the fatty acid translocase FAT/CD36, which mediates the uptake of fatty acids. The objective of this follow-up study was to determine whether we could identify EPHB4 mutations in DCM patients. As shown in the Results, EPHB4 variants are enriched in the DCM population. We think that the previous study supports our conclusions and hope that the reviewer agrees with us. Nevertheless, we agree with the reviewer that functional assays could be performed for every variant. We have included this in the new limitations section of the manuscript described above.

      Minor points:

      1. Figure 1B does not make sense

      Figure 1B confirms the enrichment of EPHB4 mutations in the DCM population. We have corrected the labelling to make this clearer. We have now labelled the figure “EPHB4 variant allele frequency in control and DCM population”.

      1. Statistics: which tests were performed? If normality tests were applied, which one was used?

      The tests used for every comparison are indicated in the figure legends. For the EPHB4 variant allele frequency, we performed a paired one-tailed Student’s t test comparing the frequencies of the different variants in the two populations. For the CD36 and CAV1 quantifications, we performed a two-tailed one-sample t test. In this case, we compared the expression of CD36 and CAV1 with a hypothetical healthy population with a mean of 1, as we used this value for normalization.
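      To make the two tests described above concrete, the following minimal Python sketch (an illustration, not the authors' analysis code) computes the t statistic and degrees of freedom for a one-sample t test against a hypothesised mean (here 1, the normalization value) and for a paired t test, which is simply a one-sample test of the pairwise differences against 0. For the allele-frequency comparison, each pair is assumed to be one variant's frequency in the DCM cohort and in the control population. The function names and example numbers are hypothetical, not study data. Converting t to a p-value requires the t distribution, which the Python standard library does not provide; assuming SciPy is available, `scipy.stats.t.sf(t, df)` gives the upper-tail (one-tailed) p-value and `2 * scipy.stats.t.sf(abs(t), df)` the two-tailed one.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, popmean):
    """Return (t, df) for a one-sample t test of `values` against `popmean`.

    Hypothetical helper for illustration; e.g. popmean=1.0 for expression
    values normalized so that the healthy reference equals 1.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    t = (mean(values) - popmean) / (sd / math.sqrt(n))
    return t, n - 1


def paired_t(a, b):
    """Return (t, df) for a paired t test: a one-sample test of a[i] - b[i] against 0.

    Assumed pairing: a[i] and b[i] are the same variant's frequency in two populations.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)
```

      For example, `one_sample_t([0.5, 0.6, 0.7], 1.0)` returns a t of about -6.93 with 2 degrees of freedom. Reusing the one-sample function for the paired case keeps both tests consistent and makes explicit that a paired test only looks at within-pair differences.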

      1. Please do not use contractions, e.g. 'can't' in the Discussion section.

      Contractions have been removed from the manuscript.

      Referees cross-commenting

      Overall I agree with the other reviewer on the points raised.

      Reviewer #2 (Significance): Description of EPHB4 as a novel DCM gene is of interest, but the current data are not convincing enough to make this statement.

      Mechanistic work on the interplay of endothelial cells and cardiomyocytes and consequences of EPHB4 variants would make it a very compelling story.

      Reviewer #3 (Evidence, reproducibility and clarity): Summary:

      The authors of this manuscript studied the prevalence of Ephrin type-B receptor 4 (EPHB4) variants in a cohort of 573 DCM patients and found six new EPHB4 variants, possibly pathogenic based on the Combined Annotation Dependent Depletion (CADD) score and population frequency. Moreover, the authors performed immunofluorescence (IF) and histologic analysis on six EPHB4 variant-carrying DCM patients, six DCM patients with wild-type EPHB4 and one healthy control biopsy, and found dysregulation of Caveolin 1 (CAV1) and CD36 (which are implicated in fatty acid transport in endothelial cells and cardiomyocytes) in both groups of DCM patients.

      Major comments:

      • Additional experiments are necessary to prove the hypothesis: for example, co-IF staining with endothelial markers should be provided. IF should be supported by western blots and qPCR.

      The objective of this study was to explore whether we could identify EPHB4 mutants in a DCM cohort. Interestingly, we have shown that EPHB4 mutations are enriched in the DCM population compared with the general population. Nevertheless, we agree with the reviewer that a more in-depth mechanistic study would improve the significance of the study. We have included a limitations section that reads as follows:

      Although this study offers valuable insights into the potential involvement of the Eph-ephrin signalling pathway in the development of DCM, it has some limitations that need to be discussed. Despite the increased presence of EPHB4 variants in the DCM population compared with the healthy population, analysis of the identified variants using different classification tools (CADD and ACMG) did not always predict pathogenicity for these variants. For this reason, further experiments should be performed to determine the effect of each variant.

      It is also important to note that, given the low number of patients analysed, the groups are not age- and gender-matched. The EPHB4 variant-carrying DCM patients were younger than the DCM patients carrying a wild-type EPHB4 sequence and were mainly male. Finally, neither biomaterial nor genetic testing from relatives is available.

      • The DCM samples with wild-type EPHB4 have no CD36: the mechanism by which a mutation in another gene could affect the Eph-ephrin signaling pathway should at least be discussed.

      These patients do not carry any mutation in EPHB4. The literature and our previous murine study show that the Eph-ephrin signaling pathway acts upstream of CD36. For these reasons, we believe that our observation that CD36 expression is reduced in all DCM patients confirms the important role of CD36 in cardiac homeostasis and in the development of DCM. Furthermore, as indicated in the Discussion, other genes crucial for fatty acid transport may be involved in cardiac disease, and thus this study may help identify new diagnostic or therapeutic targets.

      • The authors should discuss and possibly prove the correlation between mutant EPHB4 and CD36 and CAV1 expression and localization in endothelial cells vs cardiomyocytes and explain the mechanistic implications of co-localization of CAV1 with CD36.

      In a previous study, we showed that deletion of EphB4 or its ligand ephrinB2 induces a phenotype similar to DCM in mice. At the molecular level, defects in EphB4 are linked to compromised caveolar function and reduced CAV1 phosphorylation, which involves the kinase Src, a known mediator of Eph receptor signalling. In the healthy heart, caveolar transport is required for the membrane translocation and correct function of the fatty acid translocase FAT/CD36, which mediates the uptake of fatty acids. We have expanded the Introduction to explain the relationship between these molecules. It reads as follows:

      Mechanistically, EPHB4-deficient endothelial cells are characterized by compromised caveolar function and reduced Caveolin 1 (CAV1) phosphorylation. EPHB4 is required for the phosphorylation of CAV1 at Tyr-14 [9]. The phosphorylation of CAV1 promotes the release of caveolae from the plasma membrane [10]. Caveolae are required for the correct membrane translocation of the fatty acid translocase FAT/CD36 [11], and fatty acids are used by cardiomyocytes to obtain about 50% to 70% of their energy [12]. Absence of CD36 in cardiomyocytes reduces fatty acid uptake by the cardiac muscle cells [13] and accelerates the progression from compensated hypertrophy to heart failure [14]. Finally, some cardiomyopathies are causally related to defects in the synthesis of the proteins required for fatty acid uptake in the heart [15].

      • The available snRNAseq raw data are from normal subjects and aortic stenosis patients who are different from DCM patients. A better dataset would be the one from Reichart D, et al. Pathogenic variants damage cell composition and single cell transcription in cardiomyopathies. Science 2022.

      The single-nucleus RNA sequencing data were used in an exploratory manner to determine whether EPHB4 is also expressed in cardiomyocytes. We did not perform any gene expression comparison between the two groups. We believe that the use of this dataset is justified and hope that the reviewer agrees with us.

      • Furthermore, the link between the analysis done on the published snRNA seq datasets and the authors' own data is not clearly explained.

      As stated above and in the Methods, we used the single-nucleus RNA sequencing data to explore whether cardiomyocytes express EPHB4. The sentence in the Methods reads as follows:

      The single-nucleus RNA sequencing data set generated in the paper by Nicin et al. [14] was used to explore EPHB4 expression in human cardiac cells.

      • DCM1 and DCM 3 carry 2 EPHB4 variants: please describe if the phenotype was more severe.

      As discussed above in our response to Reviewer 1, the two patients with multiple EPHB4 variants present an average LVEF (echo) of 17.5%, compared with 38.67% for the remaining four patients with only one EPHB4 variant and 35.17% for the six non-EPHB4 variant carriers. Although the sample size only allows a semi-quantitative analysis, it still hints at a possible EPHB4-variant effect, which certainly needs verification in a larger cohort.

      This information has been included in the manuscript and reads as follows:

      and, interestingly, the average LVEF of the two patients with multiple EPHB4 variants is 17.5%, compared with 38.67% for the remaining four patients with only one EPHB4 variant and 35.17% for the six non-EPHB4 variant carriers. Although the sample size only allows a semi-quantitative analysis, it still hints at a possible EPHB4-variant effect, which certainly needs verification in a larger cohort.

      • Provide p values on suppl table 1. The 2 groups are not matched by age and maybe gender, and this could affect the histological findings.

      We have not performed any comparison between the two groups for the characteristics shown in Supplementary Table 1. Nevertheless, we agree with the reviewers that the lack of age and gender matching is a limitation of our study. We have acknowledged this in the new limitations section mentioned above.

      • Please discuss why in the DCM population the EPHB4 variant is enriched as compared with controls. Causal role? Modifiers?

      Deletion of EphB4 or its ligand ephrin-B2 induces DCM in mice. The objective of this study was to determine whether mutations in EPHB4 are associated with DCM. We agree with the reviewer that in-depth mechanistic studies, both in vivo and in vitro, would be required to determine the exact role of the mutations identified here in the development of DCM. This has been acknowledged in the new limitations section and in the discussion of the results, as follows:

      Finally, this study not only supports the crucial role of EPHB4 in the heart, but also corroborates the importance of CD36 and CAV1 for cardiac health, and has the potential to improve diagnosis and risk stratification tools for DCM. Nevertheless, whether mutations in EPHB4 are causative or modifiers of the disease should be further studied. In addition, as other genes crucial for fatty acid transport may be involved in cardiac disease, this study may help identify new diagnostic or therapeutic targets.

      • The data and the methods are presented in such a way that they could be reproduced however,

      We thank the reviewer for the positive comment on our methods section.

      • At least 2 more healthy controls should be included, and the DCM groups should be matched by gender and age.

      Healthy donor biopsies are very rare and difficult to obtain. Although we agree with the reviewer that this could strengthen our study, we cannot add more healthy biopsies. We hope the reviewer understands this.

      As stated above, we have included a limitations section in the manuscript discussing the lack of age and gender matching.

      • The causal mutation of the DCM patients should be provided.

      Only 35% of DCM cases have been related to mutations in genes encoding cytoskeletal, sarcomere or nuclear envelope proteins. In our study, the DCM patients included do not carry a variant in any of the known DCM genes. We have now expanded the Methods section to explain the inclusion criteria for the DCM patients, including this point:

      The inclusion criteria for the study were a reduced left ventricular ejection fraction (LVEF) of <50%, confirmed either with two independent imaging techniques or at two different time points with the same imaging technique, and left ventricular dilation, defined as a left ventricular end-diastolic diameter (LVEDD) >117% of the value predicted for age and body surface area (BSA) by the Henry formula (LVEDD = 45.3 × BSA^(1/3) - 0.03 × Age - 7.2). In both cases, the heart was analysed by either echocardiography or magnetic resonance tomography (MRT).
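      The quantitative part of these inclusion criteria can be sketched as follows. This is a hypothetical Python helper written for illustration, not the authors' screening code; it assumes LVEDD in mm, BSA in m² and age in years (the usual units for the Henry formula). It checks only LVEF <50% and LVEDD >117% of predicted, and does not model the requirement for confirmation by two imaging techniques or at two time points, which is assumed to be verified separately.

```python
def predicted_lvedd_mm(bsa_m2: float, age_years: float) -> float:
    """Predicted LVEDD (assumed mm) from the Henry formula quoted in the Methods."""
    if bsa_m2 <= 0:
        raise ValueError("BSA must be positive")
    return 45.3 * bsa_m2 ** (1 / 3) - 0.03 * age_years - 7.2


def lvedd_percent_of_predicted(measured_mm: float, bsa_m2: float, age_years: float) -> float:
    """Measured LVEDD expressed as a percentage of the Henry-predicted value."""
    return 100.0 * measured_mm / predicted_lvedd_mm(bsa_m2, age_years)


def meets_imaging_criteria(lvef_percent: float, measured_lvedd_mm: float,
                           bsa_m2: float, age_years: float) -> bool:
    """True if LVEF < 50% and LVEDD > 117% of predicted.

    Hypothetical helper: it does NOT check confirmation by two imaging
    techniques or at two time points.
    """
    reduced_ef = lvef_percent < 50
    dilated = lvedd_percent_of_predicted(measured_lvedd_mm, bsa_m2, age_years) > 117
    return reduced_ef and dilated
```

      For a hypothetical patient with a BSA of 1.0 m² and an age of 40 years, the predicted LVEDD is 36.9 mm, so a measured LVEDD of 60 mm (about 163% of predicted) together with an LVEF of 35% would meet both criteria.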

      Minor comments:

      • I would explain in more detail the interactions among EPHB4, CD36 and CAV1 in the introduction, as the readers may not be familiar with this pathway.

      We have expanded the paragraph of the Introduction in which the relationship between EPHB4, CD36 and CAV1 is presented. It now reads as follows:

      Mechanistically, EPHB4-deficient endothelial cells are characterized by compromised caveolar function and reduced Caveolin 1 (CAV1) phosphorylation. EPHB4 is required for the phosphorylation of CAV1 at Tyr-14 [9]. The phosphorylation of CAV1 promotes the release of caveolae from the plasma membrane [10]. Caveolae are required for the correct membrane translocation of the fatty acid translocase FAT/CD36 [11], and fatty acids are used by cardiomyocytes to obtain about 50% to 70% of their energy [12]. Absence of CD36 in cardiomyocytes reduces fatty acid uptake by the cardiac muscle cells [13] and accelerates the progression from compensated hypertrophy to heart failure [14]. Finally, some cardiomyopathies are causally related to defects in the synthesis of the proteins required for fatty acid uptake in the heart [15].

      • Panel B in Fig 1 shows 4 variants and not 6.

      All variants are shown in the panel. As stated in our response to Reviewer 1, some variants have the same value, which gives the impression that only four are shown. Variants that do not appear in gnomAD were assigned a frequency of 0 for this analysis.

      • IF in Fig 1: make sure that control and DCM are at the same magnification.

      Control and DCM images are shown at the same magnification. The apparent difference reflects the DCM phenotype: cardiomyocytes are hypertrophic in the disease samples, which gives the impression that they are shown at a higher magnification.

      • The authors analyze snRNA seq data from available datasets and not from their own patients: so, the paragraph title in the method section should be changed as it is misleading.

      We have changed the title of this section of the Methods. It is now titled “Analysis of single-nucleus-RNA-sequencing”.

      Reviewer #3 (Significance): Although the main focus of the manuscript is EPHB4, dysregulation of CD36 and its interaction with CAV1 seem to be a common mechanism in the pathogenesis of all DCM. The significance of these findings extends beyond the role of EPHB4 alone and should be strengthened.

      Metabolic abnormalities, mainly affecting fatty acid metabolism, have been described as causes or modifiers of DCM pathogenesis, but to my knowledge the roles of EPHB4, CD36 and CAV1 have not been studied in human tissues. The discovery of the mechanisms through which dysregulation of metabolism is induced by DCM genetic mutations would be an advance in the field. However, the paper in its present form is not going to have a significant impact. There is no clear connection between the sets of experiments, and more mechanistic experiments should be provided to prove causality. This may take months or even years depending on the availability of human tissues and resources.

      The audience for this research is mainly translational scientists in the field of genetic cardiomyopathies. Furthermore, the elucidation of the metabolic effects of genetic mutations on DCM progression may be of interest to the field of heart failure in general.

      The focus of my research is genetic and molecular pathogenesis of cardiomyopathies.

    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Revision summary.

      Additional new data.

      • CYPA expression levels in Scrm vs KO vs R55A isogenic cell lines, as new Fig 1C.
      • ATR signaling: western blot analysis of HU-induced p-CHK1 (S345) in Scrm, KO and R55A isogenic cell lines, as new Suppl Fig 1B.
      • MRN expression: western blot analysis of expression of NBS1, MRE11, RAD50 and MCM2 in Scrm, KO and R55A isogenic cell lines, as new Suppl Fig 7A.
      • NBS1 subcellular fractionation: western blot analysis of NBS1 in whole-cell vs cytoplasmic vs nuclear extracts, comparing expression/distribution in Scrm, KO and R55A isogenic cell lines, as new Suppl Fig 7B.
      • CYPA immunofluorescence (IF) staining on untreated and HU treated U2OS, as new Suppl Fig 7C.
      • CYPA immunofluorescence (IF) staining on untreated and HU treated U2OS following pre-extraction, as new Suppl Fig 7D.
      • DepMap Project Score Cancer Gene Dependency cell survival (“fitness”) following PPIA/CYPA-KO in breast carcinoma cell lines mapped against BRCA2 status, as a new Suppl Table 5.
      • DepMap Project Score Cancer Gene Dependency cell fitness following PPIA/CYPA-KO in Neuroblastoma cell lines, as a new Suppl Spreadsheet 4.
      • DepMap Project Score Cancer Gene Dependency cell fitness following PPIA/CYPA-KO in Multiple Myeloma cell lines, as a new Suppl Spreadsheet 4.
      • DepMap Project Score Cancer Gene Dependency cell fitness following PPIA/CYPA-KO in Chronic Myelogenous Leukaemia cell lines, as a new Suppl Spreadsheet 4.

      Revised and/or additional text.

      The Abstract, Introduction, Materials & Methods, Results and Discussion have been amended as necessary to address the issues raised by the Reviewers.

      Reviewer #1: We thank this reviewer for their understanding and appreciation of our CYPA study, as reflected in their comprehensive summary of the content, importance and potential implications of our work; “The manuscript presents clear and comprehensive data, demonstrating the profound impact of CYPA on DNA repair.” Furthermore, we very much appreciate their robust and complimentary words regarding the significance of our work and its wide appeal; “The significance of this study is twofold: it adds a new layer to our understanding of DNA repair mechanisms and, importantly, it could point the way to novel therapeutic strategies for cancer. It will spark interest from molecular biologists to clinicians and pharmaceutical researchers.”

      Query:

      It's surprising to find that the loss of CYPA abolished HU-induced NBS1 foci, as the MRE11 interactive domain of NBS1 should remain intact in CYPA deficient conditions and the N-terminus of NBS1 is dispensable for ATM activation (Kim et al., 2017; Stracker and Petrini, 2011). A more detailed mechanistic explanation of this phenotype would be appreciated. The authors should check the subcellular localization of NBS1 and the stability of MRN in wildtype and CYPA KO cells. Additionally, including the kinetics of NBS1 foci formation using multiple timepoints in wildtype and CYPA KO cells after damage will further support the observation.

      RESPONSE:

      Regarding NBS1 foci formation, we note that rather than abolishing HU-induced NBS1 foci formation, CYPA loss (through KO) and/or inhibition (through p.R55A) in fact results in a “…spontaneously elevated yet unresponsive amount of NBS1 foci/cells when compared to scrambled” (see original Fig 9A legend and associated Results section text). We have reinforced this observation in the revised Results section entitled ‘CYPA influences NBS1 and MDC1 foci formation’ and in the Discussion. We do describe a kinetic impairment of RAD51 foci formation in the CYPA-engineered lines up to 16 h after HU treatment (Fig 6D). Our mechanistic working model is that CYPA interacts directly with NBS1 via a Pro residue within the short linking peptide between the FHA and BRCT1 domains, and that this likely influences the relative dynamic positioning of the FHA domain with respect to BRCT1-BRCT2, at least following acute HU treatment (i.e., replication fork stalling, which likely biases signaling initially towards ATR rather than ATM). The relative positioning of these functional domains can impact MRN function, and we discuss this possible mechanism in the section entitled ‘CYPA and the MRN complex’, with reference to the detailed structure-function analyses and complementary DDR activation models described by:
      - Williams, R.S., et al., Nbs1 flexibly tethers Ctp1 and Mre11-Rad50 to coordinate DNA double-strand break processing and repair. Cell, 2009. 139(1): p. 87-99.
      - Lloyd, J., et al., A supramodular FHA/BRCT-repeat architecture mediates Nbs1 adaptor function in response to DNA damage. Cell, 2009. 139(1): p. 100-11.
      - Rotheneder, M., et al., Cryo-EM structure of the Mre11-Rad50-Nbs1 complex reveals the molecular mechanism of scaffolding functions. Mol Cell, 2023. 83(2): p. 167-185.e9.

      The N-terminal FHA-BRCT region of NBS1 does indeed influence MRN recruitment and HRR execution, a point we highlight in the section entitled ‘CYPA influences NBS1 and MDC1 foci formation’, with reference to the seminal original observations of:
      - Sakamoto, S., et al., Homologous recombination repair is regulated by domains at the N- and C-terminus of NBS1 and is dissociated with ATM functions. Oncogene, 2007. 26(41): p. 6002-6009.
      - Tauchi, H., et al., The forkhead-associated domain of NBS1 is essential for nuclear foci formation after irradiation but not essential for hRAD50-hMRE11-NBS1 complex DNA repair activity. J Biol Chem, 2001. 276(1): p. 12-15.
      - Zhao, S., W. Renthal, and E.Y. Lee, Functional analysis of FHA and BRCT domains of NBS1 in chromatin association and DNA damage responses. Nucleic Acids Res, 2002. 30(22): p. 4815-22.
      - Cerosaletti, K.M. and P. Concannon, Nibrin forkhead-associated domain and breast cancer C-terminal domain are both required for nuclear focus formation and phosphorylation. J Biol Chem, 2003. 278(24): p. 21944-21951.

      HU-unresponsive NBS1 foci (indicative of MRN dysfunction) and MDC1 foci formation are consistent with the DNA-R phenotypes (i.e., DR-GFP reporter systems: Fig 3A-C; impaired RAD51 foci formation: Fig 6D) and resection-related phenotypes (Fig 6A-B) we report here, and are also consistent with the relative resistance to HU-induced killing we report for CYPA-KO and CYPA-R55A cells (Fig 11A, and as reported by Manthey, K.C., et al., NBS1 mediates ATR-dependent RPA hyperphosphorylation following replication-fork stall and collapse. J Cell Sci, 2007. 120(Pt 23): p. 4221-9).

      At the reviewer’s request we include additional novel experimental data showing that MRN expression is stable and equivalent in control, CYPA-KO and CYPA-R55A cells (Suppl Fig 7A). We also provide evidence that NBS1 subcellular distribution (via extract fractionation) is not altered upon CYPA loss and/or inhibition (Suppl Fig 7B).

      Query:

      The authors showed that the interaction between CYPA and MRN didn't change after HU treatment. The authors should also include co-localization analysis of CYPA and NBS1 after HU.

      RESPONSE:

      At the reviewer’s suggestion, we undertook a series of IF analyses of endogenous CYPA (i.e., +/- HU, +/- pre-extraction). We found that endogenous CYPA failed to form foci following HU, thereby precluding CYPA-NBS1 foci co-localization analysis (Suppl Fig 7C-D).

      Query:

      The paper demonstrated that BRCA2 knockdown cells were sensitive to CsA. The authors should also examine CsA sensitivity in BRCA2 deficient cancer cells. In addition, the authors could elaborate more on their criteria for selecting cancers for CYPA inhibition, whether it is based on high genomic instability or an addiction to HRR for survival.

      RESPONSE:

      Despite repeated attempts, we have been unable to routinely culture the TNBC suspension line HCC1599 (BRCA2 c.4154_5572del1419 and p.K1517fs*23), consistent with its reported population doubling time of ~5 days. Although it is not a tumour line per se, we also failed to culture the FANC-D1 patient fibroblast line HSC62 (BRCA2 c.8488-1 G>A (IVS19-1G>A)) effectively enough to enable survival analysis. We provide a new quantification of CsA survival in the H1299 conditional shBRCA2 line (Fig 11E). Additionally, we include a comprehensive new analysis of cell survival (“fitness”) of a range of breast carcinoma cell lines following PPIA/CYPA-KO, extracted from the DepMap Project Score Cancer Gene Dependency portal (https://score.depmap.sanger.ac.uk/), and also specify the BRCA2 status of each line. Interestingly, we find that reduced BRCA2 copy number is more commonly associated with loss of fitness following PPIA/CYPA loss (Suppl Table 5). We also include similar cell line fitness datasets for each of the cancers for which we demonstrate elevated sensitivity to CYPAi (i.e., Neuroblastoma, Multiple Myeloma and CML) (Suppl Spreadsheet 4). Fascinatingly, PPIA/CYPA loss clearly results in loss of fitness in most of these cancer cell lines. Collectively, these new, independent and comprehensive datasets support our argument that targeting CYPA in select cancer scenarios has an impact in the preclinical setting and may represent an effective new strategy.

      The unifying features of the cancers showing elevated sensitivity to CYPAi are indeed high genomic instability, denoted by elevated RS and hence a dependency upon replication fork protection machinery. This would be consistent with the observed lethality of our CYPA-panel to shBRCA2, siXRCC3 and siRAD51C. The cancers are additionally characterised by aberrantly elevated HRR (i.e., an addiction to/dependency on HRR). This would be consistent with the observed lethality of our CYPA-panel to siCtIP, siRAD52, siXRCC3, and siRAD51C. At the Reviewer’s request we have reinforced and clarified this point in the section ‘Potential rational applications of CYPA inhibition in select cancers’ and in the Discussion.

      Reviewer #2:

      We thank this reviewer for their positive and supportive comments concerning our work; “Authors have quite conclusively explored the interaction between NBS1 and cyclophilinA as well as the putative proline residue important for this interaction.” We appreciate the constructive feedback concerning the range of consequences/impacts of CYPA impairment and we concur with their contention that “This manuscript will have broad interest from groups working on genomic stability, immunology as well as cancer therapy.”; a general view also voiced by Reviewer #1.

      We do stress that whilst other prolyl isomerases have previously been linked to DNA repair (e.g., most notably the Parvulin family member PIN1), this is the first time that CYPA has been directly implicated in DNA repair, and the first time CYPA has been shown to directly interact with a known DNA-R protein (i.e. NBS1).

      We believe that the comprehensive CYPA-BioID we describe is worthy of report and should serve as a very useful starting point for additional studies concerning CYPA biology, which is undoubtedly complex. The interactome will also function as a useful tool in helping dissect the clinically significant wider biological consequences of CYPA inhibition. Our interactome findings demonstrate that CYPA may influence DNA-R via multiple, and not necessarily mutually exclusive, routes. We do not argue that CYPA’s role in DNA-R is exclusively via NBS1/MRN. This is clearly demonstrated by our validation, via co-IP of endogenous CYPA, of CYPA interactions with proteins including PCNA, 53BP1, CHAMP1 and the ILF2-3 complex (Fig 5). These are completely novel observations that furthermore reinforce the validity and efficacy of our experimental approach in leveraging the CYPA-BioID to provide new biological insight into this druggable prolyl cis-trans isomerase.

      Query:

      Authors show delayed S-phase transit along with reduced replication speed, indicating replication stalling. However, the authors have not discussed how cyclophilinA might regulate replication (other than hypothesizing regarding altered dynamism of FHA-BRCT). It is conceivable that this could be an indirect effect on cellular metabolism, or the authors may believe it is due to direct disruption of the core replication machinery or signaling. In this regard, it will be helpful to see if there is shortening of the G1 phase (premature entry into S phase) and to comment on the status of the associated G1/S checkpoint.

      RESPONSE:

      The reviewer makes a very interesting and astute observation concerning the DNA replication phenotypes we report following CYPA loss and/or inhibition. The bases of these phenotypes are likely multifactorial, and we have revised the associated Discussion text to reflect this. Specifically, we highlight the elevated and unresponsive NBS1 and MDC1 foci seen in the CYPA-KO lines (Fig 9; i.e., persistent protein-DNA complexes) and the dependence upon fork protection factors (XRCC3, RAD51C, BRCA2; Fig 11). We also report that a range of DNA replication factors are found in the CYPA-BioID (Fig 5A). Untangling the functional significance of these putative interactions would involve further study. Are they direct or indirect interactors? If direct, are they prolyl isomerase substrates or chaperone clients, or are they regulated by liquid-liquid phase separation (LLPS)? Similarly, the CYPA-BioID throws up an extensive set of RNA binding factors (Suppl Table 2), many of which may conceivably contribute to replication–transcription fork conflicts/collisions under conditions of CYPA dysfunction. As this is the first comprehensive report of the cellular impacts of CYPA loss and inhibition, we thought it worth reporting the DNA replication associated phenotypes specifically to demonstrate the pleiotropic impact of loss and inhibition of this particular prolyl isomerase, and to underscore its significance. Although we have indeed found cell cycle phase transition impairments in our CYPA-KO and CYPA-R55A cells (for both G1-S and G2-M), these constitute additional studies requiring more thorough molecular-mechanistic characterization. We chose to focus on DNA repair for this first manuscript, as the CYPA-NBS1 interaction was the physical relationship for which we have assembled the most detailed and interconnected datasets to date.
We do intend to pursue the cell cycle work as it too is derived from our CYPA-BioID (Suppl Spreadsheet 1), and we have already validated some of those relevant interactions by CYPA co-IP, but this is very much a work-in-progress. With this manuscript we’re endeavoring to tread a fine line by showcasing a wide range of cellular phenotypes resultant from CYPA loss and inhibition, but then also showing a deeper level of characterisation with at least one relevant interactor known to function in a range of DNA-R pathways wherein we’ve found impairments and dependencies.

      Query:

      In connection to this, it will also be interesting to see if the ATR/Chk1 signaling axis is intact in CYPA KO cells with or without additional DNA damage compared to WT.

      RESPONSE:

      At the reviewer’s request we include new data showing that HU-induced ATR-dependent CHK1 phosphorylation is normal in CYPA-KO and CYPA-R55A cells, and that ATR does not appear to be spontaneously activated in the absence of replication stress in these cells (Suppl Fig 1B).

      Query:

      Authors show that the P112 residue of NBS1 is important for the binding of cyclophilinA. What is the status of interaction among components of the MRN complex in CYPAKO cells and with P112G NBS1? Further, what are the authors' thoughts on rescue experiments, and on whether P112G-containing NBS1 can perform its resection function?

      RESPONSE:

      We include new data showing normal expression of MRN components and normal subcellular localisation of NBS1 in the CYPA-KO and CYPA-R55A cells (Suppl Fig 7A-B). Regarding the interaction status of P112G, we show that this mutant fails to co-IP endogenous CYPA when transiently expressed in HEK293 cells, in marked contrast to WT-NBS1 (Fig 8A). Furthermore, we show that ablation of another FHA Pro residue (P64) does not impair co-IP with endogenous CYPA under similar conditions, suggesting that P112 is unique in this regard. Our recombinant protein interaction work demonstrates that CYPA-Step directly interacts with a HIS-(FHA-BRCT1) peptide and that P112G abolishes this interaction (Fig 8B). Regarding rescue experiments, we’ve found that stable overexpression of NBS1 can be neomorphic, resulting in resistance to certain DNA damaging agents, thereby complicating cell-based rescue analyses. We stress that along with our engineered KO and R55A (isomerase-dead) lines we have employed the well-known CYPAi Cyclosporin A (CsA) to reproduce several of the DNA-R related phenotypes (e.g., Fig 1, Fig 3, Fig 6, Fig 10, Fig 11). To further examine impacts upon resection specifically, a logical approach would be to engineer P112G into a full-length recombinant (baculoviral produced) human MRN complex for in vitro kinetic assessment using various labelled DNA substrates. However, we think that this specialist and not insignificant undertaking is outside the scope of our report of the extensive cellular consequences of CYPA loss and dysfunction and its potential (pre)clinical significance with regard to CYPAi repurposing.

      Query:

      What are the protein levels of MRN, RAD51 etc. in CYPAKO cells? This will be an important control to delineate the effects of CYPA on global transcription and translation vs a specific and direct effect on end-resection. Can overexpression of NBS1 rescue the observed resection and focus phenotypes?

      RESPONSE:

      Basal levels of RAD51 foci/cell are comparable between Scrm and both CYPA-KO and R55A cells (Fig 6D). We also find comparable levels of MRN components between these lines (Suppl Fig 7A). Importantly, we observe the pRPA/resection defect following an acute (up to 3 h) treatment with CsA; these conditions are unlikely to impair translation to an extent that would reduce expression of the relevant DNA-R proteins. Furthermore, microarray-based transcriptomic analyses of these isogenic lines did not show evidence of a global impact upon transcription following CYPA-KO or R55A, nor was there evidence of reduced expression of any genome stability/DNA-R genes. We did not include these negative data so as to maintain the focus on the functional link with DNA repair.

      Reviewer #3: This critically negative review is myopic, unbalanced, self-contradictory and frustratingly misrepresents some of our key findings. The dismissive tone of the text unnecessarily and unprofessionally crosses into the pejorative (“Either evidence is lacking or experiments were not performed in a convincing way”). The stark contrast between this review and the summations of Reviewer #1 and Reviewer #2 serves to highlight this hyper-negative approach.

      It is very frustrating that this reviewer describes our findings as “…an interesting story…”, that “…the identification of NBS1 as a novel substrate of CYPA is significant”, that the “…manuscript may provide new insight…”, and that “…the role of CYPA in DNA repair is fairly well described using its inhibitor or KO cells”, and yet then concludes by stating “…the current manuscript suffers lack of evidence to support the main conclusion”. This is self-contradictory and unbalanced. Again, the contrast with Reviewer #1 and Reviewer #2 in this regard is stark.

      Major critical theme no. 1.

      Expression of CYPA-R55A: “…vastly different…”

      RESPONSE.

      This reviewer dismisses the entirety of the R55A model cell line work based upon the apparently “…vastly different…” expression levels of the reconstituted lines. This is an overstatement of the situation and notably not an issue for either Reviewer #1 or Reviewer #2. Nonetheless, we have replaced the original CYPA blot in Fig 1C with a clearer and more representative depiction of expression levels between the engineered lines and control. Importantly, the pRPA/resection work and the siRAD52 and siXRCC3 dependency work were all corroborated/reproduced using the CYPA PPI inhibitor Cyclosporin A (CsA). The plurality of our complementary approaches showing the influence of CYPA upon DNA-R is minimised and/or ignored by this Reviewer. Although not shown in this study, we find that the R55A cells are selectively sensitive to the DNA cross-linker melphalan, in contrast to the CYPA-KO cells. Although we don’t yet understand the basis of this observation, it clearly indicates that R55A expression is a valid model in our hands and is not a like-for-like mimic of CYPA-KO simply because of reduced expression. We appreciate the reviewer could not know this.

      Major critical theme no. 2.

      CYPA-NBS1 work: “Another major concern is that the evidence to support that NBS1 is the major substrate of CYPA is lacking since all the experiments were performed with the CYPA mutant or CsA treatment.”

      RESPONSE:

      We do not claim that NBS1 is “…the major substrate of CYPA.” We do not claim that all the DNA-R deficits we have identified are specifically a consequence of impaired NBS1 function. These are misrepresentations of our findings and of how we’ve presented and discussed them. This Reviewer ignores our comprehensive CYPA-BioID, and specifically our discussion pertaining to the DNA-R and Replication factors found therein (section entitled ‘CYPA Interacting protein partners’ and Fig 5A). We explicitly discuss the fact that “A recurring theme amongst these CYPA interactors is that all are involved in end-resection” whilst also demonstrating CYPA co-IP with 53BP1, CHAMP1 and ILF2-3 (Fig 5C-E). In the ‘Discussion’ section we describe a “homeostatic role for CYPA in genome stability”, including possible contributions to controlling LLPS of well-known DNA-R factors and the fact that several mitotic, kinetochore, centrosomal and spindle proteins are found in the CYPA-BioID.

      Major critical theme no. 3.

      A major repeated criticism levelled by this reviewer as a basis for dismissing the entirety of our findings is that we have failed to demonstrate that the catalytic activity of CYPA is required for DSB repair.

      • Their conclusion should be supported by additional key experiments to prove that the catalytic activity of CYPA is indeed required for DSB repair…

      • Another major concern is that the evidence to support that NBS1 is the major substrate of CYPA is lacking since all the experiments were performed with the CYPA mutant or CsA treatment.

      • One major weakness of this study is that it focuses on characterizing the interaction between CYPA and NBS1, then jumps into a conclusion that the catalytic activity of CYPA is required for DSB repair based on its direct interaction with NBS1

      RESPONSE:

      As this criticism is repeated, the impression created, and no doubt intended, is that the manuscript is irreparably flawed (“…major weakness…”). This is an over-simplification and a misdirection. It’s notable that this critique isn’t raised in such a manner by either Reviewer #1 or Reviewer #2. This is likely because any modest inferences we made concerning the possible role of CYPA catalytic isomerase activity were based on a combination of differing but complementary approaches. Firstly, we routinely used the p.R55A engineered CYPA variant, although this Reviewer regards our use of this as invalid. This longstanding peptidyl prolyl isomerase (PPI)-dead mutant model has frequently been employed to invoke the catalytic function of CYPA. The mutant was originally proposed and characterized as catalytically dead in the in vitro chymotrypsin-coupled prolyl isomerase assay, with N-succinyl-AAPF-p-nitroanilide as a substrate, as far back as 1992 (Zydowsky, L.D., et al., Active site mutants of human cyclophilin A separate peptidyl-prolyl isomerase activity from cyclosporin A binding and calcineurin inhibition. Protein Science, 1992. 1(9): p.1092-1099). In addition, we routinely use Cyclosporin A (CsA), the longstanding clinically relevant CYPA PPI inhibitor, and we also use a different and more potent CYPA PPI inhibitor, namely NIM811 (N-methyl-4-isoleucine-cyclosporine), for the DR-GFP reporter assays of individual DNA-R pathway function (i.e., NHEJ, HRR and SSA).

      With regard to our findings concerning the CYPA-NBS1 interaction, in the Discussion section we clearly state that mechanistic analyses of the effect of prolyl isomerisation on the dynamism of the NBS1 FHA-BRCT region would require specialist approaches outside the scope of this manuscript, as the manuscript is firmly within the realm of cellular biology. This is ignored by this Reviewer. Specifically, we state that “A regulated cis-trans isomerisation of the E111-P112 peptide bond could conceivably dynamically alter the relative positioning of the FHA domain with the tandem BRCTs of NBS1 (Fig 7C-D). This may then impact on these domains’ abilities to dynamically interact with their respective phospho-threonine (for FHA) and phospho-serine (BRCT) containing targets, consequently likely shaping/impacting NBS1 recruitment dynamics and/or plasticity of its interactome [120-122]. Demonstrating this hypothesis would require additional structural analysis using techniques such as 2D-NMR which is outside the scope of this manuscript.”

      Minor comments: 1.

      Fig. 1E; is the survival between KO and R55A statistically significant? If so, do the authors have an explanation? Why is the reconstitution of R55A more toxic than KO alone?

      RESPONSE:

      Yes, R55A is slightly more sensitive compared to KO for this endpoint. The irony that this observation runs contrary to the Reviewer’s dismissal of the R55A model line is not lost on us (Major critical theme no. 1). As is well-known for PARP1, inhibition is not equivalent to absence. A possible speculative explanation is that the R55A isomerase-dead could have additional dominant impacts compared to the KO situation. Nonetheless, we suspect this Reviewer would object to such speculation in the absence of ever more data.

      Minor comments: 2.

      In Fig. 3D, the NHEJ activity of CsA- or NIM811-treated cells is significantly downregulated in comparison to control, which raises the possibility of the pleiotropic effect of CYPA inhibition. The authors should discuss this issue.

      RESPONSE:

      This is not necessarily indicative of a pleiotropic effect if one accepts that absence of a protein is not always biologically equivalent to the presence of an inhibited version of the same protein. Of note, we do see somewhat reduced NHEJ following siCYPA (Fig 3A), something not mentioned by this Reviewer. Furthermore, we explicitly discuss and show interaction between CYPA and 53BP1, CHAMP1 and the ILF2-3 complex, all players in NHEJ and in the intricate balance between NHEJ and resection-mediated recombination-directed repair pathways.

      Minor comments: 3.

      In Figure 8A, since the expressions of Flag-NBS1 WT, P112G, and P64G are very different, the conclusion that the isomerization of CYPA is essential for NBS1 cannot be supported. Given the variation of input levels, it appears that the P64G mutation actually enhances the interaction with endogenous CYPA. Is this reproducible? This co-IP result may need to be quantified from independent sets for statistical analysis.

      RESPONSE:

      We do not claim that “…isomerization of CYPA is essential for NBS1…”. Fig 8A data is derived from a transient transfection. Whilst there is some variation in expression, we do not make any precise quantitative conclusions from these co-IPs. Nonetheless, FLAG-NBS1-P112G clearly interacts less with endogenous CYPA in this system. Importantly, and ignored by this Reviewer, the associated recombinant protein work shown in Fig 8B clearly confirms that NBS1-P112G is profoundly compromised in its ability to interact with CYPA.

      Minor comments: 4.

      A defect in DSB repair generally hypersensitizes cells to DNA replication stress, including HU. In this regard, resistance of the CYPA KO (or R55A cells) to HU is interesting, but it may be due to the nonspecific effect of the CYPA loss in multiple DNA damage signaling and repair processes. Alternatively, cell cycle may be affected nonspecifically, rendering cells resistant to replication-associated genotoxic stress. This needs to be addressed further. Analysis of overall cell cycle profile may be required.

      RESPONSE:

      Resistance to HU is likely multifactorial and cell cycle transition kinetics may be relevant here. That is why we linked the DNA replication phenotypes to this discussion in the section entitled “Impaired CYPA function reveals novel genetic dependencies/vulnerabilities”. A comprehensive analysis of cell cycle profile and phase transits is outside the scope of the current manuscript (see response to Reviewer #2).

      Impaired HU-induced pRPA has been linked to HU-resistance via NBS1 previously: Manthey, K.C., et al., NBS1 mediates ATR-dependent RPA hyperphosphorylation following replication-fork stall and collapse. J Cell Sci, 2007. 120(Pt 23): p. 4221-9.

      Minor comments: 5.

      The text, not to mention the Abstract, is too dense. The manuscript will benefit a lot from extensive editing and rearrangement of figures to make the story more succinct for journal submission.

      RESPONSE:

      The Reviewer’s view concerning a lack of succinctness is not shared by Reviewer #1 and Reviewer #2. We have endeavored to draft a concise and accessible manuscript, the main body of which comes in at just over 23 sides of A4 (including Materials & Methods). Considering we guide the reader through 12 multipart figures, 5 supplementary tables and 8 supplementary figures, we believe we have achieved succinctness. Nonetheless, we will of course take direction from the appropriate journal editorial team regarding house style and format.

    1. Authors’ response (3 January 2024)

      GENERAL ASSESSMENT

      The TMEM16 protein family is composed of ten members in mammals, and fewer in lower eukaryotes. Members within this protein family play remarkably different roles: some serve as Ca<sup>2+</sup>-activated ion channels, others work as lipid scramblases in a Ca<sup>2+</sup>-dependent manner, and some combine the two functions. The molecular determinants responsible for lipid transport in TMEM16 scramblases are not fully defined. The current view of lipid scrambling is that, in the presence of Ca<sup>2+</sup>, TMEM16 scramblases change their conformation to expose a hydrophilic ‘groove’ to the membrane. This destabilizes the lipid bilayer, enabling translocation of lipids (e.g. phosphatidylserine) from the inner to outer leaflet of the membrane. However, recent evidence suggests that scrambling can occur even when the hydrophilic groove is closed.

      The new study by Feng and colleagues aims to investigate the molecular basis of closed-groove scrambling using the fungal scramblase, nhTMEM16. This protein was previously reported to maintain closed groove conformations even in the presence of Ca<sup>2+</sup>. The authors resolved a series of WT nhTMEM16 structures in two different nanodisc scaffolds, as well as several mutants with impaired scrambling. Strikingly, the conformational landscape of nhTMEM16 was found to rely on the lipid composition and scaffold used: the smaller E3D1 scaffold favored closed groove states and the larger 2N2 scaffold permitted intermediate and open-groove conformations. A high-resolution closed-groove structure obtained in E3D1 allowed the identification of a continuous file of lipid molecules around the catalytic groove region, providing a structural basis for lipid interaction with the closed groove. This complements prior work from this group involving a closely-related homolog, afTMEM16, in which the authors were able to visualize lipid molecules around the open groove. Furthermore, the authors succeeded in capturing three novel states of nhTMEM16 (Ca<sup>2+</sup>-free closed, Ca<sup>2+</sup>-bound intermediate-open and Ca<sup>2+</sup>-bound wider open states), completing the picture of conformational transitions that this protein undergoes upon activation.

      Mutation of key residues interacting with outer leaflet lipids selectively impaired scrambling in the absence of Ca<sup>2+</sup>. Residues involved in groove opening (E313-R432) were also identified and a mutation at this site (R432A) locked the nhTMEM16 scramblase in a closed-groove conformation, providing new insights into residues critical for groove opening. Furthermore, the authors tested the activity of nhTMEM16 mutants in several lipid compositions and reported striking differences, clarifying discrepancies from the authors’ prior work on nhTMEM16 using different lipid compositions and consolidating some of the observations from other TMEM16 homologs. It is noteworthy that the authors probed the effect of nanodisc size and lipid composition on nhTMEM16 conformation, providing thought-provoking insights for the membrane protein field. This approach is particularly valuable for closed-groove mutant structures, to ensure that the observed conformation is not dictated by scaffold size.

      Overall, this is a piece of carefully executed experimental work. The results are interpreted carefully in the context of the published literature, and the work provides important insight into plasma membrane lipid homeostasis. While the study does not have technical weaknesses, it could be improved in its presentation in order to make it more accessible to readers who are not experts in the TMEM16 field.

      We wish to thank the Colab editor and reviewers for their insightful comments, helpful suggestions, and appreciation of our work. We have extensively revised our manuscript to address their comments and suggestions. Below is a detailed point-by-point response to their suggestions.

      RECOMMENDATIONS

      Essential revisions:

      1. For readers not familiar with the field, some technical details might need to be explained in greater detail. For example:

      - In the section “Residues coordinating outer leaflet lipids are important in closed groove scrambling”, please indicate the method of measuring scrambling (liposome-based activity assay etc.) and refer to some of your prior work where the method is described for readers not familiar with the TMEM16 field. Additionally, it needs to be stated clearly what is considered a significant change in scrambling, as liposome assays are usually quite variable.

      We thank the reviewers for this suggestion. We edited the text to indicate the use of the well-established in vitro assay and added the relevant references (Lines 236-238).

      We illustrate the reproducibility of the experimental results by reporting in the bar charts the mean ± SD of the scrambling rate constants, and by showing the values obtained from individual experiments (red dots superimposed on the bar charts). Additionally, we evaluated the statistical significance of the reported changes using Student’s t-test with Bonferroni correction. Finally, we added text discussing the limitations of our assay in lines 318-322.
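      For readers unfamiliar with the Bonferroni correction mentioned above, it can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the figures: it assumes the raw p-values have already been obtained from two-sample Student’s t-tests, and the function name `bonferroni` and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust a list of raw p-values (hypothetical helper).

    Each raw p-value is multiplied by the number of comparisons m and
    capped at 1.0; a comparison is called significant if the adjusted
    p-value is below alpha, which is equivalent to raw p < alpha / m.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical raw p-values from three mutant-vs-WT comparisons (illustrative only).
raw = [0.004, 0.020, 0.300]
adjusted, significant = bonferroni(raw)
print(significant)  # → [True, False, False]
```

      Note that the second comparison (raw p = 0.02) would be called significant on its own but not after correcting for three comparisons, which is the conservative behaviour the correction is meant to provide.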

      - Since prior work done by the group indicates that membrane thinning is a determinant of scrambling, and an open groove further thins the membrane to potentiate scrambling, it is not intuitive why the R432A mutant scrambles with WT-like rates in the presence of Ca<sup>2+</sup>. If this is due to the limitation of the assay (e.g. rate of NBD lipids bleaching), this should be stated more explicitly. Do the authors have insights from their structures regarding membrane thinning by R432A with/without Ca<sup>2+</sup> and how that compares to WT protein?

      We thank the reviewers for raising this important point. In the presence of Ca2+ the fluorescence decay of N-NBD-PE in nhTMEM16 vesicles occurs with kinetics that are slightly slower than those of the chemical reduction step by dithionite. Therefore, while we can resolve two exponential components, it is possible we are underestimating the scrambling rate constants α and β. However, we note that a large slowing effect would be well resolved in our experimental conditions. In contrast, in the absence of Ca2+, which is the focus of our current analyses, scrambling is much slower than the chemical step and is well resolved. Finally, we note that the triple mutant Y327A/F330A/Y439A alone has no effect on scrambling in 0.5 mM Ca2+ but induces a ~8-fold reduction in the scrambling rate constants in 0 Ca2+. When this mutant is combined with R432A, which favors the closed groove conformation, we now see in the presence of Ca2+ the same ~8-fold reduction in the scrambling rate constants. This suggests that our assay can resolve effects even in the presence of Ca2+. This is discussed in Lines 318-322.

      We only determined the structure of R432A in the presence of Ca2+, therefore we cannot evaluate how Ca2+ binding affects membrane thinning in this mutant.

      - It is difficult to follow the reasoning for the R432A+Y327A/F330A/Y439A mutant phenotype. Is the assumption that Y327A/F330A/Y439A is in the open conformation with Ca<sup>2+</sup>, and therefore adding a mutation stabilizing the closed groove impairs scrambling in presence of Ca<sup>2+</sup>?

      We have expanded the rationale for this experiment in lines 307-315.

      - What the authors believe about the lipid pathway when the groove is open should be discussed in more detail and with reference to Alvadia et al 2019.

      We thank the reviewers for this important suggestion. We now explicitly state that: “With a closed groove, thinning is less pronounced, and scrambling is slower than when the groove is open, rationalizing the Ca2+ dependence of this process (Extended Data Fig. 10d-f).” (Lines 432-434) Since the present work is focused on the mechanism of closed groove scrambling, we prefer to refrain from adding more speculations on what happens when the groove is open, especially since this topic was the focus of a paper we recently published (Falzone, Feng et al., Nat Comms, 2022).

      2. A more detailed account of the physiological significance of the findings should be presented in the Discussion to offer reader the authors’ view on the broader implications of the work. Relevant points include:

      - Do the authors believe that conformational bias in nhTMEM16 in various cryo-EM conditions may be reflective of physiological regulation? Is it likely to happen in cells in vivo?

      This is an excellent point. We do hypothesize that the various observed conformations are physiological, and indeed we explicitly state “…that the 7 observed conformations represent intermediates along the transition from apo closed to Ca2+ bound open” (Lines 444-445). Beyond this, we cannot speculate on whether the environment-dependent bias on nhTMEM16 can happen in a physiological context. We imagine that subtle changes in membrane composition can affect TMEM16 function, and indeed we see quite dramatic effects of lipid composition on scrambling activity; however, whether these changes reflect shifts in the conformational landscape of groove opening, effects of membrane properties, or both, remains to be seen. Gaining definitive insights into this would require extensive additional structural experiments in unbounded membranes (i.e., from reconstituted liposomes of different composition or native vesicles, cell membranes) that are outside of the scope of the present work.

      - Do the authors believe that such regulation may also apply to mammalian TMEM16 scramblases or even channels?

      We consider this a definite possibility and have now added a sentence stating that “This raises the possibility that unbounded membranes, such as those of liposomes, might perturb less the conformational landscape of the imaged proteins.” (Lines 499-501) However, without direct evidence we prefer to avoid speculating on this fascinating topic.

      - What implications do these findings have for our understanding of lipid scrambling mechanisms by TMEM16 scramblases that work in intracellular (thinner) membranes (such as TMEM16K)?

      We agree this is an important point. We have now added a sentence stating “The strong dependence of closed groove scrambling on membrane properties could provide a mode of regulation of TMEM16 activity in cellular membranes, such as the cholesterol rich plasma membrane or the thinner ER membrane.” (Lines 434-436)

      - What implications might the knowledge of residues involved in lipid scrambling of closed scramblases potentially have for medicine and therapy? Can the authors speculate as to whether the identified residues have the potential to be tackled pharmacologically and what use could this have?

      We do not know whether the residues we identified as important for closed groove scrambling could provide a pathway to pharmacological manipulation of TMEM16 scramblase activity. This is a fascinating topic, especially in light of the very poor availability of pharmacological tools to manipulate TMEM16 scramblase activity. However, at present it remains speculative and outside the scope of the present manuscript.

      - More generally, what is the physiological role of lipid transport in the absence of Ca<sup>2+</sup>? Does this constitute a lipid “leak”?

      This is an excellent question. One possibility is that scramblases have a basal activity that, in cellular homeostasis, is counteracted by the activity of flippases and floppases. Alternatively (or complementarily), it is possible that in the context of an unperturbed native membrane the basal activity is negligible. However, we do not have data addressing this point, so our hypotheses remain purely speculative. We therefore prefer to keep the focus of the present manuscript on the mechanism of closed groove scrambling and on the potential effects that the environment can have on the interpretation of cryoEM imaging experiments.

      Optional suggestions:

      1. Regarding residues involved in groove opening (E313-R432), it would be very interesting to expand the work by studying additional mutants and investigating more fully the role of E313 in DOPC:DOPG lipids, since at present only a mutation in R432 was tested experimentally in this lipid composition.

      We agree with the reviewers that expanding the analysis to other residues, such as E313, would be interesting. However, initial functional experiments suggested this mutant behaves similarly to R432A, and thus we did not think it would provide much mechanistic insight beyond what we already have.

      2. Measurements of ion transport in nhTMEM16 would also be useful to further validate the closed groove conformation of R432A. This could shed new light onto whether ion transport and lipid transport are coupled in TMEM16 proteins.

      This is an excellent suggestion, one that we indeed considered at length during this project. Ultimately, we decided not to pursue this avenue of investigation because of the limitations of the flux assay for non-selective ion channels. While flux assays can provide quantitative measures of effects for anion or cation selective channels, for non-selective channels these assays only provide very coarse yes/no answers (i.e., whether the construct mediates any channel activity or not). Since we expected these mutants might have intermediate phenotypes, rather than completely ablating channel activity, we were concerned that the experiments would be inconclusive at best or, at worst, misleading. These limitations are extensively discussed in our previous manuscripts (Lee et al., Nat Comms, 2018; Falzone and Accardi, Methods Mol Biol, 2020).

      3. Since the authors found significant differences in their new structures with previously reported, how do Ca<sup>2+</sup>-bound closed structures of nhTMEM16 in POPC/POPG (previously published) and DOPC/DOPG (obtained in this study) compare to each other?

      We thank the reviewers for this suggestion. In Lines 167-168 we now state: “The Ca2+ bound closed conformations in MSP1E3 DOPC/DOPG (PDBID: 6QMB) and MSP2N2 POPC/POPG are nearly identical (Cα r.m.s.d ~0.50 Å).”
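      A Cα r.m.s.d. such as the ~0.50 Å quoted above is conventionally computed after optimally superimposing the matched Cα atoms of the two models. The sketch below shows that calculation with the Kabsch algorithm in NumPy. It assumes the two coordinate sets have already been paired residue by residue, and the function name `kabsch_rmsd` is hypothetical; it illustrates the standard calculation rather than the authors' actual pipeline.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition.

    Assumes row i of P and row i of Q are the same Calpha atom in the two models.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation (Kabsch).
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

      Structural biology packages (for example, Biopython's `Superimposer`) perform the same least-squares fit. Because the superposition is optimized first, two models that differ only by a rigid-body rotation and translation give an r.m.s.d. of essentially zero.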

      4. The purpose of creating composite symmetric maps from symmetry expanded monomers is questionable – if it is not possible to isolate this symmetric state by classification approaches, it is probably very transient, or not present at all. However, there are no strict guidelines, and it is acceptable as long as everything is described in MM and all the maps deposited. Are composite and monomer E3D1 apo maps deposited alongside the main map as EMD-41477?

      We agree with the reviewers that depositing the maps of the unexpanded dimers is appropriate and opportune, and indeed we did so:

      i. the combined dimer map which was primarily used for model building is deposited as EMDB: 41453 and the model as PDB: 8TOI;

      ii. the local refined monomer map was deposited as EMDB: 41458

      iii. the dimer consensus map used for map combination was deposited as EMDB: 41457

      The rationale to generate a combined dimer map is that this allows for a better visualization of the protein-bilayer interface and the ensuing distortions. When viewing the map of a single monomer it is difficult to appreciate these effects.

      5. The authors show that Ca<sup>2+</sup>-dependent α6 straightening is important for closed-groove scrambling. This is directly relevant for TMEM16F, for which this is the only conformational change observed. The authors note that extracellular α4 is more mobile in R432A mutant, is this in any way similar to the conformations reported for more active TMEM16F mutants (Arndt et al., 2022)?

      What we see is that the density for the top of TM4 becomes very weak. This is quite different from what Arndt et al. reported, where they see a significant and defined movement of both TM4 and TM3. While we think many of the basic mechanisms of closed-groove scrambling we and many others are beginning to unravel are likely conserved across TMEM16 homologues, it is very likely that differences will exist between homologues. We now make this important point in Lines 432-434.

      (This is a response to peer review conducted by Biophysics Colab on version 1 of this preprint.)

    1. Authors’ response (11 February 2024)

      GENERAL ASSESSMENT

      Ionotropic glutamate receptors mediate the large majority of excitatory synaptic transmission in the brain. These receptors consist of four classes: AMPA, kainate, NMDA and delta receptors. NMDA receptors are obligate tetramers composed of two GluN1 and two GluN2 (or GluN3) subunits. Compared to other iGluRs, they are unusual in requiring two different agonists for their channel to open: glycine binding on GluN1 and glutamate on GluN2.

      Seljeset et al. investigate the molecular determinants controlling ligand potency and NMDAR activity at the level of the ligand-binding domains (LBDs), where the agonists bind. They identify a specific position, D732, whose mutation to either leucine or phenylalanine leads to a constitutively active GluN1 subunit, and thus to NMDARs activated solely by glutamate. This aspartate is well known in the field, since it is a highly conserved, signature residue in iGluRs that binds amino acid ligands, together with an arginine in the LBD upper lobe. Surprisingly, although glycine cannot further activate GluN1-D732L/GluN2Awt receptors, glycine site antagonists like 5,7-DCKA or CGP-78608 can still bind and inhibit NMDAR activity. This study is therefore very intriguing, as it raises new questions about something that was previously thought to be understood. By using a combination of unnatural amino acids and conventional mutagenesis, the authors propose that D732 contributes to glycine-mediated effects by changing local interactions with nearby residues. In addition, they show that this behavior is specific for the GluN1 subunit, since mutation of the equivalent aspartate in the GluN2 subunit does not yield constitutively activated GluN2 subunits. Finally, the authors identify a homomeric iGluR from the placozoan Trichoplax adhaerens, Trichoplax AKDF<sup>19383</sup>, in which this conserved aspartate is replaced by a tyrosine. When expressed in Xenopus oocytes, the channel shows constitutive activity. Mutation of the tyrosine into an aspartate, to convert Trichoplax AKDF<sup>19383</sup> into a “classical” iGluR, decreases Trichoplax AKDF<sup>19383</sup> constitutive current and allows this channel to be activated by glycine and D-serine. 
Interestingly, an adjacent residue that is a serine in most mammalian subunits is also a tyrosine in Trichoplax AKDF<sup>19383</sup>, and mutation of both tyrosines yields a glutamate-gated ion channel comparable to mammalian receptors. All of this suggests that the nature of the residue at position 732 influences not only ligand binding but also channel gating.

      The study is technically sound, with appropriate controls, and uncovers intriguing properties of a position in GluN1 LBD at which specific side chain mutations can lock the subunit in an active state. Investigation of the Trichoplax iGluR further reinforces these findings. This study should lead to a better understanding of how LBDs prime channel opening in iGluRs in the absence of agonists. In addition, co-agonist-insensitive GluN1-D732L-containing NMDARs could be used as tools to investigate the physiological consequences of NMDAR regulation by their co-agonist site. In contrast to previously engineered NMDARs activated solely by glutamate, which rely on the LBD being locked in its active state by cysteine bridges (Blanke and VanDongen, J Biol Chem 2008), GluN1-D732L/GluN2A NMDARs remain druggable (i.e. they can still be inhibited by glycine-site competitive antagonists). This is a great advantage when investigating the function of these receptors in a native context. The study identifies a few gaps that remain in our mechanistic understanding of D732’s role in channel gating. In particular, it is unclear how subtle modifications of the side chain at position D732 lead to such drastic changes in function and why these effects are specific to GluN1 LBD. Also, why does mutation of D732 into leucine lead to a constitutively active GluN1 subunit, while mutation into the closely related isoleucine does not? The idea of a “hydrophobic plug” formed by D732L or D732F sidechains leading to constitutive activation would benefit from further validation, since other hydrophobic substitutions (A, V, I, Y, and W) do not produce similar effects. Finally, it would be interesting to carry out further investigations of the role of the interaction between D732 and Q536 in open conformation stability.
Thus, this paper puts forth interesting questions that could be addressed by future studies, for example molecular dynamics simulations and exploration of the LBD free energy landscapes (as in Yao et al., Structure 2013), to understand the impact of the GluN1-D732L mutation on GluN1 LBD conformational mobility.

      RECOMMENDATIONS

      Essential revisions:

      1. Page 2, “These data show that essentially all substitutions at the GluN1-732 position decrease glycine potency, but leucine and phenylalanine substitutions also remove the requirement for glycine co-agonism in GluN1/GluN2A NMDA receptors”: One other hypothesis for the lack of glycine dependence of GluN1-D732L and D732F + GluN2A receptors could be that the mutated receptors have a glycine potency so high that GluN1 LBD is already saturated by contaminating, ambient glycine. At this point in the paper, the authors cannot distinguish between one hypothesis or the other, therefore we suggest that this sentence be rephrased. Later in the text, control experiments with GluN1-R523K mutations that kill glycine binding and competition with 5,7-DCKA show that glycine-independent activation of GluN1-D732L/GluN2A mutants is not due to constitutive occupancy of GluN1 LBD by contaminating glycine.

      ER1) We have now changed this to (page 4): “These data show that most substitutions at the GluN1-732 position decrease glycine potency, but leucine and phenylalanine substitutions alter GluN1 activity in such a way that leads to single-mutant NMDA receptors activated solely by glutamate.”

      1. Does glycine insensitivity in GluN1-D732L/GluN2A NMDARs reflect a constitutively active GluN1 subunit or is this subunit locked in another conformational state that cannot be further modified by glycine? This could be answered by estimating the maximum open probability of GluN1-D732L/GluN2A NMDARs compared to their wt counterparts. To estimate Po, the authors could measure the kinetics of NMDA receptor current inhibition by MK801 (the slower MK801 inhibition, the lower the Po; see Chen et al., J. Neurosci 1999; Blanke and VanDongen, JBC 2008) in the presence of saturating agonist concentrations (100 μM Glu, 100 μM Gly for wt and only 100 μM Glu for mutant).

      ER2) We have now assessed the rate of MK-801 block in glutamate-gated mutant and glycine + glutamate-gated WT receptors, and reshuffled text/figures, as this ties in well with ER4) below. MK-801 results now in Figure 3 on page 6, and main text on page 5: “In order to understand whether the glycine-insensitive GluN1-D732L subunit is in a constantly activated state or occupies a different conformation that may reflect an alternative to typical channel gating, we compared the kinetics of WT receptor and GluN1-D732L-containing receptor inhibition by the open-channel blocker MK-801, which can be used to evaluate maximum open probability of NMDARs <sup>26,30</sup>. We observed very similar kinetics of inhibition of WT and mutant receptors (Fig. 3A), indicating similar open probability in solely glutamate-gated GluN1-D732L-containing receptors and glutamate and glycine-gated WT receptors. This reflects unchanged maximum open probability in solely glutamate-gated NMDARs with disulfide-locked GluN1 LBDs assayed by single channel recordings <sup>27</sup>. This suggests that the GluN1-D732L subunit is in a constantly activated state.”

      When viewed alongside high sensitivity of mutant subunits to DCKA - OS1) below - it’s difficult to conclude what sort of active state the mutant subunit adopts. We’ve assessed the best we can at the moment, and in this paper we’ll have to leave it at “here is the observation; here is some evidence ruling out various possibilities; and here is a receptor from another family that shows something remarkably consistent”. Future studies will have to establish exactly what state the mutant subunit adopts.
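      The MK-801 comparison rests on a simple idea: an open-channel blocker can only enter open receptors, so at a fixed MK-801 concentration the rate of block scales, to a first approximation, with open probability. A minimal sketch, assuming the current decays as a single exponential toward zero during MK-801 application, estimates the block time constant τ by linear regression of ln|I| on time. The function name `fit_block_tau` and the single-exponential, zero-offset model are assumptions for illustration; real recordings are usually fit by nonlinear least squares, often with an offset term.

```python
import math


def fit_block_tau(times, currents):
    """Estimate the MK-801 block time constant, assuming I(t) = I0 * exp(-t / tau).

    Fits ln|I| against t by ordinary least squares. Current magnitudes are used,
    so inward (negative) currents are fine; no current may be exactly zero, and
    block is assumed to proceed toward zero current (no steady-state offset).
    """
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matched (t, I) points")
    ys = [math.log(abs(i)) for i in currents]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx  # equals -1 / tau for a pure exponential decay
    return -1.0 / slope
```

      Under these assumptions, similar fitted τ values for WT and mutant receptors, recorded at the same MK-801 concentration, would point to similar maximal open probability, while a shorter τ would point to a higher one.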

      1. Page 4: The term “hydrophobic plug” is not fully justified since other hydrophobic residues do not lock GluN1 LBD in its active state.

      ER3) We have replaced nearly all use of this term, in the title and in the main text, to e.g. “certain hydrophobic substitutions” or “L/F substitutions”.

      1. Figure 2, redox sensitivity of GluN1-D732L/GluN2Awt: It would be helpful to explain the point of this experiment – maybe to investigate if the D732L mutation has an impact on the receptor gate rather than on the LBD? In any case, the authors should investigate the effect of DTT on the activity of wt GluN1/GluN2A receptors to determine whether there is an absence of an effect of the D732L mutant on redox sensitivity.

      ER4) Indeed we were curious if D732L affected the gate via this allosteric route, rather than by just altering LBD conformation. And we have now shown the effect of DTT on WT receptors.

      In addition to re-writing to better explain the point, as suggested, we have also re-written to follow on from new data/text on whether the D732L mutation affects LBD, gating, etc.: “We next questioned if D732L/F substitutions affect channel gating, rather than simply altering the LBD conformation. The gating machinery is complex, but it includes the peptide segment linking the C-terminal end of the LBD to membrane-spanning helix 4 (LBD-M4 linker, (11)). The LBD and LBD-M4 linker are confined by a C744–C798 disulfide, just four helical turns after D732, whose disruption by reduction enhances channel gating (28). We considered that if the D732L/F substitution is coupled to channel gating via this route, then removal of the C744–C798 disulfide via the C744A mutation might alter glutamate-gated currents in GluN1-D732L-containing receptors. Alternatively, the typical enhancement by the reducing agent dithiothreitol (DTT) might differ in GluN1-D732L compared to WT receptors.”

      And new Figure 3 now includes DTT effects on WT receptors.

      1. Page 6: The authors find that mutation of Q536 decreases glycine potency and conclude there is an interaction between D732 and Q536. However, the effects of D732 and Q536 mutations could be independent, therefore the authors should consider mutating both residues together to look at the additive/non-additive effects of the mutations. Or perhaps, note in the Discussion that some sort of mutant cycle analysis or molecular dynamics simulation would be needed to rigorously test these ideas.

      ER5) We have now made and tested a double mutant combining D732E and Q536N and performed mutant cycle analysis.

      (We also tried to do this for Q536 side chain (regular mutations) and A734 main chain (non-canonical substitutions), but double mutants involving non-canonical amino acids at A734 were not successful – Figure S1.)

      As is now shown in Figure 4D, the effects of the mutations are decidedly non-additive, yielding an Ω value of 0.05, corresponding to a reasonably high energetic coupling of ~7 kJ/mol. We have now added to the relevant section of the Results on page 8: “If an interaction between Q536 and D732 were energetically important for receptor activation, the effects of their mutations should be non-additive <sup>31</sup>. We therefore tested glycine potency at double-mutant GluN1-Q536N/D732E-containing receptors and observed non-additive changes in EC<sub>50</sub>, with a strong coupling value, Ω, of 0.05 (Fig. 4D). This deviation of Ω from unity, corresponding to an interaction energy of 7.4 kJ/mol is relatively high <sup>31</sup>, confirming that Q536 and D732 are energetically coupled. We tried to analyse energetic coupling between Q536 and A734 via double mutants incorporating nonsense suppression at the A734 position, but unfortunately, attempts to incorporate Aah into such double mutants via nonsense suppression were unsuccessful (Fig. S1B).”
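      The mutant-cycle values above follow a standard relation. The coupling coefficient Ω compares the double mutant with the two single mutants and the wild type, and the interaction energy is |RT ln Ω|. The sketch below assumes Ω is formed from EC50 values as (EC50,WT × EC50,double) / (EC50,Q536N × EC50,D732E) and a temperature of 298.15 K. Both are assumptions, since conventions differ (inverting the ratio only flips the sign of ln Ω, which is why the magnitude is returned). The helper names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ / (mol K)


def coupling_omega(ec50_wt, ec50_a, ec50_b, ec50_ab):
    """Mutant-cycle coupling coefficient from EC50 values (all in the same units).

    Omega == 1 means the two mutations act independently (additive free energies);
    deviation from 1 in either direction indicates energetic coupling.
    """
    return (ec50_wt * ec50_ab) / (ec50_a * ec50_b)


def coupling_energy(omega, temp_k=298.15):
    """Magnitude of the interaction energy |RT ln(Omega)| in kJ/mol."""
    return abs(R_KJ * temp_k * math.log(omega))
```

      With Ω = 0.05 this returns about 7.43 kJ/mol, in line with the 7.4 kJ/mol quoted in the response; at a lower recording temperature the value would be slightly smaller.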

      1. Page 6, “A hydrophobic plug does not cause constitutive activity in all NMDA receptor subtypes”: This title is misleading as it raises the expectation that the effect of GluN1-D732L has been investigated in the context of GluN1/GluN2A, GluN1/GluN2B, etc NMDARs. Instead, the equivalent mutation is made in the GluN2 subunit. We suggest using the word “subunit” rather than “subtype”.

      ER6) We have changed this Results section title (page 8) to: “L/F substitutions do not cause constitutive activity in all NMDA receptor subunits”

      1. Page 7, effect of GluN1-D732L in the context of GluN1/GluN3 NMDARs: We would not expect current to be observed with GluN1-D732L/GluN3 NMDARs, since locking GluN1 LBD in its active state desensitizes the receptors. The effect of the D732L mutation seems therefore conserved between GluN1/GluN2 and GluN1/GluN3 NMDARs. In addition, when using CGP, please cite Grand et al., Nat. Commun. 2018 since they were the first to use CGP as a tool to record GluN1/GluN3 currents.

      ER7) We have now cited that paper specifically here (page 8) and inserted the following (page 8/9): “While this seems like inactivity of the mutant GluN1 subunit in GluN1(4a)/GluN3A, it could yet reflect the activity of constitutively active mutant GluN1 subunits in GluN1/GluN2A receptors, as GluN1 activity in GluN1/GluN3A receptors is known to cause more desensitization than activation (Grand et al 2018).”

      1. Figure 5C: It is stated in the text that the aspartate position is “highly” conserved. However, no actual number or percentages are given for this statement. How does it compare to the residues in the highly conserved SYTANLAAF motif or other conserved positions? This sort of analysis does not need to be done for the entire receptor, but perhaps for glycine and glutamate binding residues and SYTANLAAF motif, to give a quantitative feel for statements about conservation. In addition, what other types of residues occupy this position in other species? And what was the number of species/subunits included in the analysis?

      ER8) To clarify the level of conservation, we have added Table 1 (page 10) listing the % conservation of amino acids at selected positions.

      In analyzing % conservation, we noticed that several iGluR sequences with gaps in the ligand-binding domain or channel-forming helices had escaped our filtering out incomplete sequences in our phylogenetic analysis. We therefore revisited our phylogenetic analysis, removed several incomplete sequences, and replaced Crassostrea gigas (a mollusc spiralian) iGluR sequences with Schmidtea mediterranea (a flatworm spiralian) sequences. This (1) means fewer sequences with gaps in the ligand-binding domain in our alignment/tree and (2) better covers the diversity of the lineage Spiralia, now that we have sequences of Lingula anatina and Schmidtea mediterranea, which are more distantly related than Lingula anatina and Crassostrea gigas (Laumer et al 2019, PMID:31690235; Marlétaz et al., 2019, PMID:30639106).

      The result is a phylogenetic and amino acid sequence analysis of 204 iGluR genes (previous version had 212 genes) with the same overall topology as the previous version, including lambda, NMDA, epsilon, and AKDF iGluR families (Fig. 5B, page 9).

      The number of subunits/genes used is stated in the Figure legend. The number of species used, and the reasoning behind it, is described under Methods, Bioinformatic analyses: in exploring the conservation of the D732 residue, we have not tried to use as many iGluR sequences as possible; rather, we have tried to assess this residue in a broad sample covering all (animal) iGluR families and a careful selection of different animal lineages, while also avoiding fast-evolving species like Drosophila, which complicate tree topology. Hence our description of “two ctenophores, one poriferan, etc.” under Methods, Bioinformatic analyses. In the main text (Results, page 9), we retain our original description: “We assembled diverse iGluR sequences, covering all animal lineages and animal iGluR families (Fig. 6A,B)…”
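      Percent conservation at a single alignment position, of the kind tabulated in Table 1, can be computed by counting residues in one column of a multiple sequence alignment. A minimal sketch, assuming the aligned sequences are equal-length strings and that gap characters are excluded from the denominator (an assumption; the exact convention used for Table 1 is not stated here). `column_conservation` is a hypothetical helper, not part of the authors' analysis.

```python
from collections import Counter


def column_conservation(aligned_seqs, column, gap="-"):
    """Percent of non-gap sequences carrying each residue at one alignment column.

    `column` is a 0-based index into the aligned strings; results are ordered
    from most to least common residue.
    """
    if not aligned_seqs or any(len(s) != len(aligned_seqs[0]) for s in aligned_seqs):
        raise ValueError("expected a non-empty list of equal-length aligned sequences")
    residues = [s[column] for s in aligned_seqs if s[column] != gap]
    if not residues:
        return {}
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}
```

      Applied to the column aligned with GluN1-D732, the share of "D" would give the conservation figure for that position; the same call on SYTANLAAF columns gives comparable numbers for reference positions.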

      1. Figure 5, panel F: From what we understand, the authors created dose-response curves for wt Trichoplax AKDF<sup>19383</sup> based on steady-state currents and for Y742D/Y743S mutants based on peak currents. If this is the case, one cannot compare the two dose-response curves since peak current potentiation and steady-state inhibition likely reflect different conformational transitions.

      ER9) We acknowledge this issue and that we can’t really say that ligand-activated D742 channels bind D-serine better than ligand-deactivated Y742 channels. But we think it’s fair to point out that mutant D742 channels react (by conducting current) to micromolar ligand concentrations whereas wildtype Y742 channels react (with decreased current) only to millimolar concentrations, and we have re-written to acknowledge the issue raised for this comparison (page 11): “Finally, we tried to assess whether position 742 determines ligand potency in addition to channel activity in AKDF<sup>19383</sup> receptors. For these experiments we employed D-serine, as recovery from glycine-induced deactivation (Fig. 6C, far-left) and activation/desensitization (Fig. 6C, far-right) was very slow. Substantial deactivation of WT receptors was only induced by millimolar D-serine concentrations, whereas Y742D-containing mutants were activated by micromolar concentrations (Fig. 6D,E), with an EC<sub>50</sub> of 490 ± 120 µM at Y742D/Y743S (n = 4; Y742D EC<sub>50</sub> not assessed due to slow recovery from desensitization). Our measure of potency is confounded by the fact that deactivation (in WT channels) and activation (in mutant channels) are presumably coupled to D-serine binding via different conformational transitions. Nonetheless, we observe that a naturally occurring large hydrophobic side chain at the top of the β-strand preceding the αI helix leads to an AKDF homo-tetramer that shows constitutive activity and responds only to millimolar concentrations of D-serine. In contrast, “re-introducing” an aspartate to this position reinstates more typical ligand-dependent activation and sensitivity to micromolar concentrations of D-serine.”
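      EC50 values such as the 490 ± 120 µM reported for Y742D/Y743S come from fitting concentration-response data with the Hill equation, R = 1 / (1 + (EC50/[A])^nH), where R is the fractional response. The sketch below evaluates that equation; the Hill coefficient nH = 1 is an assumption for illustration, since the response does not report one, and `hill_response` is a hypothetical name.

```python
def hill_response(conc, ec50, n_h=1.0):
    """Fractional response (0 to 1) predicted by the Hill equation.

    `conc` and `ec50` must share units (e.g. micromolar); n_h is the Hill slope.
    """
    if conc <= 0:
        return 0.0
    return 1.0 / (1.0 + (ec50 / conc) ** n_h)
```

      By construction the response is half-maximal at conc == ec50. As the reviewers point out, fitting this model to peak currents in one construct and to steady-state deactivation in another yields EC50 values that describe different conformational transitions, so the two numbers are not strictly comparable.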

      Optional suggestions:

      1. Figure 2, glycine/DCKA competition: It is difficult to understand how a GluN1 LBD-locked closed (active state) could still bind DCKA. If the open-to-close equilibrium of GluN1 LBD is displaced towards its closed state, then DCKA Ki should be shifted to the right compared to wt receptors. Additionally, DCKA inhibition kinetics should be slower if DCKA needs to “wait” for rare resting-like conformational changes to bind. Did the authors investigate DCKA potency and inhibition kinetics?

      OS1) We have now investigated DCKA potency. DCKA capably inhibits GluN1-D732L/GluN2A-WT activity, and perhaps surprisingly, potency of DCKA at the mutant is greater than at wildtype. We suspect this is due to (1) the introduction of a hydrophobic leucine residue right next to an aryl group of DCKA, increasing DCKA affinity directly, (2) the absence of glycine binding to this site, so no need for competition, and (3) potentially other mechanisms such as cooperativity between subunits. Again, establishing the precise nature of our mutant LBD conformation here is for future structural and molecular dynamics studies. But we have described the results, along with our following interpretation, (page 4): “Whether increased DCKA potency in GluN1-D732L subunits derives from the now non-competitive nature of the inhibition in mutant receptors or from the introduction of a favourable hydrophobic interaction with the dichlorobenzene moiety of the inhibitor is unclear. But the high DCKA potency would suggest that the constitutively active GluN1-D732L subunit is, unexpectedly, not due to a permanently clamshell-closed LBD in the mutant. This may reflect the fact that extent of LBD closure is poorly correlated with agonist efficacy in GluN1 subunits, in contrast to AMPA receptor GluA2 subunits <sup>21</sup>.”

      1. The authors show in many panels that GluN1/GluN2A currents desensitize (e.g. Fig.1B, 3C, 4A). In Xenopus oocytes, NMDAR currents do not normally desensitize. We fear this desensitization might stem from contamination of the NMDA current by calcium-activated chloride channels, which can be activated by high quantities of barium when large NMDAR currents are measured. To avoid this problem, we advise that NMDA currents above 2 µA are avoided.

      OS2) We have moved forward presuming that potential changes in current amplitude due to a small chloride flux doesn’t affect our measures of potency or ligand-selectivity. But in our new experiments, we’ve especially tried to avoid large currents.

      1. Page 5, investigation of D732 state-dependent interactions: Mutation of residues near D732 to unnatural amino acids to replace the peptidic NH does not provide much information about the mechanisms of D732 action. The fact that the 734Aah and 735Vah substitutions cannot mimic the effect of the D732L mutation could be due to many factors, including the fact that changing the peptide bond probably changes the local structure of the LBD. Perhaps mention this in the discussion.

      OS3) We have now acknowledged this possibility in the Results, right after we describe the decrease in glycine potency caused by the 734Aah mutation (page 7): “Although this may be due to local conformational changes due to altered main chain structure,…”

      1. It is intriguing that the D732L mutation locks an active conformation of the GluN1 subunit but not the GluN2 subunit, suggesting two different mechanisms of LBD closure by glutamate and glycine. It would be interesting to look at the effect of the equivalent mutation on the GluN3 subunit to investigate if this locking effect is specific to glycine-binding LBDs or just to the GluN1 subunit.

      OS4) We have now made and tested mutant GluN3A subunits D845L and D845F. Both mutations simply abolish glycine-induced activity altogether (mirroring the effects of the equivalent mutations in GluN2A). Described on page 9: “Similarly, at oocytes injected with GluN1(4a)-WT and GluN3A-D845L or -D845F mRNAs, we saw no response to glycine alone or glycine in the presence of CGP 78608 (Fig 5D). Together, these results indicate that the induction of a constitutively active state by the D732L/F substitution is an exclusive feature of the GluN1 subunit, and the only conserved feature of the mutation in different subunits is a decrease in agonist potency.”

      1. Page 9: Discussing the position of residue side chains from structures with 4 Å resolution does not seem relevant and would benefit from a caveat.

      OS5) We want to retain our comparison of experiments with available structural data, so we have kept this but re-written to more openly acknowledge the caveat (page 12): “Indeed, in a cryo-electron microscopy (cryo-EM) study of GluN1/GluN2B receptors, D732 has only swung toward the ligand and away from A734 in a second of two putative pre-gating step structural models, although this is speculative considering the poor resolution of D732 side chains in those cryo-EM maps (12).”

      1. Page 10: We don’t understand the point that the authors want to make with the activation of Aplysia californica. Please clarify.

      OS6) Here we were trying to say that “not much is required to change NMDARs from requisite co-agonism to single-ligand agonism”, either (a) in the lab via the D732L mutation or (b) naturally, as invertebrate NMDA receptors apparently show single-ligand agonism (results on invertebrate NMDARs in the literature). Further, we want to say that “by extension, we wonder if (c) in certain physiological situations, vertebrate NMDARs might indeed need only a single ligand.” We acknowledge this was unclear and – although it’s still speculative – we have now changed to (page 13): “Our work shows that only small changes in the GluN1 LBD are required for solely glutamate-gated currents in vertebrate GluN1/GluN2 receptors, and previous work suggests that invertebrate Drosophila melanogaster and Aplysia californica GluN1/GluN2 receptors can be activated by single ligands <sup>50,51</sup>. This suggests that NMDA receptors’ requirement of co-agonism is easily alleviated by certain mutations or conditions. As iGluR-modulatory proteins vary across cell types or even across neuronal compartments <sup>52,53</sup> and NMDA receptor sequence varies across animals, it is foreseeable that in certain physiological settings, certain NMDA receptors might be activated by glutamate alone. But in most settings, certainly in vertebrates, it seems that glutamate-induced activation of NMDA receptors relies on a system of ambient glycine or D-serine <sup>54,55</sup>.”

      1. In iGluRs, constitutive currents are often induced by mutations in the gate region, near the SYTANLAAF motif (e.g. lurcher mutations). Does the sequence around the gate of Trichoplax AKDF<sup>19383</sup> predict channel constitutive activity?

      OS7) Our results with WT, single mutant Y742D, and double mutant Y742D/Y743S Trichoplax AKDF<sup>19383</sup> receptors already show convincing evidence that the constitutive activity is via the Y742 and Y743 position: the tyrosine residues are unique to this leaky channel, and their mutation to more typical residues removes the leak current (Fig. 7B, page 11, revised manuscript).

      But a look at upper M3 is warranted. As shown in Fig. 6C, AKDF<sup>19383</sup> (YTANMAAFL) is quite similar to typical iGluRs (e.g. GluA2 YTANLAAFL). But one might ask about the single M/L difference in that motif, and we have therefore made and tested the M657L AKDF<sup>19383</sup> mutant, comparing it with WT. Results show that this small M3 difference has little effect on channel activity. We have added this data in new Figure 7D and described it (page 11): “As channel activity of iGluRs also relies on the upper segment of the third membrane-spanning helix (M3, (34)), we also examined this segment in AKDF<sup>19383</sup>. AKDF<sup>19383</sup> differs only subtly from most iGluRs with a methionine residue (M657) instead of leucine here (Fig. 6C), but we tested potential effects of this difference by mutating M657 to leucine. M657L activity was much like WT (Fig. 7D), however, confirming that divergence at Y742/Y743 and not the upper M3 segment determines the unique activity of AKDF<sup>19383</sup>.”

      1. D-serine is another co-agonist that binds the GluN1 subunit. Compared to glycine, D-serine can make additional interactions with the lower lobe of GluN1 LBD. It would be interesting to look at D-serine dose-response curves in GluN1-D732L/GluN2A receptors: are these receptors also D-serine insensitive or can they be further activated by D-serine?

      OS8) We have now measured the effects of D-serine on GluN1-D732L/GluN2A-WT receptors. As we now show in Figure 1B (green symbols), D-serine at increasing concentrations (100 nM through 100 μM) activates no additional current on top of the glutamate-gated current in mutant receptors. We have added to the end of the first Results paragraph (page 3): “Similarly, large currents were activated in mutant GluN1-D732L/GluN2A-WT receptors when 100 nM through 100 μM D-serine was applied in the presence of 100 µM glutamate (green in Fig. 1B).”

      (This is a response to peer review conducted by Biophysics Colab on version 1 of this preprint.)

    1. Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors compared four types of hiPSCs and four types of hESCs at the proteome level to elucidate the differences between hiPSCs and hESCs. Semi-quantitative calculations of protein copy numbers revealed increased protein content in iPSCs. Particularly in iPSCs, mitochondrial and cytoplasmic proteins were suggested to reflect the state of the original differentiated cells to some extent. However, the most important result of this study is the calculation of the protein copy numbers per cell, and the validity of this result is problematic. In addition, several experiments need to be improved, such as using cells of different genders (iPSC: female, ESC: male) in mitochondrial metabolism experiments.

      Strengths:

      The focus on the number of copies of proteins is exciting and appreciated if the estimated calculation result is correct and biologically reproducible.

      Weaknesses:

      The proteome results in this study were likely obtained by simply looking at differences between clones, and the proteome data need to be validated. First, there were only a few clones for comparison, and the gender and number of cells did not match between ESCs and iPSCs. Second, no data show the accuracy of the protein copy number per cell obtained by the proteome data.

      We agree with the reviewer in their assessment that more independent stem cell clones and an equal gender balance would be preferable. We will mention these considerations as limitations of our study and encourage a larger-scale follow-up.

      Regarding the estimated copy numbers, we would like to highlight that they have been used extensively in the field, with direct validation of the differences in copy numbers by orthogonal methods such as FACS (refs. 2-4, 7, 10). Furthermore, the original paper directly compared the copy numbers estimated using the “proteomic ruler” to spike-in protein epitope signature tags and found remarkable concordance. This was performed with a much older generation of mass spectrometer with reduced peptide coverage, and the authors predicted that higher coverage would increase the quantitative performance.

      Reviewer #2 (Public Review):

      Summary:

      Pluripotent stem cells are powerful tools for understanding development, differentiation, and disease modeling. The capacity of stem cells to differentiate into various cell types holds great promise for therapeutic applications. However, ethical concerns restrict the use of human embryonic stem cells (hESCs). Consequently, induced human pluripotent stem cells (ihPSCs) offer an attractive alternative for modeling rare diseases, drug screening, and regenerative medicine.

      A comprehensive understanding of ihPSCs is crucial to establish their similarities and differences compared to hESCs.

      This work demonstrates systematic differences in the reprogramming of nuclear and non-nuclear proteomes in ihPSCs.

      We thank the reviewer for the positive assessment.

      Strengths:

      The authors employed quantitative mass spectrometry to compare protein expression differences between independently derived ihPSC and hESC cell lines. Qualitatively, protein expression profiles in ihPSC and hESC were found to be very similar. However, when comparing protein concentration at a cellular level, it became evident that ihPSCs express higher levels of proteins in the cytoplasm, mitochondria, and plasma membrane, while the expression of nuclear proteins is similar between ihPSCs and hESCs. A higher expression of proteins in ihPSCs was verified by an independent approach, and flow cytometry confirmed that ihPSCs had larger cell sizes than hESCs. The differences in protein expression were reflected in functional distinctions. For instance, the higher expression of mitochondrial metabolic enzymes, glutamine transporters, and lipid biosynthesis enzymes in ihPSCs was associated with enhanced mitochondrial potential, increased ability to uptake glutamine, and increased ability to form lipid droplets.

      Weaknesses:

      While this finding is intriguing and interesting, the study falls short of explaining the mechanistic reasons for the observed quantitative proteome differences. It remains unclear whether the increased expression of proteins in ihPSCs is due to enhanced transcription of the genes encoding this group of proteins or due to other reasons, for example, differences in mRNA translation efficiency. Another unresolved question pertains to how the cell type origin influences ihPSC proteomes. For instance, whether ihPSCs derived from fibroblasts, lymphocytes, and other cell types all exhibit differences in their cell size and increased expression of cytoplasmic and mitochondrial proteins. Analyzing ihPSCs derived from different cell types and by different investigators would be necessary to address these questions.

      We agree with the Reviewer that our study does not provide a mechanistic reason for the quantitative differences between the two cell types. However, we will include an expanded section in the discussion where we discuss the potential causes.

      We also agree that studying hiPSCs reprogrammed from different cell types, such as blood lymphocytes, would be of great interest and will include a section about this within the discussion to encourage further research into the area.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Brenes and colleagues carried out proteomic analysis of several human induced pluripotent (hiPSC) and human embryonic stem cell (hESC) lines. The authors found quantitative differences in the expression of several groups of cytoplasmic and mitochondrial proteins. Overall, hiPSC expressed higher levels of proteins such as glutamine transporters, mitochondrial metabolism proteins, and proteins related to lipid synthesis. Based on the protein expression differences, the authors propose that hiPSC lines differ from hESC in their growth and metabolism.

      Strengths:

      The number of generated hiPSC and hESC lines continues to grow, but potential differences between hiPSC and hESC lines remain to be quantified and explained. This study is a promising step forward in understanding the differences between different hiPSC and hESC lines.

      Weaknesses:

      It is unclear whether changes in protein levels relate to any phenotypic features of cell lines used. For example, the authors highlight that increased protein expression in hiPSC lines is consistent with the requirement to sustain high growth rates, but there is no data to demonstrate whether hiPSC lines used indeed have higher growth rates.

      We respectfully disagree with the reviewer on this point. Our data show that hESCs and hiPSCs differ significantly in protein mass and cell size, as validated by the EZQ assay and FACS, while showing no significant differences in their cell cycle profiles. Thus, the increased size and protein content would require higher growth rates to sustain the increased mass, which is what we show.

      The authors claim that the cell cycle of the lines is unchanged. However, no details of the method for assessing the cell cycle were included so it is difficult to appreciate if this assessment was appropriately carried out and controlled for.

      We apologise for this omission; the details will be included in the revised version of the document.

      Details and characterisation of iPSC and ESC lines used in this study were overall lacking. The lines used are merely listed in methods, but no references are included for published lines, how lines were obtained, what passage they were used at, their karyotype status, etc. For details of basic characterisation, the authors should refer to the ISSCR Standards for the use of human stem cells in research. In particular, the authors should consider whether any of the changes they see may be attributed to copy number variants in different lines.

      We agree with the reviewer on this. The hiPSC lines were generated by the HipSci consortium at the Wellcome Sanger Centre as described in the flagship HipSci paper (ref. 13). We cite the flagship paper, which specifies in great detail the reprogramming protocols and quality control measures, including the assessment of copy number variations (ref. 13). However, we agree that we did not make this information easily accessible for readers. We also believe it is relevant to include this information explicitly in our manuscript instead of expecting readers to consult the flagship paper. These details will be added to the revised version.

      The expression data for markers of undifferentiated state in Figure 1a would ideally be shown by immunocytochemistry or flow cytometry as it is impossible to tell whether cultures are heterogeneous for marker expression.

      We agree with the reviewer on this. FACS is indeed much more quantitative and a better method to study heterogeneity. However, we did not have protocols to study these markers using FACS.

      TEM analysis should ideally be quantified.

      We agree with the reviewer that it would be nice to have a quantitative measure.

      All figure legends should explicitly state what graphs are representing (e.g. average/mean; how many replicates (biological or technical), which lines)? Some data is included in Methods (e.g. glutamine uptake), but not for all of the data (e.g. TEM).

      We agree with the reviewer completely. These points will be remediated in the revised version of the manuscript.

      Validation experiments were performed typically on one or two cell lines, but the lines used were not consistent (e.g. wibj_2 versus H1 for respirometry and wibj_2, oaqd_3 versus SA121 and SA181 for glutamine uptake). Can the authors explain how the lines were chosen?

      We will include these details within the updated manuscript.

      The authors should acknowledge the need for further functional validation of the results related to immunosuppressive proteins.

      We agree with the reviewer and will add a clear sentence in the discussion making this point explicitly.

      Differences in H1 histone abundance were highlighted. Can the authors speculate as to the meaning of these differences?

      Regarding H1 histones, our study of the literature as well as discussions with chromatin and histone experts, both within our institute and externally, have not shed light on what the differences could imply. We think this is an interesting result that merits further study, but we do not have a clear hypothesis on the consequences.

      In summary, we thank the reviewers for their comments and will prepare a revised version that addresses their suggestions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Assessment:

      The manuscript titled 'Rab7 dependent regulation of goblet cell protein CLCA1 modulates gastrointestinal homeostasis' by Gaur et al discusses the role of Rab7 in the development of ulcerative colitis by regulating the lysosomal degradation of Clca1, a mucin protease. The manuscript presents interesting data and provides a potential molecular mechanism for the pathological alterations observed in ulcerative colitis. Gaur et al demonstrate that Rab7 levels are lowered in UC and CD. However, a similar analysis of Rab7 levels in ulcerative colitis (UC) and Crohn's disease (CD) patient samples was conducted recently (Du et al, Dev Cell, 2020) which showed that Rab7 levels are elevated under these conditions. While Gaur et al have briefly mentioned Du et al's paper in passing in the discussion, they need to discuss these contradictory results in their paper and clarify these differences. Additionally, Du et al are not included in the list of references.

      Strengths:

      The manuscript used a multi-pronged approach and compares patient samples, mouse models of DSS, and protocols that allow differentiation of goblet cells. They also use a nanogel-based delivery system for siRNAs, which is ideal for the knockdown of specific genes in the gut.

      Weaknesses:

      (1) Du et al, Dev Cell 2020 (https://doi.org/10.1016/j.devcel.2020.03.002) have previously shown that Rab7 levels are elevated in a similar set of colonic samples (age group, number etc.) from UC and CD patients. Gaur et al have not discussed this paper or its findings in detail, which directly contradicts their results. Clarification regarding this should be provided.

      We thank and appreciate the reviewer for bringing this point.

      The results shown by Du et al, Dev Cell, 2020 depict elevated expression of Rab7 in UC and CD patients compared to controls. At first glance, these results appear contradictory, but there are a few possible explanations for this.

      Firstly, Rab7 expression levels may fluctuate in the tissue depending on the degree of gut inflammation. This can be concluded from our observations in the DSS-mice dynamics model and in the human patient samples with mild and moderate UC. Furthermore, Du et al provide no information on the severity of the condition among the patients employed in their study. Our motive, in the current work, was to emphasize this aspect. This point was mentioned in the discussion section of the manuscript. However, in view of the reviewer’s concern, we have now added a detailed comment on this in the main text of the revised version of the manuscript.

      Secondly, the control biopsies in our investigation were acquired from non-IBD patients, unlike Du et al., who used biopsies from the normal para-carcinoma region of colorectal cancer patients. One cannot overlook the fact that physiological and molecular changes are apparent even in non-inflamed regions of the gut of an IBD or CRC patient. It is possible that the observed discrepancy arises from the differences in the sample type used for comparing Rab7 expression.

      Finally, the main sub-tissue region showing a decrease in Rab7 expression in UC samples appeared to be the goblet cells, which were not examined by Du et al.

      Keeping these points in mind we do not think that there is a contradiction in our findings with that of Du et al., 2020. In the revised submission some of these explanations are incorporated (Lines 106-109).

      This was an oversight from our side. We have actually mentioned Du et al., 2020 in the discussion (line number 345) but somehow the reference was missing in the main list. We have ensured that the reference is included in the revised version and that their findings are included both in main text and in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors report a role for the well-studied GTPase Rab7 in gut homeostasis. The study combines cell culture experiments with mouse models and human ulcerative colitis patient tissues to propose a model where Rab7, by delivering a key mucus component, CLCA1, to lysosomes, regulates its secretion in goblet cells. This is important for the maintenance of mucus permeability and gut microbiota composition. In the absence of Rab7, CLCA1 protein levels are higher in tissues as well as in the mucus layer, consistent with the anticorrelation of Rab7 (reduced) and CLCA1 (increased) in ulcerative colitis patients. The authors conclude that Rab7 maintains CLCA1 levels by controlling its lysosomal degradation, thereby playing a vital role in mucus composition, colon integrity, and gut homeostasis.

      Strengths:

      The biggest strength of this manuscript is the combination of cell culture, mouse model, and human tissues. The experiments are largely well done and, in most cases, the results support their conclusions. The authors go to substantial lengths to find a link, such as alteration in microbiota, or mucus proteomics.

      Weaknesses:

      (1) There are also some weaknesses that need to be addressed. The association of Rab7 with UC in both mice and humans is clear, however, claims on the underlying mechanisms are less clear. Does Rab7 regulate specifically CLCA1 delivery to lysosomes, or is it an outcome of a generic trafficking defect?

      We thank the reviewer for the insightful comment. We would like to offer the following explanation for each of these concerns:

      Our immunofluorescence imaging experiments revealed co-localization of Rab7 protein with CLCA1 and lysosomes (Fig 7I). In addition, the absence of Rab7 affects the transport of CLCA1 to lysosomes (Fig 7J). This suggests that Rab7 may be involved in selectively regulating the transport of CLCA1 (presumably along with other cargo) to lysosomes. However, we recognize that the reviewer's point about a possible generic trafficking defect is valid.

      (2) CLCA1 is a secretory protein, how does it get routed to lysosomes, i.e., through Golgi-derived vesicles, or by endocytosis of mucus components? Mechanistic details on how CLCA1 is routed to lysosomes will add substantial value.

      As mentioned in the manuscript, the trafficking of CLCA1 protein or CLCA1-containing vesicles within the goblet cell is unknown, with no information on the proteins involved in its mobility. The switching of CLCA1 containing vesicles from the secretory route to lysosomes needs extensive investigation involving overall trafficking of the protein. Taken together, the complete answer to both these important questions will need a series of experiments and those may be interesting avenues for future research.

      (3) Why does the level of Rab7 fluctuate during DSS treatment (Fig 1B)?

      This is a very thoughtful point from the reviewer. We detected a distinct pattern of fluctuation in Rab7 expression in intestinal epithelial cells after DSS-dynamics treatment in mice. Perhaps these changes result from complex cellular signaling in response to the DSS treatment. Rab7, being a fundamental protein in the protein sorting pathway, is expected to change according to cellular requirements. At present, there are no reports describing the regulatory mechanisms that govern Rab7 levels in the gut.

      (4) Does the reduction seen in Rab7 levels (by WB) also reflect in reduced Rab7 endosome numbers?

      We observed a reduction in Rab7 expression at both the RNA and protein levels. Confirming whether this alteration leads to reduced numbers of Rab7-positive endosomes would require further investigation.

      (5) Are other late endosomal (and lysosomal) populations also reduced upon DSS treatment and UC? Is there a general defect in lysosomal function?

      There is no direct evidence showing a reduction in the late endosomal and lysosomal populations during gut inflammation, but a few studies link lysosomal dysfunction with risk for colitis (doi: 10.1016/j.immuni.2016.05.007).

      (6) The evidence for lysosomal delivery of CLCA1 (Fig 7 I, J) is weak. Although used sometimes in combination with antibodies, lysotracker red is not well compatible with permeabilization and immunofluorescence staining. The authors can substantiate this result further using lysosomal antibodies such as Lamp1 and Lamp2. For Fig 7J, it will be good to see a reduction in Rab7 levels upon KD in the same cell.

      We used Lysotracker red in live cells followed by fixation, so permeabilization issues were avoided. Lamp1, as suggested by the reviewer, is definitely a better marker for lysosomes in immunofluorescence studies, but it has also been shown to mark late endosomes (doi: 10.1083/jcb.132.4.565). As Rab7 also marks late endosomes, using Lamp1 may leave ambiguity as to whether CLCA1 resides in Rab7-positive late endosomes or in lysosomes. Nevertheless, we have carried out this experiment, as suggested by the reviewer, by staining the cells with LAMP1 (author response image 1). Consistent with our previous data, the colocalization of CLCA1 with LAMP1-positive vesicles decreased upon Rab7 knockdown. We also noted a reduction in LAMP1 staining intensity in cells where Rab7 was knocked down. This observation can be attributed to the decrease in Rab7-positive vesicles or late endosomes, which also exhibit LAMP1 staining.

      Author response image 1.

      (A) Representative confocal images of HT29-MTX-E12 cells transfected with either scrambled siRNA (control) or Rab7 siRNA (Rab7Knockdown). Cells are stained for CLCA1 (green) using anti-CLCA1 antibody and for lysosomes with LAMP1. (B) Graph shows quantitation of colocalization between CLCA1 and LAMP1 from images (n=20) using Manders’ overlap coefficient. Inset shows zoomed areas of the image with colocalization puncta (yellow) marked with arrows.

      (7) In this connection, Fig S3D is somewhat confusing. While it is clear that the pattern of Muc2 in WT and Rab7-/- cells are different, how this corroborates with the in vivo data on alterations in mucus layer permeability -- as claimed -- is not clear.

      The data in Fig. S3D suggest the involvement of Rab7 in the packaging of Muc2. The aim of this experiment was to support our observation in the Rab7KD-mice model, where the mucus layer was loose and more permeable in Rab7-deficient mice.

      (8) Overall, the work shows a role for a well-studied GTPase, Rab7, in gut homeostasis. This is an important finding and could provide scope and testable hypotheses for future studies aimed at understanding in detail the mechanisms involved.

      We thank the reviewer for this comment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific questions to the authors:

      (1) Why is the dotted line in Fig. 1c at -7.5? What does this signify?

      Response: The dotted line was intended to represent the baseline; in the revised manuscript it is corrected and placed at y=0.

      (2) Du et al should be cited. Fig 6 K-Q from Du et al should be discussed and reasons for contradictory findings should be given in greater detail, rather than a single sentence in the discussion.

      Response: The reference for Du et al is included in the list, and the possible reasons for the differences between their findings and those of the current work are discussed in the main text (Lines 106-109).

      (3) Fig1. Why are Rab7 levels low even in remission patient samples? Can DSS be withdrawn to induce remission followed by analysis of colonic samples?

      Response: A possible explanation for this observation could be that the restoration of Rab7 levels may not immediately follow the resolution of clinical symptoms in remission patients. After the remission initiation, the normalization of cellular processes, including the regulation of Rab7 expression, might exhibit a time lag. A thorough investigation of Rab7 levels and the allied pathways at different time points during the remission phase could provide deeper insights into the gradual dynamics of recovery. As suggested by the reviewer, DSS withdrawal induced recovery model can be utilized for understanding the same and could be a good approach for future investigations.

      (4) Fig. 2: Single-channel fluorescence should be shown.

      Response: The single channel fluorescence images are incorporated in Fig. S2.

      (5) Line 456 should be modified. 'Blind pathologist' does not read well!

      Response: The phrase has been changed to ‘Blinded pathologist’.

      (6) Other inflammatory markers, cytokine levels should be looked at in addition to TNF alpha.

      Response: TNF-α is a crucial mediator in intestinal inflammation, actively contributing to the development of IBD. Elevated levels of TNF-α are observed in patients of IBD (Billmeier U. et al, World J Gastroenterol. 2016). In the current work, while probing for TNF-α our primary objective was to examine this significant indicator of colitis following Rab7 knockdown in mice, aiming to gain insights into heightened gut inflammation.

      (7) Quantitation of S3D should be provided.

      Response: The dispersed expression of Muc2 was observed in n=20 cells per sample and it was a qualitative observation. The aim was to identify any changes in Muc2 packaging under Rab7 knockout conditions.

      (8) Microbiota analysis should include Rab7KD+DSS mice.

      Response: We understand the importance of this point; however, in the current work our primary objective was to specifically investigate changes in microbial diversity and abundance in Rab7KD mice compared to both DSS+CScr and CScr mice. Rab7KD+DSS mice are expected to show greater dysbiosis than DSS+CScr mice.

      (9) Fig 6 H and I, G. How do Clca1 levels reduce in Rab7kd +DSS relative to Scr+DSS while they are higher in Rab7kd compared to Scr. Comment.

      Response: The decreased expression of CLCA1 in the mucus of DSS+Rab7KD mice can be attributed to a consequence of significant reduction in goblet cell numbers in these mice, as evidenced by the observed loss of these cells (Fig.S3 B and Fig. S3C). CLCA1 is exclusively secreted by goblet cells, so a decline in their numbers directly affects CLCA1 levels.

      (10) How are Rab7 levels downregulated? What is the predicted mechanism?

      Response: While our current study didn't explore this aspect, it's worth noting that Rab7 protein levels undergo regulation through various mechanisms, including post-translational modifications such as Ubiquitination and SUMOylation. These modifications are known to regulate Rab7 stability, transport and recycling. Specific experiments conducted during this study (work not included in the manuscript) indicated the participation of SENP7, a deSUMOylase, in controlling the stability of Rab7 protein, particularly in the context of colitis. Additionally, goblet cell specific mechanisms are also likely to be controlling the Rab7 in the gut.

      (11) What is the explanation for the opposite changes in CLCA1 RNA (down) and protein (up)?

      Response: The reduction in CLCA1 at the RNA level could be associated with the decrease in goblet cell numbers during colitis. Our investigation indicates that Rab7 predominantly influences CLCA1 at the protein level by impacting its degradation pathway. It is important to acknowledge that not all the alterations in CLCA1 observed during colitis can be solely attributed to Rab7, but our study has identified a connection between Rab7 and CLCA1.

      (12) In light of Du et al, it would be interesting to see how the number of peroxisomes changes upon alteration of Rab7 levels.

      Response: The suggestion by the reviewer is noteworthy. However, as it concerns an altogether different domain, it deviates from the primary objectives of the current work. Here, our goal was specifically to explore the role of Rab7 in goblet cell function. It is thus an attractive theme for future investigations.

      (13) While Gaur et al suggest in their discussion that Du et al may have observed an upregulation in Rab7 levels in different cell types of the intestine, this is not apparent from the data provided. Tissue sections should be carefully analysed to provide data supporting this observation. Differences in reagents used (antibodies) should also be considered. As far as the human patient data is concerned, it does not appear that the sample stages are very different across the two manuscripts (based on age, inclusion criteria etc.).

      Response: This has been explained in detail in our public comments.

      Reviewer #2 (Recommendations For The Authors):

      (1) In general, image-based measurements could be done better (for example, object-based statistics than pixel-based overlaps) and represented differently. It is difficult to appreciate the reduction in Rab7 levels in goblet cells in Fig 2 A, C. It might be good to show the channels separately, and perhaps use an intensity gradient LUT for the Rab7 channel.

      Response: The single channel fluorescence images are incorporated in Fig. S2.

      (2) The EM images, and particularly Fig 2F are not convincing, with an oddly square-shaped vesicle. I'm not sure what value they are adding to the interpretation.

      Response: The observed square-shaped vesicle in Fig. 2F could be attributed to the dynamic nature of vesicles within a cell. This dynamicity allows them to adopt various shapes depending on their state and function within the cell. The presence of Rab7 near vacuoles of goblet cells signify its probable involvement in the regulation of secretory function of these cells which is the key aspect being covered in this work.

      (3) A general method question concerns the definition of the distal colon. How is this decided, particularly when colon lengths are reduced upon DSS treatment?

      Response: The murine colon is divided into proximal and distal regions, which differ visibly in their inner folds; these folds are quite prominent in the proximal colon. Additionally, the portion towards the rectum (predominantly distal colon) was mainly used for the experiments. In each case, the experimental groups were matched for the respective areas.

      (4) The use of an in vivo intestine-specific Rab7 silencing model is good. Why does Rab7 KD itself not recapitulate aspects of DSS treatment, rather than only seeming to exacerbate it?

      Response: Our objective was to determine whether the downregulation of Rab7 during colitis was the cause or consequence of gut inflammation. Interestingly, our investigation using the murine Rab7 knockdown model revealed that the reduction of Rab7 expression in the intestine exacerbates inflammation. Subsequent analysis demonstrated that the absence of Rab7 disrupts goblet cell secretory function, consequently contributing to heightened inflammation. Our findings overall suggest that Rab7 downregulation is not merely a consequence but plays a contributory role in aggravating inflammation in the context of colitis.

      (5) The axes labels in Fig 5 are not readable. It is unclear how Rab7 KD is more similar in gut microbiota phenotypes to DSS than to CScr.

      Response: The microbial analysis revealed an abnormal composition of gut microbiota in Rab7KD mice compared to CScr. Interestingly, this composition exhibited some similarity to the inflamed gut microbiota observed in DSSScr mice. The analysis further demonstrated a shift in microbial diversity in Rab7KD mice, showcasing characteristics akin to those observed in inflamed mice. This similarity in gut microbiota phenotypes between Rab7KD and DSSScr suggests a potential link or influence of Rab7 downregulation on the microbiota, contributing to the observed similarities with DSS-induced inflammation.

      (6) The use of mucous proteomics to identify mechanisms of Rab7-mediated phenotype is a good approach. The replicates in the proteomics dataset (Fig 6F) do not seem to match. Detailing of methodology used for analysis will help to overcome these doubts.

      Response: The proteins identified in the different mucus proteomics samples were subjected to label-free quantification. The significantly altered proteins were then analysed with a False Discovery Rate (FDR) correction to control for potential false positives and ascertain the validity of the findings.
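      The response does not name the FDR procedure that was applied. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not a description of the authors' actual pipeline), FDR control over a list of per-protein p-values can be written as:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed here for illustration).

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

      For example, with hypothetical p-values `[0.001, 0.04, 0.03, 0.20]` only the first protein passes at `alpha=0.05`, even though two others are below 0.05 unadjusted; this is the false-positive control the response refers to.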

      (7) It will be good to see the immunoblots showing the negative correlation between Rab7 and CLCA1 in Fig 7D.

      Response: Fig. 7C shows the Western blot for CLCA1 protein expression in the same control and UC samples that were used in Fig. 1F to show Rab7 expression. Fig. 7D is the quantitative correlation plot between Fig. 1F (Rab7 expression) and Fig. 7C (CLCA1 expression).

      (8) Why is UC different from the DSS model for Rab7 gene expression but not protein levels? Endosomal counts could help address this.

      Response: We encountered challenges in accurately counting the individual puncta of Rab7 expression in immunofluorescence images due to the nature of tissue samples. Locating endosomes within a single cell proved to be challenging, and the proximity of many puncta made it difficult to delineate them individually. Despite these technical difficulties, correlating Rab7 expression with endosomal counts remains an intriguing prospect that may well be an area for future investigation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study uses a multi-pronged empirical and theoretical approach to advance our understanding of how differences in learning relate to differences in the ways that male versus female animals cope with urban environments, and more generally how reversal learning may benefit animals in urban habitats. The work makes an important contribution and parts of the data and analyses are solid, although several of the main claims are only partially supported or overstated and require additional support.

      Public Reviews:

      We thank the Editor and both Reviewers for their time and for their constructive evaluation of our manuscript. We worked to address each comment and suggestion offered by the Reviewers in our revision—please see our point-by-point responses below.

      Reviewer #1 (Public Review):

      Summary:

      In this highly ambitious paper, Breen and Deffner used a multi-pronged approach to generate novel insights on how differences between male and female birds in their learning strategies might relate to patterns of invasion and spread into new geographic and urban areas.

      The empirical results, drawn from data available in online archives, showed that while males and females are similar in their initial efficiency of learning a standard color-food association scenario (e.g., color X = food; color Y = no food), when the associations are switched (now, color Y = food; color X = no food), males are more efficient than females at adjusting to the new situation (i.e., faster at 'reversal learning'). Clearly, if animals live in an unstable world, where associations between cues (e.g., color) and what is good versus bad might change unpredictably, it is important to be good at reversal learning. In these grackles, males tend to disperse into new areas before females. It is thus fascinating that males appear to be better than females at reversal learning. Importantly, to gain a better understanding of underlying learning mechanisms, the authors use a Bayesian learning model to assess the relative role of two mechanisms (each governed by a single parameter) that might contribute to differences in learning. They find that what they term 'risk sensitive' learning is the key to explaining the differences in reversal learning. Males tend to exhibit higher risk sensitivity, which explains their faster reversal learning. The authors then tested the validity of their empirical results by running agent-based simulations where 10,000 computer-simulated 'birds' were asked to make feeding choices using the learning parameters estimated from real birds. Perhaps not surprisingly, the computer birds exhibited learning patterns that were strikingly similar to those of the real birds. Finally, the authors ran evolutionary algorithms that simulate evolution by natural selection where the key traits that can evolve are the two learning parameters. They find that under conditions that might be common in urban environments, high risk sensitivity is indeed favored.

      Strengths:

      The paper addresses a critically important issue in the modern world. Clearly, some organisms (some species, some individuals) are adjusting well and thriving in the modern, human-altered world, while others are doing poorly. Understanding how organisms cope with human-induced environmental change, and why some are particularly good at adjusting to change is thus an important question.

      The comparison of male versus female reversal learning across three populations that differ in years since they were first invaded by grackles is one of few, perhaps the first in any species, to address this important issue experimentally.

      Using a combination of experimental results, statistical simulations, and evolutionary modeling is a powerful method for elucidating novel insights.

      Thank you—we are delighted to receive this positive feedback, especially regarding the inferential power of our analytical approach.

      Weaknesses:

      The match between the broader conceptual background involving range expansion, urbanization, and sex-biased dispersal and learning, and the actual comparison of three urban populations along a range expansion gradient was somewhat confusing. The fact that three populations were compared along a range expansion gradient implies an expectation that they might differ because they are at very different points in a range expansion. Indeed, the predicted differences between males and females are largely couched in terms of population differences based on their 'location' along the range-expansion gradient. However, the fact that they are all urban areas suggests that one might not expect the populations to differ. In addition, the evolutionary model suggests that all animals, male or female, living in urban environments (which the authors suggest are stable but unpredictable) should exhibit high risk sensitivity. Given that all grackles, male and female, in all populations, are both living in urban environments and likely come from an urban background, should males and females differ in their learning behavior? Clarification would be useful.

      Thank you for highlighting a gap in clarity in our conceptual framework. To answer the Reviewer’s question: yes, even with this shared urban ‘history’, it seems plausible that males and females could differ in their learning. For example, irrespective of population membership, such sex differences could come about via differential reliance on learning strategies mediated by an interaction between grackles’ polygynous mating system and male-biased dispersal system, as we discuss in L254–265 (now L295–306). Population membership might, in turn, differentially moderate the magnitude of any such sex effect, since an edge population, even though urban, could still pose novel challenges: for example, by requiring grackles to learn novel daily temporal foraging patterns such as when and where garbage is collected (grackles appear to track this food resource: Rodrigo et al. 2021 [DOI: 10.1101/2021.06.14.448443]). We now introduce this important conceptual information; please see L89–96.

      Reinforcement learning mechanisms:

      Although the authors' title, abstract, and conclusions emphasize the importance of variation in 'risk sensitivity', most readers in this field will very possibly misunderstand what this means biologically. Both the authors' use of the term 'risk sensitivity' and their statistical methods for measuring this concept have potential problems.

      Please see our below responses concerning our risk-sensitivity term.

      First, most behavioral ecologists think of risk as predation risk, which is not considered in this paper. Second, some might think of risk as uncertainty. Here, as discussed in more detail below, the 'risk sensitivity' parameter basically influences how strongly an option's attractiveness affects the animal's choice of that option. They say that this is in line with foraging theory (Stephens and Krebs 2019) where sensitivity means seeking higher expected payoffs based on prior experience. To me, this sounds like 'reward sensitivity', but not what most think of as 'risk sensitivity'. This problem can be easily fixed by changing the name of the term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance, i.e., the risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We have added this information to our manuscript in L494–497. We further apologise for not clearly explaining how our lambda parameter estimates such risk-sensitive foraging. To do so here, we need to consider our Bayesian reinforcement learning model in full. This model uses observed choice-behaviour during reinforcement learning to infer our phi (information-updating) and lambda (risk-sensitivity) learning parameters. Thus, payoffs incurred through choice simultaneously influence estimation of each learning parameter; that is, in a sense, they are both sensitive to rewards. But phi and lambda differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions. Glossing over the mathematics, for phi, stronger reward sensitivity (bigger phi values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For lambda, stronger reward sensitivity (bigger lambda values) means stronger internal determinism about seeking the non-risk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching, i.e., ‘playing it safe’. We hope this information, which we have incorporated into our revised manuscript (please see L153–161), clarifies the rationale and mechanics of our reinforcement learning model, and why lambda measures risk-sensitivity.

      In addition, however, the parameter does not measure sensitivity to rewards per se: rewards are not in equation 2. As noted above, instead, equation 2 addresses the sensitivity of choice to the attraction score, which can be sensitive to rewards, though in complex ways depending on the updating parameter. Second, equations 1 and 2 involve one specific assumption about how sensitivity to rewards vs. to attraction influences the probability of choosing an option. In essence, the authors split the translation from rewards to behavioral choices into 2 steps. Step 1 is how strongly rewards influence an option's attractiveness and step 2 is how strongly attractiveness influences the actual choice to use that option. The equation for step 1 is linear whereas the equation for step 2 has an exponential component. Whether a relationship is linear or exponential can clearly have a major effect on how parameter values influence outcomes. Is there a justification for the form of these equations? The analyses suggest that the exponential component provides a better explanation than the linear component for the difference between males and females in the sequence of choices made by birds, but translating that to the concepts of information updating versus reward sensitivity is unclear. As noted above, the authors' equation for reward sensitivity does not actually include rewards explicitly, but instead only responds to rewards if the rewards influence attraction scores. The more strongly recent rewards drive an update of attraction scores, the more strongly they also influence food choices. While this is intuitively reasonable, I am skeptical about the authors' biological/cognitive conclusions that are couched in terms of words (updating rate and risk sensitivity) that readers will likely interpret as concepts that, in my view, do not actually concur with what the models and analyses address.

      To answer the Reviewer’s question: yes, these equations are very much standard and the canonical way of analysing individual reinforcement learning (see: Ch. 15.2 in Computational Modeling of Cognition and Behavior by Farrell & Lewandowsky 2018 [DOI: 10.1017/CBO9781316272503]; McElreath et al. 2008 [DOI: 10.1098/rstb.2008.0131]; Reinforcement Learning by Sutton & Barto 2018). To provide a “justification for the form of these equations”: equation 1 describes a convex combination of previous values and recent payoffs. Although latent values are updated as a linear combination of both factors, there is no simple linear mapping between payoffs and behaviour, contrary to what the Reviewer suggests. Equation 2 describes the standard softmax link function. It converts a vector of real numbers (here, latent values) into a simplex vector (i.e., a vector summing to 1) which represents the probabilities of different outcomes. Similar to the logit link in logistic regression, the softmax simply maps the model space of latent values onto the outcome space of choice probabilities, which enter the categorical likelihood distribution. We can appreciate that we did not make this clear in our manuscript, as we did not highlight the standard nature of our analytical approach; we now do so in our revised manuscript (please see L148–149). As far as what our reinforcement learning model measures, and how it relates cognition and behaviour, please see our previous response.
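      For readers less familiar with these two standard equations, a minimal Python sketch of their usual forms follows. It assumes that only the chosen option's attraction is updated on each trial; the manuscript's exact indexing may differ, and the function names are hypothetical.

```python
import math

def update_attraction(attractions, chosen, payoff, phi):
    """Assumed form of equation 1: the chosen option's attraction becomes a
    convex combination of its previous value and the latest payoff (0 <= phi <= 1)."""
    new = list(attractions)
    new[chosen] = (1 - phi) * attractions[chosen] + phi * payoff
    return new

def choice_probabilities(attractions, lam):
    """Equation 2: softmax link mapping latent attractions to choice probabilities.
    Larger lam makes choice more deterministic toward the highest attraction."""
    top = max(attractions)  # subtracting the max avoids overflow; probabilities are unchanged
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

# One rewarded choice of option 0 with phi = 0.5, then choice probabilities.
A = update_attraction([0.0, 0.0], chosen=0, payoff=1.0, phi=0.5)  # → [0.5, 0.0]
print(choice_probabilities(A, lam=0.0))  # equal probabilities: lambda = 0 ignores attractions
print(choice_probabilities(A, lam=4.0))  # option 0 strongly favoured
```

      The sketch shows the two-step structure the Reviewer describes: phi only changes the latent value, and lambda only changes how sharply latent values are turned into a choice, with the softmax output always summing to 1.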

      To emphasize, while the authors imply that their analyses separate the updating rate from 'risk sensitivity', both the 'updating parameter' and the 'risk sensitivity' parameter influence both the strength of updating and the sensitivity to reward payoffs in the sense of altering the tendency to prefer an option based on recent experience with payoffs. As noted in the previous paragraph, the main difference between the two parameters is whether they relate to behaviour linearly versus with an exponential component.

      Please see our two earlier responses on the mechanics of our reinforcement learning model.

      Overall, while the statistical analyses based on equations (1) and (2) seem to have identified something interesting about two steps underlying learning patterns, to maximize the valuable conceptual impact that these analyses have for the field, more thinking is required to better understand the biological meaning of how these two parameters relate to observed behaviours, and the 'risk sensitivity' parameter needs to be re-named.

      Please see our earlier response to these suggestions.

      Agent-based simulations:

      The authors estimated two learning parameters based on the behaviour of real birds, and then ran simulations to see whether computer 'birds' that base their choices on those learning parameters return behaviours that, on average, mirror the behaviour of the real birds. This exercise is clearly circular. In old-style, statistical terms, I suppose this means that the R-square of the statistical model is good. A more insightful use of the simulations would be to identify situations where the simulation does not do as well in mirroring behaviour that it is designed to mirror.

      Based on the Reviewer’s summary of agent-based forward simulation, we can see we did a poor job explaining the inferential value of this method; we apologise. Agent-based forward simulations are posterior predictions, and they provide insight into the implied model dynamics and overall usefulness of our reinforcement learning model. R-squared calculations are retrodictive, and they say nothing about the causal dynamics of a model. Specifically, agent-based forward simulation allows us to ask: what would a ‘new’ grackle ‘do’, given our reinforcement learning model parameter estimates? It is important to ask this question because, in parameterising our model, we may have overlooked a critical contributing mechanism to grackles’ reinforcement learning. Such an omission is invisible in the raw parameter estimates; it is only betrayed by the parameters in actu. Agent-based forward simulation is ‘designed’ to facilitate this call to action, not to mirror behavioural results. The simulation has no a priori ‘opinion’ about computer ‘birds’’ behavioural outcomes; rather, it simply assigns these agents random phi and lambda draws (whilst maintaining their correlation structure), and tracks their reinforcement learning. The exercise only appears circular if no critical contributing mechanism(s) went overlooked, in which case computer ‘birds’ should behave similarly to real birds. A disparate mapping between computer ‘birds’ and real birds, however, would mean more work is needed on model parameterisation to capture the causal, mechanistic dynamics behind real birds’ reinforcement learning (for an example of this happening in the human reinforcement learning literature, see Deffner et al. 2020 [DOI: 10.1098/rsos.200734]).

      In sum, agent-based forward simulation does not assess goodness-of-fit (we assessed the fit of our model a priori in our preregistration: https://osf.io/v3wxb), but it does assess whether one did a comprehensive job of uncovering the mechanistic basis of the target behaviour(s). We have worked to make the above points on the method and the insight afforded by agent-based forward simulation explicitly clear in our revision; please see L192–207 and L534–537.
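      A stripped-down sketch of agent-based forward simulation, under stated assumptions, may help show why it is not circular. Each hypothetical agent receives phi and lambda drawn here independently from uniform ranges (the authors instead drew from their posterior while preserving the phi-lambda correlation), learns a two-option colour task, is then reversed, and the number of trials to a hypothetical passing criterion (9 of the last 10 choices correct) is recorded. The agents know nothing about real birds' outcomes; any match or mismatch with real trial counts is informative.

```python
import math
import random

def p_choose_0(a0, a1, lam):
    # Two-option softmax: probability of choosing option 0 given attractions a0, a1.
    return 1.0 / (1.0 + math.exp(-lam * (a0 - a1)))

def run_phase(phi, lam, rewarded, rng, attractions=None,
              window=10, needed=9, max_trials=500):
    """Simulate one learning phase for one agent.

    Returns (trials used, final attractions). The passing criterion
    (`needed` correct out of the last `window` choices) is hypothetical.
    """
    a = list(attractions) if attractions is not None else [0.0, 0.0]
    recent = []
    for t in range(1, max_trials + 1):
        choice = 0 if rng.random() < p_choose_0(a[0], a[1], lam) else 1
        payoff = 1.0 if choice == rewarded else 0.0
        # Update only the chosen option's attraction (assumed form of equation 1).
        a[choice] = (1 - phi) * a[choice] + phi * payoff
        recent.append(choice == rewarded)
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and sum(recent) >= needed:
            return t, a
    return max_trials, a  # criterion not reached within max_trials

def simulate_agents(n, phi_range, lam_range, seed=1):
    """Forward-simulate n hypothetical agents through initial and reversal learning."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        # Independent uniform draws; a posterior-predictive check would instead
        # sample (phi, lambda) jointly from the fitted model's posterior.
        phi = rng.uniform(*phi_range)
        lam = rng.uniform(*lam_range)
        initial, a = run_phase(phi, lam, rewarded=0, rng=rng)
        # Reversal: option 1 now pays off; attractions carry over from the initial phase.
        reversal, _ = run_phase(phi, lam, rewarded=1, rng=rng, attractions=a)
        results.append((phi, lam, initial, reversal))
    return results
```

      For instance, `simulate_agents(1000, (0.02, 0.1), (2.0, 8.0))` returns `(phi, lambda, initial_trials, reversal_trials)` tuples whose distribution could be set against observed trial counts; the ranges shown are illustrative only.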

      Reviewer #2 (Public Review):

      Summary:

      The study is titled "Leading an urban invasion: risk-sensitive learning is a winning strategy", and consists of three different parts. First, the authors analyse data on initial and reversal learning in Grackles confronted with a foraging task, derived from three populations labeled as "core", "middle" and "edge" in relation to the invasion front. The suggested difference between study populations does not surface, but the authors do find moderate support for a difference between male and female individuals. Secondly, the authors confirm that the proposed mechanism can actually generate patterns such as those observed in the Grackle data. In the third part, the authors present an evolutionary model, in which they show that learning strategies as observed in male Grackles do evolve in what they regard as conditions present in urban environments.

      Strengths:

      The manuscript's strength is that it combines real learning data collected across different populations of the Great-tailed grackle (Quiscalus mexicanus) with theoretical approaches to better understand the processes with which grackles learn and how such learning processes might be advantageous during range expansion. Furthermore, the authors also take sex into account revealing that males, the dispersing sex, show moderately better reversal learning through higher reward-payoff sensitivity. I also find it refreshing to see that the authors took the time to preregister their study to improve transparency, especially regarding data analysis.

      Thank you—we are pleased to receive this positive evaluation, particularly concerning our efforts to improve scientific transparency via our study’s preregistration (https://osf.io/v3wxb).

      Weaknesses:

      One major weakness of this manuscript is the fact that the authors are working with quite low sample sizes when we look at the different populations of edge (11 males & 8 females), middle (4 males & 4 females), and core (17 males & 5 females) expansion range. I think that when all populations are pooled together, the sample size is sufficient to answer the questions regarding sex differences in learning performance and which learning processes might be used by grackles, but it is insufficient when the different populations are taken into account.

      In Bayesian statistics, there is no strict lower limit of required sample size, as the inferences do not rely on asymptotic assumptions. With inferences remaining valid in principle, low sample size will of course be reflected in rather uncertain posterior estimates. We note all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a similarly modest N to our actual final N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F), which at that time we anticipated would be our final N. This a priori analysis shows our reinforcement learning model: (i) detects sex differences in phi values >= 0.03 and lambda values >= 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1, i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Together, these two points highlight how our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is not wrong to wonder whether a bigger N might change our population-level results (it might; so might much-needed population replicates, see L310), but our Bayesian models still allow us to learn a lot from our current data. We now explain this in our revised manuscript; please see L452–457.
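      To illustrate why partial pooling helps with small groups, a minimal sketch of the textbook normal-normal case is given below. It assumes a known within-group SD `sigma` and between-group SD `tau` (in the authors' multilevel models these are estimated, and pooling is over individuals rather than the simplified groups shown). Each group's estimate is pulled toward the grand mean, more strongly the fewer observations it has:

```python
def partially_pooled_means(group_means, group_sizes, sigma, tau, grand_mean):
    """Normal-normal shrinkage with known sigma (within-group SD) and tau (between-group SD).

    Returns precision-weighted compromises between each group's own mean
    and the grand mean; small groups are shrunk more.
    """
    pooled = []
    for ybar, n in zip(group_means, group_sizes):
        precision_data = n / sigma ** 2    # information from the group's own data
        precision_prior = 1 / tau ** 2     # information from the population distribution
        w = precision_data / (precision_data + precision_prior)
        pooled.append(w * ybar + (1 - w) * grand_mean)
    return pooled
```

      With illustrative values `partially_pooled_means([1.0, 1.0], [1, 100], sigma=1.0, tau=1.0, grand_mean=0.0)`, the single-observation group is shrunk halfway to 0.5, while the 100-observation group stays near 0.99; this regularisation is what stabilises estimates when N is modest.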

      Another weakness of this manuscript is that it does not set up the background well in the introduction. Firstly, are grackles urban dwellers in their natural range and expand by colonising urban habitats because they are adapted to it? The introduction also fails to mention why urban habitats are special and why we expect them to be more challenging for animals to inhabit. If we consider that one of their main questions is related to how learning processes might help individuals deal with a challenging urban habitat, then this should be properly introduced.

      In L74–75 (previously L53–56) we introduce that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth (e.g., moving into more arid, agricultural environments) are the estimated driver of their rapid North American colonisation. We hope this information sufficiently answers the Reviewer’s question. We have worked towards fleshing out how urban-imposed challenges faced by grackles, such as the wildlife management efforts introduced in L64–65 (now L85–86), may apply to animals inhabiting urban environments more broadly; for example, we now include an entire paragraph in our Introduction detailing how urban environments may be characterised differently to non-urban environments, and thus why they are perhaps more challenging for animals to inhabit (please see L56–71).

      Also, the authors provide a single example of how learning can differ between populations from more urban and more natural habitats. The authors also label the urban dwellers as the invaders, which might be the case for grackles but is not necessarily true for other species, such as the Indian rock agama in the example which are native to the area of study. Also, the authors need to be aware that only male lizards were tested in this study. I suggest being a bit more clear about what has been found across different studies looking at: (1) differences across individuals from invasive and native populations of invasive species and (2) differences across individuals from natural and urban populations.

      We apologise for not including more examples of such learning differences. We now include three examples (please see L43–49), and we are careful to call attention to the fact that these data cover both resident urban and non-urban species as well as urban invasive species (please see L49–50). We also revised our labelling of the lizard species (please see L44). We are aware that only male lizards were tested, but this information is not relevant to substantiating our use of this study, which is to highlight that learning can differ between urban-dwelling and non-urban counterparts. We hope the changes we made to our manuscript satisfy the Reviewer’s general suggestion to add biological clarity.

      Finally, the introduction is very much written with regard to the interaction between learning and dispersal, i.e. the 'invasion front' theme. The authors lay out four predictions, the most important of which is No. 4: "Such sex-mediated differences in learning to be more pronounced in grackles living at the edge, rather than the intermediate and/or core region of their range." The authors, however, never return to this prediction, at least not in a transparent way that clearly pronounces this pattern not being found. The model looking at the evolution of risk-sensitive learning in urban environments is based on the assumption that urban and natural environments "differ along two key ecological axes: environmental stability 𝑢 (How often does optimal behaviour change?) and environmental stochasticity 𝑠 (How often does optimal behaviour fail to pay off?). Urban environments are generally characterised as both stable (lower 𝑢) and stochastic (higher 𝑠)". Even though it is generally assumed that urban environments differ from natural environments the authors' assumption is just one way of looking at the differences which have generally not been confirmed and are highly debated. Additionally, it is not clear how this result relates to the rest of the paper: The three populations are distinguished according to their relation to the invasion front, not with respect to a gradient of urbanization, and further do not show a meaningful difference in learning behaviour possibly due to low sample sizes as mentioned above.

      Thank you for highlighting a gap in our reporting clarity. We now take care to transparently report our null result regarding our fourth prediction; more specifically, that we did not detect credible population-level differences in grackles’ learning (please see L130). Regarding our evolutionary model, we agree with the Reviewer that this analysis is only one way of looking at the interaction between learning phenotype and apparent urban environmental characteristics. Indeed, in L282–288 (now L325–329) we state: “Admittedly, our evolutionary model is not a complete representation of urban ecology dynamics. Relevant factors—e.g., spatial dynamics and realistic life histories—are missed out. These omissions are tactical ones. Our evolutionary model solely focuses on the response of reinforcement learning parameters to two core urban-like (or not) environmental statistics, providing a baseline for future study to build on”. But we can see now that ‘core’ is too strong a word, and instead ‘supposed’, ‘purported’ or ‘theorised’ would be more accurate—we have revised our wording throughout our manuscript to say as much (please see, for example, L24; L56; L328). We also further highlight the preliminary nature of our evolutionary model, in terms of allowing a narrow but useful first-look at urban eco-evolutionary dynamics—please see L228–232. Finally, we now detail the theorised characteristics of urban environments in our Introduction (rather than in our Results; please see L56–71), and we hope that by doing so, how our evolutionary results relate to the rest of our paper is now better set up and clear.

      In conclusion, the manuscript was well written and for the most part easy to follow. The format of eLife having the results before the methods makes it a bit harder to follow because the reader is not fully aware of the methods at the time the results are presented. It would, therefore, be important to more clearly delineate the different parts and purposes. Is this article about the interaction between urban invasion, dispersal, and learning? Or about the correct identification of learning mechanisms? Or about how learning mechanisms evolve in urban and natural environments? Maybe this article can harbor all three, but the borders need to be clear. The authors need to be transparent about what has and especially what has not been found, and be careful to not overstate their case.

      Thank you; we are pleased to read that the Reviewer found our manuscript to be generally digestible. We have worked to add further clarity, and to temper our tone (please see our above and below responses).

      Reviewer #1 (Recommendations For The Authors):

      Several of the results are based on CIs that overlap zero. Tone these down somewhat.

      We apologise for overstating our results, which we have worked to tone down in our revision. For instance, in L185–186 we now differentiate between estimates that did or did not overlap zero (please also see our response to Reviewer 2 on this tonal change). We note that we do not report confidence intervals (i.e., intervals constructed so that, if one redid the study/analysis many times, a stated proportion of them would contain the true value). Rather, we report 89% highest posterior density intervals (i.e., the narrowest range of parameter values containing 89% of the posterior probability). We have added this definition in L459, to improve clarity.
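      As an illustration of the interval type, the sketch below computes an 89% HPDI from posterior samples by scanning the sorted draws for the narrowest window that holds 89% of them. It is a simple sample-based sketch, not the implementation used in the manuscript (for example, the `HPDI` function in McElreath's rethinking R package).

```python
import math

def hpdi(samples, prob=0.89):
    """Narrowest interval containing at least `prob` of the posterior draws."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # how many draws the interval must cover
    best = (xs[0], xs[k - 1])
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best
```

      On a strongly skewed sample such as `[0.0] * 90 + [10.0] * 10`, this returns `(0.0, 0.0)`, whereas an equal-tailed 89% interval would stretch up to 10.0; the HPDI reports where the posterior mass is most concentrated.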

      The literature review suggesting that urban environments are more unpredictable is not convincing. Yes, they have more noise and light pollution and more cars and planes, but does this actually relate to the unpredictability of getting a food reward when you choose an option that usually yields rewards?

      To answer the Reviewer’s question—yes. But we can see that by not including empirical examples from the literature, we did a poor job of arguing such links. In L43–49 we now give three empirical examples; more specifically, we state: “[...] experimental data show the more variable are traffic noise and pedestrian presence, the more negative are such human-driven effects on birds' sleep (Grunst et al., 2021), mating (Blickley et al., 2012), and foraging behaviour (Fernández-Juricic, 2000).” We note we now detail such apparently stable but stochastic urban environmental characteristics in our Introduction rather than our Results section, to hopefully improve the clarity of our manuscript (please see L56–71). We further note that we cite three literature reviews—not one—suggesting urban environments are stable in certain characteristics and more unpredictable in others (please see L59–60). Finally, we appreciate such characterisation is not certain, and so in our revision we have qualified all writing about this potential dynamic with words such as “apparent”, “supposed”, “theorised”, “hypothesised” etc.

      It would be interesting to see if other individual traits besides sex affect their learning/reversal learning ability and/or their learning parameters. Do you have data on age, size, condition, or personality? Or, the habitat where they were captured?

      We do not have these data. But we agree with the Reviewer that examining the potential influence of such covariates on grackles’ reinforcement learning would be interesting in future study, especially habitat characteristics (please see L306–309).

      For most levels of environmental noise, there appears to be an intermediate maximum for the relationship between environmental stability and the risk sensitivity parameter. What does this mean?

      There is indeed an intermediate maximum for certain values of environmental stochasticity (although the differences are rather small). The most plausible reason for this is that in very stable environments, simulated birds essentially always “know” the rewarded solution and never need to “relearn” behaviour. In this case, differences in latent values will tend to be large (because birds are consistently rewarded for the same option), and different lambda values (in the upper range) will produce the same choice behaviour, which results in very weak selection. In very unstable environments, by contrast, optimal choice behaviour should be more exploratory, allowing learners to track frequently changing environments. We now note this pattern in L240–248.
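      To make the two environmental axes concrete, a minimal sketch of a two-option environment is given below. It assumes that on each step the optimal option switches with probability u (instability), that choosing the optimal option fails to pay off with probability s (stochasticity), and that the other option never pays off; these payoff rules are assumptions, not the authors' exact evolutionary model. The learner uses the attraction update and softmax choice rule described for equations 1 and 2, and its mean payoff is the kind of quantity on which selection for lambda would act.

```python
import math
import random

def mean_payoff(phi, lam, u, s, n_steps=2000, seed=0):
    """Average payoff of one learner in a hypothetical two-option environment.

    u: per-step probability that the optimal option switches.
    s: probability that choosing the optimal option still yields no payoff.
    The non-optimal option is assumed never to pay off.
    """
    rng = random.Random(seed)
    a = [0.0, 0.0]   # latent attractions
    optimal = 0      # index of the currently optimal option
    total = 0.0
    for _ in range(n_steps):
        if rng.random() < u:
            optimal = 1 - optimal  # environment changes
        # Two-option softmax choice (equation 2).
        p0 = 1.0 / (1.0 + math.exp(-lam * (a[0] - a[1])))
        choice = 0 if rng.random() < p0 else 1
        payoff = 1.0 if (choice == optimal and rng.random() >= s) else 0.0
        # Attraction update for the chosen option (assumed form of equation 1).
        a[choice] = (1 - phi) * a[choice] + phi * payoff
        total += payoff
    return total / n_steps
```

      Evaluating, say, `mean_payoff(0.3, lam, u=0.01, s=0.3)` for several `lam` values shows how payoff depends on lambda for one combination of u and s; an evolutionary algorithm would repeat such comparisons across the u-s grid.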

      Reviewer #2 (Recommendations For The Authors):

      L2: I'd encourage the authors to reconsider the term "risk-sensitive learning", at least in the title. It's not apparent to me how 'risk' relates to the investigated foraging behaviour. Elsewhere, risk-reward sensitivity is used which may be a better term.

      We apologise for not clearly introducing the field of risk-sensitive foraging, which focuses on how animals evaluate and choose between distinct food options, and how such foraging decisions are influenced by pay-off variance, i.e., the risk associated with alternative foraging options (seminal reviews: Bateson 2002 [DOI: 10.1079/PNS2002181]; Kacelnik & Bateson 1996 [DOI: 10.1093/ICB/36.4.402]). We have added this information to our manuscript in L494–497. In explaining our reinforcement learning model, we also now detail how risk relates to foraging behaviour. Specifically, in L153–161 we now state: “Both learning parameters capture individual-level internal response to incurred reward-payoffs, but they differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (full mathematical details in Materials and methods). For 𝜙, stronger reward sensitivity (bigger values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For 𝜆, stronger reward sensitivity (bigger values) means stronger internal determinism about seeking the nonrisk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching i.e., ‘playing it safe’.” We hope this information clarifies why lambda measures risk sensitivity, and why we continue to use this term.
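      To make the different roles of 𝜙 and 𝜆 concrete, the sketch below assumes the standard reinforcement learning form that these descriptions suggest: a Rescorla-Wagner-style update of latent values with rate 𝜙, and a softmax choice rule with inverse temperature 𝜆. This is an illustrative assumption, not a copy of the authors' model (their RL_Comp_Full.stan file is the authoritative specification); the function names, the two-option task, and the 1/0 reward scheme are hypothetical.

```python
import math
import random


def softmax_probs(values, lam):
    """Softmax choice probabilities; a larger lam makes choice more deterministic."""
    top = max(values)
    # Subtracting the max leaves probabilities unchanged but avoids overflow in exp().
    weights = [math.exp(lam * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def rw_update(values, choice, reward, phi):
    """Rescorla-Wagner-style update: move the chosen option's value toward the reward by phi."""
    updated = list(values)
    updated[choice] += phi * (reward - updated[choice])
    return updated


def simulate_switches(phi, lam, n_trials=100, rewarded=0, seed=1):
    """Count choice-option switches of one simulated learner in a two-option task.

    Hypothetical task: only option `rewarded` pays 1; the other pays 0.
    """
    rng = random.Random(seed)
    values = [0.0, 0.0]  # latent values start equal, so the first choice is a coin flip
    previous = None
    switches = 0
    for _ in range(n_trials):
        probs = softmax_probs(values, lam)
        choice = 0 if rng.random() < probs[0] else 1
        if previous is not None and choice != previous:
            switches += 1
        reward = 1.0 if choice == rewarded else 0.0
        values = rw_update(values, choice, reward, phi)
        previous = choice
    return switches


if __name__ == "__main__":
    # Same phi, two lambda values: the higher lambda should switch less on average.
    for lam in (1.0, 8.0):
        mean_switches = sum(simulate_switches(0.3, lam, seed=s) for s in range(50)) / 50
        print(f"lambda={lam}: mean switches = {mean_switches:.1f}")
```

      In this sketch, raising 𝜙 speeds up how quickly the rewarded option's value rises, while raising 𝜆 makes the learner exploit whichever option currently has the higher value, which shows up as fewer choice-option switches ("playing it safe").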

      L1-3: The title is a bit misleading with regard to the empirical data. From the data, all that can be said is that male grackles relearn faster than females. Any difference between populations actually runs the other way, with the core population exhibiting a larger difference between males and females than the mid and edge populations.

      It is customary for a manuscript title to describe the full scope of the study. In our study, we have empirical data, cognitive modelling, and evolutionary simulations of the background theory all together. And together these analytical approaches show: (1) across three populations, male grackles—the dispersing sex in this historically urban-dwelling and currently urban-invading species—outperform female counterparts in reversal learning; (2) they do this via risk-sensitive learning, so they’re more sensitive to relative differences in reward payoffs and choose to stick with the ‘safe’ i.e., rewarding option, rather than continuing to ‘gamble’ on an alternative option; and (3) risk-sensitive learning should be favoured in statistical environments characterised by purported urban dynamics. So, we do not feel our title “Leading an urban invasion: risk-sensitive learning is a winning strategy” is misleading with regard to our empirical data; it just doesn’t summarise only our empirical data. Finally, as we now state in L312–313, we caution against speculating about any between-population variation, as we did not infer any meaningful behavioural or mechanistic population-level differences.

      L13: "Assayed", is that correctly put, given that the authors did not collect the data?

      Merriam-Webster defines assay as “to analyse” or “examination or determination as to characteristics”, so, to answer the Reviewer’s question: yes, we feel this is correctly put. We note we explicitly introduce in L102–103 that we did not collect the data, and we have an explicit “Data provenance” section in our methods (please see L342–347).

      L42-46: The authors provide a single example of how learning can differ between populations from more urban and more natural habitats. I would like to point out that many of these studies do not directly confirm that the ability in question has indeed led to the success of the species tested (e.g. show fitness consequences). Then the authors could combine these insights to form a solid prediction for the grackles. As of now, this looks like cherry-picking supportive literature without considering negative results.

      Here are some references that might be helpful in identifying relevant literature to cite:

      Szabo, B., Damas-Moreira, I., & Whiting, M. J. (2020). Can cognitive ability give invasive species the means to succeed? A review of the evidence. Frontiers in Ecology and Evolution, 8, 187.

      Griffin, A. S., Tebbich, S., & Bugnyar, T. (2017). Animal cognition in a human-dominated world. Animal Cognition, 20(1), 1-6.

      Kark, S., Iwaniuk, A., Schalimtzek, A., & Banker, E. (2007). Living in the city: Can anyone become an "urban exploiter"? Journal of Biogeography, 34(4), 638-651.

      We apologise for not including more examples of such learning differences. We now include three examples (please see L43–49). We are aware that direct evidence of fitness consequences is entirely lacking in the scientific literature on cognition and successful urban invasion, which is why such data are not present in our paper. But we now explicitly point out a role for likely fitness-affecting anthropogenic disturbances of the sleep, mating, and foraging behaviour of animals inhabiting urban environments (please see L63–68). We hope these additional empirical examples bolster our predictions for our grackles. Finally, the Reviewer paints what is, in our view, an inaccurate picture of our use of available literature. Nevertheless, to address their comment, we now highlight a recent meta-analysis advocating for further research to confirm apparent ‘positive’ trends between animal ‘smarts’ and successful ‘city living’ (please see L43).

      L64: Is their niche historically urban, or have they recently moved into urban areas?

      In L74–75 (previously L53–56) we state that the estimated historical niche of grackles is urban environments, and that shifts in habitat breadth (e.g., moving into more arid, agricultural environments) are the estimated driver of their rapid North American colonisation. We hope this information sufficiently answers the Reviewer’s question.

      L66-67: This is an important point that is however altogether missing from the discussion.

      We thank the Reviewer for highlighting a gap in our discussion regarding population-level differences in grackles’ reinforcement learning. In L310–312 we now state: “The lack of spatial replicates in the existing data set used herein inherently poses limitations on inference. Nevertheless, the currently available data do not show meaningful population-level behavioural or mechanistic differences in grackles’ reinforcement learning, and we should thus be cautious about speculating on between-population variation”.

      L68-71: The paper focuses on cognitive ability. The whole paragraph sets up the prediction of why male grackles should be better learners due to their dispersal behaviour. This example, however, focuses on aggression, not cognition. Here is a study showing differences in learning in male and female mynas that might be better suited:

      Federspiel, I. G., Garland, A., Guez, D., Bugnyar, T., Healy, S. D., Güntürkün, O., & Griffin, A. S. (2017). Adjusting foraging strategies: a comparison of rural and urban common mynas (Acridotheres tristis). Animal Cognition, 20(1), 65-74.

      We thank the Reviewer for suggesting this paper. We feel it is better suited to substantiating our point in the Discussion about reversal learning not being indicative of cognitive ability—please see L276–277.

      L73: Generally, I suggest not writing "for the first time" as this is not a valid argument for why a study should be conducted. Furthermore, except for replication studies, most studies investigate questions that are novel and have not been investigated before.

      The Reviewer makes a fair point—we have removed this statement.

      L80-81: Here again, this is left undiscussed later on.

      By ‘this’ we assume the Reviewer is referring to our hypothesis, which is that sex differences in dispersal are related to sex differences in learning in an urban invader, grackles. At the beginning of our Discussion, we state how we found support for this hypothesis (please see L250–261); and in our ‘Ideas and speculation’ section, we discuss how these hypothesis-supporting data fit into the literature more broadly (please see L294–331). We therefore feel this is sufficiently discussed.

      L77-81: This sentence is very long and therefore hard to read. I suggest trying to split it into at least 2 separate sentences which would improve readability.

      Per the Reviewer’s useful suggestion, we have split this sentence into two separate sentences—please see L97–115.

      L83: Please explain choice-option switches. I am not aware of what that is and it should be explained at first mention.

      We apologise for this operational oversight. We now include a working definition of speed and choice-option switches at first mention. Specifically, in L107–108 we state: “[...] we expect male and female grackles to differ across at least two reinforcement learning behaviours: speed (trials to criterion) and choice-option switches (times alternating between available stimuli)”.

      L83-87: Again, a very long sentence. Please split.

      We thank the Reviewer for their suggestion. In this case we feel it is important to not change our sentence structure because we want our prediction statements to match between our manuscript and our preregistration.

      L96-97: Important to not overstate this. It merely demonstrates the potential of the proposed (not detected) mechanism to generate the observed data.

      As in any empirical analysis, our drawn conclusions depend on causal assumptions about the mechanisms generating behaviour (Pearl, J. (2009). Causality). Therefore, we “detected” specific learning mechanisms assuming a certain generative model, namely reinforcement learning. As there is overwhelming evidence for the widespread importance of value-based decision making and Rescorla-Wagner updating rules across numerous different animals (Sutton & Barto (2018) Reinforcement Learning), we would argue that this assumed model is highly plausible in our case. Still, we changed the text to “inferred” instead of “detected” learning mechanisms to account for this concern—please see L123–124.

      L99: "urban-like settings" again a bit confusing. The authors talk about invasion fronts, but now also about an urbanisation gradient. Is the main difference between the size and the date of establishment, or is there additionally a gradient in urbanisation to be considered?

      We now include a paragraph in our Introduction detailing apparent urban environmental characteristics (please see L56–71), and we now refer to this dynamic specifically when we define urban-like settings (please see L126–127). To answer the Reviewer’s question: we consider both differences. Specifically, we consider the time since population establishment (with respect to our behavioural and mechanistic modelling), as well as how statistical environments that vary in their similarity to apparently characteristic urban-like environments might favour particular learning phenotypes (with respect to our evolutionary modelling). We hope the edits to our Introduction as a whole now make both aims clear.

      L11-112: Above, the authors talk about a comparable number of switches (10.5/15=0.7), and here about fewer switches (25/35=0.71), even though the magnitude of the difference is almost identical and actually runs the other way. The authors are probably misled by their conservative priors, which make the difference appear greater in the second case than in the first. Using flat priors would avoid this particular issue.

      Mathematically, the number of trials-to-finish and the number of choice-option switches are both Poisson-distributed outcomes with rate λ (we note that lambda here is not our risk-sensitivity parameter, just standard notation). As such, our Poisson models infer the rate of these outcomes by sex and phase, not the ratio of these outcomes by sex and phase. So comparing the magnitude of divided medians of choice-option switches between the sexes by phase, as the Reviewer does above, is not a meaningful metric with respect to the distribution of our data. For perspective, 1 vs. 2 switches provides much less information about the difference in rates of a Poisson distribution than 50 vs. 100 (for the former, no difference would be inferred; for the latter, it would), but both exhibit a 1:2 ratio. To hopefully prevent any such further confusion, and to focus on the fact that our Poisson models estimate the expected value, i.e., the mean, we now report and graph (please see Fig. 2) mean rather than median trials-to-finish and total-switch-counts. Finally, we can see that our use of the word “conservative” to describe our weakly informative priors is confusing, because conservative could mean either strong priors with respect to expected effect size (not our parameterisation) or weak priors with respect to such assumptions (our parameterisation). To address this lack of clarity, we now state that we use “weakly informative priors” in L457–458.
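      The point that 1 vs. 2 and 50 vs. 100 share a 1:2 ratio but carry very different information can be checked with a minimal Bayesian sketch. It assumes equal exposure for both counts and independent flat priors on each Poisson rate, so each posterior is Gamma(count + 1, 1); this is a simplification for illustration, not the authors' multilevel model with weakly informative priors, and the function name is hypothetical.

```python
from math import comb


def prob_rate_b_exceeds_a(count_a: int, count_b: int) -> float:
    """Posterior P(rate_b > rate_a) for two Poisson counts observed over equal exposure.

    Assumes independent flat priors on each rate, so each posterior is
    Gamma(count + 1, 1). For independent Gamma(a, 1) and Gamma(b, 1) variables,
    rate_a / (rate_a + rate_b) ~ Beta(a, b), hence
    P(rate_b > rate_a) = P(Beta(a, b) < 0.5) = P(Binomial(a + b - 1, 0.5) >= a)
    for integer a and b.
    """
    a, b = count_a + 1, count_b + 1
    n = a + b - 1
    # Exact integer arithmetic; dividing by 2**n gives the binomial tail probability.
    return sum(comb(n, k) for k in range(a, n + 1)) / 2 ** n


if __name__ == "__main__":
    print(prob_rate_b_exceeds_a(1, 2))     # 0.6875: little evidence the rates differ
    print(prob_rate_b_exceeds_a(50, 100))  # very close to 1: strong evidence they differ
```

      Under these assumptions, 1 vs. 2 switches gives a posterior probability of 0.6875 that the second rate is higher, not far from a coin flip, whereas 50 vs. 100 gives a probability very close to 1, even though both pairs share the same ratio.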

      L126: It is not clear what risk sensitivity means in the context of these experiments.

      Thank you for pointing out our lack of clarity. In L153–161 we now state: “Both learning parameters capture individual-level internal response to incurred reward-payoffs, but they differentially direct any reward sensitivity back on choice-behaviour due to their distinct definitions (full mathematical details in Materials and methods). For 𝜙, stronger reward sensitivity (bigger values) means faster internal updating about stimulus-reward pairings, which translates behaviourally into faster learning about ‘what to choose’. For 𝜆, stronger reward sensitivity (bigger values) means stronger internal determinism about seeking the nonrisk foraging option (i.e., the one with the higher expected payoffs based on prior experience), which translates behaviourally into less choice-option switching i.e., ‘playing it safe’.” We hope this information clarifies what risk sensitivity means and measures, with respect to our behavioural experiments.

      L128-129: I find this statement too strong. A plethora of other mechanisms could produce similar patterns, and you cannot exclude these by way of your method. All you can show is whether the mechanism is capable of producing outcomes broadly similar to those observed.

      In describing the inferential value of our reinforcement learning model, we now qualify that the insight provided is of course conditional on the model, which is tonally accurate. Please see L161.

      L144: As I have already mentioned above, here is the first time we hear about unpredictability related to urban environments. I suggest clearly explaining in the introduction how urban and natural environments are assumed to be different which leads to animals needing different cognitive abilities to survive in them which should explain why some species thrive and some species die out in urbanised habitats.

      Thank you for this suggestion. We now include a paragraph in our Introduction detailing as much—please see L56–71.

      L162: "almost entirely above zero" again, this is worded too strongly.

      In reporting our lambda across-population 89% HPDI contrasts in L185–186, we now state: “[...] across-population contrasts that lie mostly above zero in initial learning, and entirely above zero in reversal learning”. Our previous wording stated: “[...] across-population contrasts that lie almost entirely above zero”. The Reviewer was correct to point out that this previous wording was too strong if we considered the contrasts together, as, indeed, we find the range of the contrast in initial learning does minimally overlap zero (L: -0.77; U: 5.61), while the range of the contrast in reversal learning does not (L: 0.14; U: 4.26). This rephrasing is thus tonally accurate.
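      For readers less familiar with HPDIs, the sketch below shows one common way to compute a highest posterior density interval from posterior samples of a contrast: sort the samples and keep the narrowest window that contains the requested share of them. The function names are illustrative assumptions rather than the authors' code, and established implementations (e.g. `HPDI` in McElreath's rethinking R package) may differ in edge handling.

```python
import math


def hpdi(samples, prob=0.89):
    """Narrowest interval containing at least `prob` of the posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of sorted samples every candidate window must contain.
    width = min(n, max(1, math.ceil(prob * n)))
    best_lo, best_hi = ordered[0], ordered[width - 1]
    # Slide the window along the sorted samples and keep the narrowest one.
    for start in range(1, n - width + 1):
        lo, hi = ordered[start], ordered[start + width - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


def share_above_zero(samples):
    """Fraction of contrast samples that are strictly positive."""
    return sum(1 for s in samples if s > 0) / len(samples)
```

      An interval whose lower bound is above zero corresponds to the reversal-learning case described here; an interval whose lower bound dips just below zero, as in initial learning, can still have most of its mass above zero, which `share_above_zero` makes explicit.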

      L178-179: I think it should be said instead that the model accounts well for the observed data.

      We have rephrased in line with the Reviewer’s suggestion, now stating in L217–218 that “Such quantitative replication confirms our reinforcement learning model results sufficiently explain our behavioural sex-difference data.”

      L188-190: I am not convinced this is a general pattern. It is quite a bold claim that I don't find to be supported by the citations. Why should biotic and abiotic factors differ in how they affect behavioural outcomes? Also, events in urban environments such as weekend/weekday could lead to highly regular optimal behaviour changes.

      Please see our response to Reviewer 1 on this point. We note we now touch on such regular events in L94–96.

      L209-211: The first sentence is misleading. The authors have found that males and females differ in 'risk sensitivity', that their learning model can fit the data rather well, and that under certain, not necessarily realistic assumptions, the male learning type is favoured by natural selection in urban environments. A difference between core, middle, and edge habitats however is barely found, and in fact seems to run the other way than expected.

      In our study, we found: (1) across three populations, male grackles, the dispersing sex in this historically urban-dwelling and currently urban-invading species, outperform female counterparts in reversal learning; (2) they do this via risk-sensitive learning, so they are more sensitive to relative differences in reward payoffs and choose to stick with the ‘safe’, i.e., rewarding, option, rather than continuing to ‘gamble’ on an alternative option; (3) we are sufficiently certain that risk-sensitive learning generates our sex-difference data, as our agent-based forward simulations replicate our behavioural results (not because our model ‘fits’ the data, but because we inferred meaningful mechanistic differences; see our response to Reviewer 1 on this point); and (4) under theorised dynamics of urban environments, natural selection should favour risk-sensitive learning. We therefore do not feel it is misleading to say that we mapped a full pathway from behaviour to mechanisms through to selection and adaptation. Again, as we now state in L311–313, we caution against speculating about any between-population variation, as we did not infer any meaningful behavioural or mechanistic population-level differences. And we note the Reviewer is wrong to assume that an interaction between learning, dispersal, and sex requires population-level differences on the outcome scale; please see our discussion of phenotypic plasticity and inherent species trait(s) in L313–324.

      L216: "indeed explain" again worded too strongly.

      We have tempered our wording. Specifically, we now state in L218: “sufficiently explain”. This wording is tonally accurate with respect to the inferential value of agent-based forward simulations—please see L192–207 on this point.

      L234: "reward-payoff sensitivity" might be a better term than risk-sensitivity?

      Please see our earlier response to this suggestion. We note we have changed this text to state “risk-sensitive learning” rather than “reward-payoff sensitivity”, to hopefully prevent the reader from concluding only our lambda term is sensitive to rewards—a point we now include in L153–154.

      L234-237: I think these points may be valuable, but come too much out of the blue. Many readers will not have a detailed knowledge of the experimental assays. It therefore also does not become clear how they measure the wrong thing, what this study does to demonstrate this, or whether a better alternative is presented herein. It almost seems like this should be a separate paper by itself.

      We apologise for this lack of context. We now explicitly state in L275 that we are discussing reversal learning assays, to give all readers this knowledge. In doing so, we hope the logic of our argument is now clear: reversal learning assays do not measure behavioural flexibility, whatever that even is. The Reviewer’s suggestion of a separate paper focused on what reversal learning assays actually measure, in terms of mechanism(s), is an interesting one, and we would welcome this discussion. But any such paper should build on the points we make here.

      L270-288: Somewhere here the authors have to explain how they have not found differences between populations, or that in so far as they found them, they run against the originally stated hypothesis.

      We thank the Reviewer for these suggestions. In L310–313 we now state: “The lack of spatial replicates in the existing data set used herein inherently poses limitations on inference. Nevertheless, the currently available data do not show meaningful population-level behavioural or mechanistic differences in grackles’ reinforcement learning, and we should thus be cautious about speculating on between-population variation”.

      L284: should be "missing" not "missed out"

      We have made this change.

      L290-291: It is unclear what "robust interactive links" were found. A pattern of sex-biased learning was found, which can potentially be attributed to evolutionary pressures in urban environments. An interaction, e.g. between learning, dispersal, and sex, can only be tentatively suggested (no differences between populations). Also, "fully replicable" is a bit misleading. The analysis may be replicable, but the more relevant question of whether the findings are replicable we cannot presently answer.

      We apologise for our lack of clarity. By “robust” we mean “across population”, which we now state in L333. We again note the Reviewer is wrong to assume that an interaction between learning, dispersal, and sex requires population-level differences on the outcome scale; please see our discussion of phenotypic plasticity and inherent species trait(s) in L313–324. Finally, the Reviewer makes a good point about our analyses, but not our findings, being replicable. In L334 we now make this distinction by stating “analytically replicable”.

      L306-315: I think you have a bit of a sample size issue, not so much when populations are pooled but when they are separated. This might also explain why you do not really find differences across the populations in your analysis. When we look at the results presented in Figure 2 (and Table D), we can see a trend towards males having higher risk sensitivity in core (HPDI above 0) and middle populations (HPDI barely crossing 0), but the difference is very small. In particular, the results for females are based on the performance of only 8 and 4 females, respectively. I suggest making this clear in the manuscript.

      In Bayesian statistics, there is no strict lower limit of required sample size, as the inferences do not rely on asymptotic assumptions. With inferences remaining valid in principle, a low sample size will of course be reflected in rather uncertain posterior estimates. We note all of our multilevel models use partial pooling on individuals (the random-effects structure), which is a regularisation technique that generally reduces the inference constraint imposed by a low sample size (see Ch. 13 in Statistical Rethinking by Richard McElreath [PDF: https://bit.ly/3RXCy8c]). We further note that, in our study preregistration (https://osf.io/v3wxb), we formally tested our reinforcement learning model for different effect sizes of sex on learning for both target parameters (phi and lambda) across populations, using a similarly modest N (edge: 10 M, 5 F; middle: 22 M, 5 F; core: 3 M, 4 F), which at that time we anticipated would be our final N and which is similar to our actual final N. This a priori analysis shows our reinforcement learning model: (i) detects sex differences in phi values >= 0.03 and lambda values >= 1; and (ii) infers a null effect for phi values < 0.03 and lambda values < 1, i.e., very weak simulated sex differences (see Figure 4 in https://osf.io/v3wxb). Together, these points show that our reinforcement learning model allows us to say that across-population null results are not just due to small sample size. Nevertheless, the Reviewer is not wrong to wonder whether a bigger N might change our population-level results; it might, as might much-needed population replicates (see L310). But our Bayesian models still allow us to learn a lot from our current data, and, at present, we infer no meaningful population-level behavioural or mechanistic differences in grackles’ behaviour. To make clear the inferential sufficiency of our analytical approach, we now include some of the above points in our Statistical analyses section in L452–457. Finally, we caution against speculating on any between-population variation, as we now highlight in L311–313 of our Discussion.
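      The regularising effect of partial pooling can be illustrated with the simplest hierarchical case: a normal model with known observation SD `sigma` and known between-individual SD `tau`, in which each individual's estimate is a precision-weighted compromise between its own mean and the shared mean `mu`. This is a textbook simplification (in the authors' multilevel models these quantities are estimated jointly rather than fixed), and the function name and numbers are hypothetical.

```python
def partial_pool(group_mean, n, sigma, mu, tau):
    """Posterior mean for one individual (or group) under a normal-normal model.

    group_mean: that individual's sample mean from n observations
    sigma: known observation-level standard deviation
    mu, tau: known population mean and between-individual standard deviation
    """
    data_precision = n / sigma ** 2   # information carried by the individual's own data
    prior_precision = 1 / tau ** 2    # information carried by the population distribution
    weight = data_precision / (data_precision + prior_precision)
    # Fewer observations -> smaller weight -> stronger shrinkage toward mu.
    return weight * group_mean + (1 - weight) * mu
```

      With mu = 0, sigma = tau = 1 and an observed mean of 2, an individual with 4 observations is pulled to 1.6, while one with 22 observations stays near 1.91: sparse data are regularised more strongly toward the shared mean, and the estimates remain well defined rather than breaking down.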

      Figure 2: I think the authors should rethink their usage of colour in this graph. It is not colour-blind friendly or well-readable when printed in black and white.

      We used the yellow (hex code: #fde725) and green (hex code: #5ec962) colours from the viridis package. As outlined in the viridis package vignette (https://cran.r-project.org/web/packages/viridis/index.html), this colour package is “designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency. The color maps are also perceptually-uniform, both in regular form and also when converted to black-and-white for printing”.

      Figure 3B: Could the authors turn around the x-axis and the colour code? It would be easier to read this way.

      We appreciate that aesthetic preferences may vary. In this case, we prefer to have the numbers on the x-axis run the standard way i.e., from small to large. We note we did remove the word ‘Key’ from this Figure, in line with the Reviewer’s point about these characteristics not being totally certain.

      I also had a look at the preregistration. I do think that there are parts in the preregistration that would be worth adding to the manuscript:

      L36-40: This is much easier to read here than in the manuscript.

      We changed this text generally in the Introduction in our revision, so we hope the Reviewer will again find this easier to read.

      L49-56: This is important information that I would also like to see in the manuscript.

      We no longer have confidence in these findings, as our cleaning of only one part of these data revealed considerable experimenter oversight (see ‘Learning criterion’).

      L176: Why did you remove the random effect study site from the model? It is not part of the model in the manuscript anymore.

      The population variable is part of the RL_Comp_Full.stan model that we used in our manuscript to assess population differences in grackles’ reinforcement learning, the estimates from which we report in Tables C and D (please note we never coded this variable as “study site”). But rather than specifying it as a random effect, in our RL_Comp_Full.stan model we index phi and lambda by population as a predictor variable, to explicitly model population-level effects. Please see our code:

      https://github.com/alexisbreen/Sex-differences-in-grackles-learning/blob/main/Models/Reinforcement%20learning/RL_Comp_Full.stan

      L190-228: I am wondering if the model validation should also be part of the manuscript as well, rather than just being in the preregistration?

      We are not sure how the files were presented to the Reviewer for review, but our study preregistration, which includes our model validation, should be part of our manuscript as a supplementary file.

    1. How do we educate our children so they have a sense of cultural identity, and so that we can pass on the cultural genes of our communities, while being part of the process of globalization? How do you square that circle? The problem is they're trying to meet the future by doing what they did in the past.

      I think it is important to know your culture. In my opinion it's important because it's where I came from and where my family comes from. It also may provide reasoning as to why some things are the way they are in your life.

    1. Let’s face it, very few people read the “terms and conditions,” or the “terms of use” agreements prior to installing an application (app). These agreements are legally binding, and clicking “I agree” may permit apps (the companies that own them) to access your: calendar, camera, contacts, location, microphone, phone, or storage, as well as details and information about your friends.

      This is so important; it's not something I ever considered or worried about when thinking of privacy and security. I never read the "terms and conditions" when signing up for new apps or websites. I didn't think about how I could be agreeing to things that I would never agree to if someone asked me directly. This could be harmful not only to me but also to family and friends, because their information could be embedded in what I allowed access to. This made me think about how I can say or see something and an ad for that specific thing pops up on my social media or Google account right after. I'm agreeing to let social media accounts listen to and look at everything on my device and share it with other people. This is scary to think about, especially for young children. Anyone could hack into these accounts and get information about every aspect of a child's life. As educators, we need to be careful to read the terms and conditions, and we need to teach our students to be cautious of them too, for their safety.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1.1) This work introduces a new method of imaging the reaction forces generated by small crawling organisms and applies this method to understanding locomotion of Drosophila larva, an important model organism. The force and displacement data generated by this method are a qualitative improvement on what was previously available for studying the larva, improving simultaneously the spatial, temporal, and force resolution, in many cases by an order of magnitude. The resulting images and movies are quite impressive.

      We thank the reviewer for their recognition of the achievements our work presents and for their feedback with regard to what they consider our most important findings and the points raised in their review. We will address these points individually below.

      (1.2) As it shows the novel application of recent technological innovations, the work would benefit from more detail in the explanation of the new technologies, of the rationales underlying the choice of technology and certain idiosyncratic experimental details, and of the limitations of the various techniques. In the methods, the authors need to be sure to provide sufficient detail that the work can be understood and replicated. The description of the results and the theory of motion developed here focus only on forces generated when the larva pushes against the substrate and ignores the equally strong adhesive forces pulling the larva onto the substrate.

      As the reviewer correctly points out, our present work adapts a recently developed set of methods (namely, ERISM and WARP) for use with small soft-bodied animals. The foundational methods have been described in detail in previous publications (refs. 23 and 26). However, upon reflection, we agree that more information can be provided to ensure our work is more accessible and reproducible. We also agree that some additional clarifying information on our approach could be helpful. We have addressed this in the following ways:

      (1) We have included a detailed Key Resources table in the methods section to allow for maximum transparency on equipment and reagent sourcing. This can now be found on Pages 16-19.

      (2) We have modified the ‘Freely behaving animals force imaging’ section of the Materials and Methods to include more detailed information on practical aspects of conducting experiments. These changes can be found on pages 23-24 (lines 566-567 and 571-577).

      (3) We have re-ordered the Materials and Methods section so that microcavity fabrication and characterisation are described before the ERISM and WARP experiments; this change should aid replication. Details regarding the application of a silicone well to the surface of microcavities have also been added (lines 472-474).

      (4) We have added text to the Introduction and Results (pages 3-4 and 7, lines 56-86 and 152-153) explaining our rationale for using ERISM/WARP, and text to the Discussion on the potential role(s) of adhesive forces in larval locomotion (page 12, lines 301-307).

      (1.3) The substrate applies upward, downward, and horizontal forces on the larva, but only upward and downward forces are measured, and only upward forces are considered in the discussions of "Ground Reactive Forces." An apparent weakness of the WARP technique for the study of locomotion is that it only measures forces perpendicular to the substrate surface ("vertical forces" in Meek et al.), while locomotion requires the generation of forces parallel to the substrate ("horizontal forces"). It should be clarified that only vertical forces are studied and that no direct information is provided about the forces that actually move the larva forward (or about the forces which impede this motion and are also generated by the substrate). Along with this clarification, it would be helpful to include a discussion of other techniques, especially micropillar arrays and traction force microscopy, that directly measure horizontal forces and of why these techniques are inappropriate for the motions studied here.

      We attempted to provide a streamlined Introduction in our initial submission and then compared ERISM/WARP to other methods in our discussion. We are happy to provide a brief overview of substrate force measurement methods in the introduction to help set the stage for readers. The Introduction section of our revised manuscript now contains the following comparison of different mechanobiological imaging techniques on pages 3-4 lines 56-86:

      ‘However, in the field of cellular mechanobiology, many new force-measuring techniques have been developed that allow measurement of comparatively small forces from soft structures exhibiting low inertia (15–17), often with relatively high spatial resolution. Early methods such as atomic force microscopy required the use of laser-entrained silicon probes to make contact with a cell of interest (15). This approach is problematic for studying animal behaviour due to the risk of the laser and probe influencing behaviour. Subsequently, techniques have been developed which allow indirect measurement of substrate interactions. One such approach is Traction Force Microscopy (TFM), in which the displacement of fluorescent markers suspended in a material with known mechanical properties, measured relative to a zero-force reference, allows indirect measurement of horizontally aligned traction forces (17–19). This technique allows probe-free measurement of forces, but the need to obtain a precise zero-force reference would make time-lapse measurements on behaving animals challenging; further, depending on the version used, it has insufficient temporal resolution for the measurement of forces produced by many behaving animals, despite recent improvements (20). A second approach revolves around the use of micropillar arrays; in this technique, horizontally aligned traction forces are measured by observing the deflection of pillars made of an elastic material with known mechanical properties. This approach can be limited in spatial resolution and introduces a non-physiological substrate that may influence animal behaviour (21,22).

      Recently, we introduced a technique named Elastic Resonator Interference Stress Microscopy (ERISM), which allows optical mapping of vertically aligned GRFs in the pico- and nanonewton ranges with micrometre spatial resolution by monitoring local changes in the optical resonances of soft, deformable microcavities. This technique allows reference-free mapping of substrate deformations and calculation of vertically directed GRFs; it has been used to study a range of questions related to the exertion of cellular forces (23–25). Until recently, this technique was limited by its low temporal resolution (~10 s), making it unsuitable for recording substrate interactions during fast animal movements, but a further development of ERISM known as wavelength alternating resonance pressure microscopy (WARP) has been demonstrated to achieve temporal resolution down to 10 ms (26). Because ERISM/WARP allows probe-free measurement of vertical ground reaction forces with high spatial and temporal resolution, it is an attractive method for animal-scale mechanobiology.’
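      As background for the micropillar approach mentioned in the quoted passage, pillar deflection is commonly converted to a horizontal force by treating each pillar as a linear-elastic (Euler-Bernoulli) cantilever with a solid circular cross-section, so that k = 3EI/L^3 with I = πD^4/64. The sketch below is illustrative only: the material and geometry values are hypothetical, and it is not part of the methods used in this study.

```python
import math

def pillar_spring_constant(youngs_modulus_pa, diameter_m, length_m):
    """Bending stiffness (N/m) of a solid cylindrical pillar: k = 3*E*I / L**3."""
    second_moment = math.pi * diameter_m**4 / 64.0  # area moment of inertia, m^4
    return 3.0 * youngs_modulus_pa * second_moment / length_m**3

def traction_force(deflection_m, youngs_modulus_pa, diameter_m, length_m):
    """Horizontal force (N) from tip deflection; valid only for small, linear deflections."""
    return pillar_spring_constant(youngs_modulus_pa, diameter_m, length_m) * deflection_m

if __name__ == "__main__":
    # Hypothetical pillar: E = 2 MPa, 2 um diameter, 6 um tall, deflected by 1 um.
    force = traction_force(1e-6, 2e6, 2e-6, 6e-6)
    print(f"force = {force * 1e9:.1f} nN")
```

Because k scales as D^4/L^3, force estimates from this approach are very sensitive to the exact pillar geometry.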

      (1.4) The larvae studied are about 1 mm long and 0.1 mm in cross-section. Their volumes are therefore on the order of 0.01 microliter, their masses about 0.01 mg, and their weights in the range of 0.1 micronewton. This contrasts with the force reported for a single protopodium of 1 - 7 micronewtons. This is not to say that the force measurements are incorrect. Larvae crawl easily on an inverted surface, showing gravitational forces are smaller than other forces binding the larva to the substrate. The forces measured in this work are also of the same magnitude as the horizontal forces reported by Khare et al. (ref 32) using micropillar arrays.
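      The reviewer's order-of-magnitude estimate can be reproduced in a few lines. This is a minimal sketch assuming a box-shaped body of 1 mm × 0.1 mm × 0.1 mm and a tissue density equal to that of water; both are simplifying assumptions.

```python
# Order-of-magnitude weight of a larva. Box-shaped body and water density are assumptions.
LENGTH_M = 1e-3          # ~1 mm body length (from the review)
CROSS_SECTION_M = 1e-4   # ~0.1 mm width and height (from the review)
DENSITY_KG_M3 = 1000.0   # assumed: density of water
G = 9.81                 # gravitational acceleration, m/s^2

volume_m3 = LENGTH_M * CROSS_SECTION_M**2  # 1e-11 m^3, i.e. 0.01 microlitre
mass_kg = DENSITY_KG_M3 * volume_m3        # 1e-8 kg, i.e. 0.01 mg
weight_n = mass_kg * G                     # roughly 0.1 micronewton

protopodium_force_n = (1e-6, 7e-6)  # range quoted for a single protopodium
ratios = [f / weight_n for f in protopodium_force_n]

print(f"volume = {volume_m3 * 1e9:.3g} uL, mass = {mass_kg * 1e6:.3g} mg, "
      f"weight = {weight_n * 1e6:.3g} uN")
print(f"single-protopodium force / body weight: {ratios[0]:.0f} to {ratios[1]:.0f}")
```

The ratio of roughly 10 to 70 is why body weight can be neglected in the force balance.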

      I suspect that the forces adhering the larva to the substrate are due to the surface tension of a water layer. This would be consistent with the ring of upward stress around the perimeter of the larva visible in S4D, E and in video SV3. The authors remark that upward deflection of the substrate may be due to the Poisson's ratio of the elastomer, but the calibration figure S5 shows that these upward deflections and forces are much smaller than the applied downward force. In any case, there must be a downward force on the larva to balance the measured upward forces and this force must be due to interaction with the substrate. It should be verified that the sum of downward minus upward forces on the gel equals the larva's weight (given the weight is negligible compared to the forces involved, this implies that the upward and downward forces should sum to 0).

      We have carefully calculated the forces exerted by protopodia and are confident in the accuracy of our measurements as reported. We further agree with the reviewer’s suggestion that gravitational forces can be largely neglected.

      As the reviewer points out, one would expect forces due to upward and downward deflections to cancel when considering the entire system. However, we see indications that the counteracting (balancing) force often acts over a much larger area than the acting force: for example, a sharp indentation by a protopodium might be counteracted by an upward deflection over a 10- to 20-fold larger radius, and hence a 100- to 400-fold larger area, which reduces the magnitude of the upward deflection at any given pixel surrounding the indentation. This in turn increases the error in the integrated upward deformation, making an absolute comparison of acting and counteracting forces difficult. Further, recording the entire deformation induced by the counteracting force would require acquiring data with a prohibitively large field of view.
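      To illustrate the two points above, the sketch below integrates a vertical stress map over the field of view and builds a synthetic (hypothetical, not measured) map in which a small downward indentation is exactly balanced by weak upward stress spread over a tenfold larger radius. Assuming downward stress is positive and the pixel size is known, the net force over the full map is zero, the per-pixel balancing stress is roughly 1/100 of the indentation stress, and cropping the field of view leaves an apparent net downward force. The helper names are illustrative and do not correspond to our analysis code.

```python
import numpy as np

def net_vertical_force(stress_pa, pixel_size_m):
    """Sum a vertical stress map (Pa, downward positive) times pixel area -> net force (N)."""
    return float(np.sum(stress_pa)) * pixel_size_m**2

def balancing_stress_ratio(radius_ratio):
    """Mean balancing stress relative to the indentation stress when an equal and
    opposite force is spread over a disc `radius_ratio` times wider (area ~ radius**2)."""
    return 1.0 / radius_ratio**2

def toy_stress_map(n=201, r_px=5, k=10, s=1000.0):
    """Synthetic map (hypothetical geometry): downward disc of radius r_px balanced
    by uniform upward stress out to k * r_px, so the total force is exactly zero."""
    y, x = np.mgrid[:n, :n] - n // 2
    rr = np.hypot(x, y)
    inner = rr <= r_px
    outer = (rr > r_px) & (rr <= k * r_px)
    m = np.zeros((n, n))
    m[inner] = s
    m[outer] = -s * inner.sum() / outer.sum()  # cancels the downward force
    return m

if __name__ == "__main__":
    m = toy_stress_map()
    print("full-field net force (N):", net_vertical_force(m, 1e-6))
    print("cropped-field net force (N):", net_vertical_force(m[80:121, 80:121], 1e-6))
    print("per-pixel balancing / indentation stress:", -m.min() / m.max())
    print("radius ratio 10 and 20:", balancing_stress_ratio(10), balancing_stress_ratio(20))
```

With real data the small upward signal per pixel sits much closer to the noise floor, which is what makes the integrated balance hard to verify.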

      We agree that in some situations, water surface tension may be adhering animals to the substrate. Importantly, this is a challenge that the animal faces outside the lab in its natural environment of moist rotting fruit and yeast. The intricate force patterns seen in our study in the presence of water surface tension are therefore ecologically relevant. In other situations (e.g. preparing for pupation), larvae are able to stick to dry surfaces, suggesting that other adhesive forces such as mucoid adhesion can also come into play in certain behavioural contexts. A full characterization of the effects of water tension and mucoid adhesion is beyond the scope of this study. However, we have now added sentences on pages 8 and 12 commenting on these other biomechanical forces at play:

      ‘We also observed that the animals travel surrounded by a relatively large water droplet (lines 189-190).’

      ‘We observed that larvae travel surrounded by moisture from a water droplet, which produces a relatively large upwardly directed force in a ring around the animal. The surface tension produced by such a water droplet likely serves a role in adhering the animal to the substrate. However, during forward waves, we found that protopodia detached completely during SwP, suggesting this surface-tension-related adhesion force can be easily overcome by the behaving animal (lines 301-307).’

      (1.5) Much of the discussion and the model imply that the sites where the larva exerts downward force on the gel are the sites where horizontal propulsion is generated. This assumption should be justified. Can the authors rule out that the larva 'pulls' itself forward using surface tension instead of 'pushing' itself forward using protopodia?

      Determining the exact ‘sites’ where horizontal propulsion is generated is challenging. In our conceptual model, movement is not initiated by protopodia per se, but rather by a constellation of muscle contractions, which act upon the hydrostatic skeleton, which in turn causes visceral pistoning that heaves larvae forward. This is based on previous findings in Ref 31. While there are indeed downward protopodial ‘vaulting’ forces prior to initiation of swing, we propose that the main function of protopodia is not to push the larvae forward, but rather to provide anchoring to counteract opposing forces generated by muscles. We agree that water surface tension could also be sculpting biomechanical interactions; however, a full characterization of how water surface tension shapes larval locomotion is beyond the scope of this study.

      Since we have observed larvae move over dry terrain (e.g. glass) without an encasing water bubble, we do not believe that an encasing water bubble is strictly required for locomotion. We have also seen no obvious locomotion-related modulations in the pulling forces created by water bubbles encasing larvae, which would be expected if animals were somehow using water tension to pull themselves forward. Overall, the most likely explanation is that larvae use a mixture of biomechanical tactics to suit the moment in a given environment. This represents a challenge but also an opportunity for future research.

      We have now added additional text in the ‘Functional subdivisions within protopodia’ subsection to discuss these nuances (page 14, lines 382-387):

      ‘This increased force transmitted into the substrate is unexpected as the forces generated for the initiation of movement should arise from the contraction of the somatic muscles. We propose that the contraction of the musculature responsible for sequestration acts to move haemolymph into the protopodia thus exerting an increased pressure onto the substrate while the contact area decreases as a consequence of the initiation of sequestration.’

      and (page 15, lines 398-399):

      ‘Water surface films appear to facilitate larval locomotion in general but the biomechanical mechanisms by which they do this remain unclear.’

      (1.6) More detail should be provided about the methods, their limitations, and the rationale behind certain experimental choices.

      We thank the reviewer for this comment. As this significantly overlaps with a point raised earlier, we kindly direct them to our answer to comment #1.2 above.

      (1.7) Three techniques are introduced here to study how a crawling larva interacts with the substrate: standard brightfield microscopy of a larva crawling in an agarose capillary, ERISM imaging of an immobilized larva, and WARP imaging of a crawling larva. The authors should make clear why each technique was chosen for a particular study - e.g. could the measurements using brightfield microscopy also be accomplished using WARP? They should also clarify how these techniques relate to and possibly improve on existing techniques for measuring forces organisms exert on a substrate, particularly micropillar arrays and Traction Force Microscopy.

      Indeed, each of the three methods used has a specific merit. Brightfield microscopy was selected to track features on the animal’s body and to provide a basic control for the later measurements. However, this technique cannot directly measure the substrate interaction; it only allows inferences to be made from tracked features at the substrate interface. ERISM provides high-resolution maps of the indentation induced by the larva; it is also extensively validated for mapping cell forces, and its data analysis is robust against defects on the substrate (refs 23, 24 and 25). However, as we explain in the manuscript, ERISM lacks the temporal resolution needed to monitor the mechanical activity of behaving larvae. Its use was therefore limited to the study of anaesthetised animals. For mapping forces exerted by behaving larvae, we used WARP, a further development of ERISM that offers higher frame rates at the cost of requiring more extensive calibration (Supplementary Figure S4). The streamlined introduction of the different methods in our original manuscript stemmed from our attempt to be as concise as possible. However, as stated in response to comment #1.2, we agree that additional explanation and discussion will help readers and that it would be helpful to briefly refer to other methods for force mapping. We have now added references to a variety of techniques in the Introduction (pages 3-4, lines 56-86), as stated in a prior response.

      (1.8) As written, "(ERISM) (19) and a variant, Wavelength Alternating Resonance Pressure microscopy (WARP) (20) enable optical mapping of GRFs in the nanonewton range with micrometre and millisecond precision..." (lines 53-55) may generate confusion. ERISM as described in this work has a much lower temporal resolution (it requires the animal to be still for 5 seconds; lines 474-475). In this work, WARP does not appear to have nanonewton precision (judging by noise on calibration figures), and it is not clear that it has millisecond precision (the camera used and its frame rate should be specified in the methods).

      Previous studies have demonstrated the capabilities and limitations of ERISM and WARP. Upon reflection, we agree that our wording here could be more precise. To clarify our claim, we now separate the statements on ERISM and WARP in the introduction as follows (page 4, lines 78-83):

      “Until recently, this technique was limited by its low temporal resolution (~10 s), making it unsuitable for use in recording substrate interaction during fast animal movements, but a further development of ERISM known as wavelength alternating resonance pressure microscopy (WARP) has been demonstrated to achieve down to 10 ms temporal resolution (26).”

      While WARP can achieve force resolution comparable to that of ERISM when used in a cellular context (cf. Ref 26), we agree that in the present study the resolution was in the tens-of-nanonewtons range, owing to the need to use stiffer substrates and larger fields of view.

      The camera used in our work was specified in the appropriate subsection of the Materials and Methods (“All WARP and ERISM images were acquired using an Andor Zyla 4.2 sCMOS camera (Andor Technology, Belfast, UK)”). We apologise that the exact frame rate used in our current work was not mentioned in our original manuscript; this has now been added to the ‘Freely behaving animals force imaging’ section of the Materials and Methods (page 23, lines 574-577).

      (1.9) It would be helpful to have a discussion of the limits of the techniques presented and tradeoffs that might be involved in overcoming them. For instance, what is the field of view of the WARP microscope, and could it be increased by choosing a lower power objective? What would be required to allow WARP microscopy to measure horizontal forces? Can a crawling larva be imaged over many strides by recentering it in the field of view, or are there only particular regions of the elastomer where a measurement may be made?

      We agree with the reviewer that some discussion of the limitations of our technique will allow readers to have a more informed appreciation of what we are capable of measuring using WARP. However, as this is the first work to ever demonstrate such measurements, the limitations and tradeoffs cannot all be known with certainty at the present stage.

      To answer your individual questions:

      (1) There is a trade-off between field of view and the ability to resolve individual interference fringes: a lower-power objective enlarges the field of view but typically has a lower numerical aperture. Since our approach to calculating displacement from reflection maps relies on counting individual fringe transitions, switching to a lower-power objective risks neighbouring fringes blending together, making it impossible to identify individual transitions. The minimum numerical aperture of the objective will therefore generally depend on the steepness of the indentations produced by the animals: the steeper an indentation, the closer together the neighbouring fringes and thus the higher the magnification required to resolve them.

      (2) From WARP and ERISM data, one can make inferences about horizontal forces, as described in detail in our earlier publications about ERISM (ref 23). However, quantitation of horizontal forces at sufficient temporal resolution to allow the investigation of behaving Drosophila larvae is currently not possible.

      (3) Many strides can indeed be imaged using our technique; however, this comes with additional technical challenges. Whether or not the animal itself can be recentred is an ongoing challenge. We have found that animals will recentre themselves within the field of view if chasing an attractive odorant. However, manual recentring using a paintbrush risks destroying the top surface of the soft elastic resonator, and recentring the microscope stage would require real-time object tracking, which was outside the scope of this work given the other demanding hardware and optical requirements for obtaining high-quality force maps.
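      The trade-off in point (1) can be made semi-quantitative with a rough sketch. It assumes an idealised cavity at normal incidence, where neighbouring resonance orders differ in cavity thickness by λ/(2n), so a locally constant indentation slope s produces fringes spaced λ/(2ns) apart laterally; resolving them requires this spacing to exceed both the optical resolution (Rayleigh criterion) and two camera pixels projected onto the sample. The wavelength, refractive index, slope and objective values below are hypothetical, and the function names are illustrative.

```python
def fringe_spacing_m(wavelength_m, refractive_index, slope):
    """Lateral distance between neighbouring fringes for an idealised cavity at
    normal incidence (adjacent orders differ in thickness by wavelength / (2 n))
    and a locally constant slope (thickness change per unit length)."""
    return wavelength_m / (2.0 * refractive_index * slope)

def fringes_resolvable(wavelength_m, refractive_index, slope,
                       numerical_aperture, camera_pixel_m, magnification):
    """Rough check: fringe spacing must exceed the Rayleigh limit (0.61 * lambda / NA)
    and two camera pixels projected onto the sample (Nyquist sampling)."""
    spacing = fringe_spacing_m(wavelength_m, refractive_index, slope)
    optical_limit = 0.61 * wavelength_m / numerical_aperture
    sampling_limit = 2.0 * camera_pixel_m / magnification
    return spacing > max(optical_limit, sampling_limit)

if __name__ == "__main__":
    # Hypothetical values: 600 nm light, n = 1.4, slope 0.1, 6.5 um camera pixels.
    for mag, na in [(10, 0.3), (4, 0.13)]:
        ok = fringes_resolvable(600e-9, 1.4, 0.1, na, 6.5e-6, mag)
        print(f"{mag}x / NA {na}: fringes resolvable = {ok}")
```

Under these assumptions the steeper the indentation (larger slope), the smaller the fringe spacing, so a given objective fails first on the sharpest features.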

      To provide more information on limitations of our technique, we have added the following text into the discussion (pages 13-14, lines 356-370).

      ‘Despite the substantial advances they have provided, WARP and ERISM also bring challenges and have several technical limitations. For example, fabrication of resonators is much more challenging than preparation of the agarose substrates conventionally used for studying locomotion of Drosophila. This problem is compounded by the fragility of the devices, which stems from their thin gold top mirror. This becomes problematic when placing animals onto the microcavities, as the area near the initial placement of the animal is often damaged by the paintbrush used to move the animals. Further, because each displacement and stress map combines interference images recorded at two wavelengths, the effective framerate of the resulting maps is half the recorded framerate of the interference images. Monitoring fast movements therefore requires recording at very high framerates, which, depending on hardware, might require imaging at reduced image size; this in turn reduces the number of peristaltic waves that can be recorded before the animal escapes the field of view. A further limitation is that WARP and ERISM are sensitive mainly to forces in the vertical direction; this is complementary to TFM, which is sensitive to forces in horizontal directions. Using WARP in conjunction with high-speed TFM (possibly using the tuneable elastomers presented here) could provide a fully integrated picture of the underlying vertical and horizontal traction forces during larval locomotion.’ And further on page 13, lines 337-341:

      ‘More detailed characterisation of this behaviour remains a challenge owing to the changing position of the mouth hooks. Due to their rigid structure and the relatively large forces produced in planting, mouth hooks produce substrate interaction patterns which our technique struggles to map accurately, because overlapping interference fringes make the fringe transitions ambiguous.’
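      The halving of the effective framerate described in the added Discussion text follows from pairing interleaved frames. A minimal sketch, assuming strictly alternating illumination starting with the first wavelength and non-overlapping pairs; the helper name is hypothetical and this is not our processing pipeline.

```python
def pair_alternating_frames(frames, recorded_fps):
    """Group an interleaved stream [w1, w2, w1, w2, ...] into (w1, w2) pairs.

    Assumes strict alternation starting with the first wavelength and drops a
    trailing unpaired frame. Returns the pairs and the effective map rate."""
    usable = len(frames) - (len(frames) % 2)
    pairs = [(frames[i], frames[i + 1]) for i in range(0, usable, 2)]
    return pairs, recorded_fps / 2.0

if __name__ == "__main__":
    pairs, effective_fps = pair_alternating_frames(list(range(7)), 200.0)
    print(pairs, effective_fps)
```

Each pair yields one displacement map, so resolving a movement at a given rate requires recording the interference images at twice that rate.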

      We trust that the above discussion and the resulting modifications to our manuscript address the reviewer’s concerns.

      Reviewer #2 (Public Review):

      (2.1) With a much higher spatiotemporal resolution of ground dynamics than any previous study, the authors uncover new "rules" of locomotory motor sequences during peristalsis and turning behaviors. These new motor sequences will interest the broad neuroscience community that is interested in the mechanisms of locomotion in this highly tractable model. The authors uncover new and intricate patterns of denticle movements and planting that seem to solve the problem of net motion under conditions of force-balance. Simply put, the denticulated "feet" or tail of the Drosophila larva are able to form transient and dynamic anchors that allow other movements to occur.

      We thank the reviewer for their feedback and the information regarding which of our results is likely to resonate most impactfully with readers from a biological background.

      The biology and dynamics are well-described. The physics is elementary and becomes distracting when occasionally overblown. For example, one doesn't need to invoke Newton's third law, per se, to understand why anchors are needed so that peristalsis can generate forward displacements. This is intuitively obvious.

      We are sorry to hear that the reviewer found some of the physics details distracting. To address this concern, we have simplified some of the language while still attempting to keep the core arguments intact. For context and analogy, we still believe that including a brief reference to the laws of motion is helpful for some readers to explain some of our results and highlight their general implications, especially with regard to anchoring against reaction forces.

      One of our objectives is to make this article accessible and interesting for biologists and physicists at all levels. We feel it is important to reach out to both communities and to be as inclusive as possible in our writing. Newton’s 3rd law is clearly relevant for our study and is a common point of reference for anyone with a high-school education, so we feel it is appropriate to mention it as a way to help readers across disciplines understand the biophysical challenges faced by the animals we study.

      (2.2) Another distracting allusion to "physics" is correlating deformation areas with displaced volume, finding that "volume is a consequence of mass in a 2nd order polynomial relationship". I have no idea what this "physics" means or what relevance this relationship has to the biology of locomotion.

      Upon reflection, we agree that this language may be overly complex and distracts from what is, at its core, a simple but important principle governing how Drosophila larvae interact with their substrates. The point we are trying to make is that our data show that the force an animal exerts increases non-linearly with contact area. This suggests that to increase the force exerted on the substrate, an animal must increase its contact area. We do not observe contact area remaining constant while force increases, or vice versa. To make this result clearer, we have made several changes in our revised manuscript. Figure 5B no longer shows the relationship between the protopodial contact area and the displaced volume of the elastic resonator, but instead shows the protopodial contact area and the recorded force transmitted into the substrate. This shows that in order to increase the force transmitted into the substrate, these animals must increase their contact area. We have made changes to the figure legend of Figure 5 and the statements in the Results section accordingly (page 9, lines 220-222).
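      For readers wishing to test a similar relationship on their own data, a second-order polynomial fit of force against contact area can be obtained with numpy.polyfit. This is a minimal sketch, not the analysis used for Figure 5B; the function name and units are illustrative assumptions, and the usage example uses synthetic values rather than measured data.

```python
import numpy as np

def fit_force_vs_area(contact_area, force, degree=2):
    """Least-squares polynomial fit of force against contact area.

    Returns coefficients with the highest power first (numpy.polyfit convention)."""
    return np.polyfit(np.asarray(contact_area, dtype=float),
                      np.asarray(force, dtype=float), degree)

if __name__ == "__main__":
    # Synthetic, noise-free values generated from a known quadratic (not measured data).
    area = np.linspace(100.0, 2000.0, 20)   # e.g. contact area, um^2
    force = 2e-6 * area**2 + 1e-3 * area    # e.g. force, uN
    coeffs = fit_force_vs_area(area, force)
    print("fitted coefficients (a2, a1, a0):", coeffs)
```

With real, noisy data, comparing the residuals of this fit against a linear fit (degree=1) is a simple check of whether the non-linearity is meaningful.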

      (2.3) The ERISM and WARP methods are state-of-the-art, but aside from generally estimating force magnitudes, the detailed force maps are not used. The most important new information is the highly accurate and detailed maps of displacement itself, not their estimates of applied force using finite element calculations. In fact, comparing displacements to stress maps, they are pretty similar (e.g., Fig 4), suggesting that all experiments are performed in a largely linear regime. It should also be noted that the stress maps are assumed to be normal stresses (perpendicular to the plane), not the horizontal stresses that are the ones that actually balance forces in the plane of animal locomotion.

      We largely agree with the statement made by the reviewer here. However, we have found that in many contexts, audiences appreciate having the absolute values of the forces and stresses involved reported. Therefore, where possible, we have used stress maps rather than displacement maps. We also observe that while stress and displacement maps show similar patterns, features sometimes appear sharper in the stress map, because the finite element algorithm can attribute a broad indentation to a somewhat more localised downward force. We have thus opted to keep the original stress maps. We have been more explicit throughout the revised manuscript that WARP and ERISM are better suited to recording vertically directed forces (lines 75, 78, 86, 162, 301, 305, 336).

      We have also modified our Discussion section to encourage further investigation of our proposed model using a technique more tuned to horizontal stresses (pages 12-13, lines 324-328):

      ‘However, WARP microscopy is best suited to measurements of forces in the vertical direction. Although inferences such as this follow from fundamental laws of physics, we present this conclusion as a testable prediction that could be confirmed using a force measurement technique better suited to horizontally directed forces relative to the substrate.’

      (2.4) But none of this matters. The real achievements are the new locomotory dynamics uncovered with these amazing displacement measurements. I'm only asking the authors to be precise and down-to-earth about the nature of their measurements.

      We thank the reviewer for their perceptiveness in finding that though the forces are interesting, the interactions themselves are the most noteworthy result here. We trust that with the changes made in our revised manuscript, the description is now more “down-to-earth”, more concise where appropriate, and accurate as to which results are particularly important and novel.

      (2.5) It would be good to highlight the strength of the paper -- the discovery of new locomotion dynamics with high-resolution microscopy -- by describing it in simple qualitative language. One key discovery is the broad but shallow anchoring of the posterior body when the anterior body undertakes a "head sweep". Another discovery is the tripod indentation at the tail at the beginning of peristalsis cycles.

      We thank the reviewer for this recommendation. We agree that including a more explicit statement of some of our findings, especially with regard to the new posterior tripod structures and the whole-abdomen preparatory anchoring prior to head sweeps, would make the paper more impactful. We have therefore modified the Discussion to include a statement for each new result and have amended our abstract accordingly (lines 407-416):

      “Here we have provided new insights into the behaviour of Drosophila larval locomotion. We have provided new quantitative details regarding the GRFs produced by locomoting larvae with high spatiotemporal resolution. This mapping allowed the first detailed observations of how these animals mitigate friction at the substrate interface and thus provided new rules by which locomotion is achieved. Further, we have ascribed new locomotor function to appendages not previously implicated in locomotion in the form of tripod papillae, providing a new working hypothesis of how these animals initiate movement. These new principles underlying the locomotion outlined here may serve as useful biomechanical constraints as called for by the wider modelling community (39).”

      (2.6) As far as I know, these anchoring behaviors are new. It is intuitively obvious that anchoring has to occur, but this paper describes the detailed dynamics of anchoring for the first time. Anchoring behavior now has to be included in the motor sequence for Drosophila larva locomotion in any comprehensive biomechanical or neural model.

      We agree with the reviewer on this. We think it is best to let our colleagues reflect on our findings and then decide how best to include them in future models.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please be sure to describe in a figure caption or in the methods the details of the optical setup, especially the focal lengths of all the lenses, including the objective, and part numbers of the LEDs and filters. It would be helpful to have a figure in the main paper explaining the principles of ERISM/WARP microscopy along with the calibration measurements and computational pipeline (this would mainly combine elements already in the supplement). Such a figure should also include details of the setup that are alluded to in the methods but not fully explained (for instance, a "silicone well" is referred to in the methods but never described). The calibration of elastomer stiffness that now appears in the main text could be made a supplementary figure, unless there is some new art in the fabrication of the elastomers that should be highlighted as an advance in the main text.

      We appreciate the importance of explaining our methods to readers.

      In response to the public comments, we have added further details in our methods section to clarify practical aspects and ensure that readers will be able to reproduce our work.

      In Supplemental Figure 2, we show the full optical light path for ERISM and WARP along with named components. In addition, the principles of ERISM and WARP microscopy have already been extensively described in previous publications (See Refs 23-26). In light of this, we feel that the best approach in this paper is to direct readers to those publications.

      We feel that it is appropriate to present the calibration of elastomer stiffness in the main text because this is a genuine innovation: it concerns not only making the elastomers but also making force sensors based on these different materials. This is important because it shows how researchers can tune the stiffness of an ERISM/WARP elastomer to match the type of tissue or organism under study. This is the key technical advance that enables whole-animal biomechanics across a range of animal sizes, so we think it belongs in the main text.

      We want to make sure that we do not oversell this point, and we feel that we make it sufficiently clear in the main text of our manuscript that making elastomer based force sensors of appropriate stiffness is important, when we state

      “First, we developed optical microcavities with mechanical stiffnesses in the range found in hydrogel substrates commonly used for studying Drosophila larval behaviour, i.e. Young’s modulus (E) of 10-30kPa (36–38).” (p. 5, ll. 124) and later

      “Here we used Drosophila larvae as a test case, but our methods now allow elastic optical resonators to be tuned to a wide range of animal sizes and thus create new possibilities for studying principles of neuro-biomechanics across an array of animals.” (p. 12, ll. 337)

      I would appreciate a description of the "why" behind some experimental choices, as understanding the motivation would be helpful for other researchers looking to adopt these techniques.

      We have now added text in the introduction and discussion that explains the rationale behind our experimental choices in more detail. Please see our response to Reviewer 1’s public comments on the same point.

      (1) The WARP and ERISM experiments were conducted on a collagen-coated gold surface rather than agarose. Why? E.g., does agarose not adhere to the gold, or would its thickness interfere with the measurement?

      The gold layer is applied above the elastomer, and the collagen on top of the gold layer makes the gold a more natural biological surface for the animals. Agarose is unsuitable as an elastomer because it would dry during the vacuum-based deposition of the gold. It is also unsuitable as a surface coating on top of the gold, as the coating on the gold needs to be very thin to preserve the spatial and mechanical resolution of our sensors. Further, processing of agarose generally requires temperatures of 60°C and higher, which we find can damage the elastomer / gold films.

      (2) The ERISM measurements are made on a cold anesthetized animal right as it starts to wake up (visible mouth-hooks movement), which presents some difficulty. Why not start imaging while the animal is still completely immobile? Or why not use a dead larva?

      This approach allowed us to get measurements of forces exerted by denticles that are physiologically and biomechanically accurate. In dead or fully anesthetized animals, one cannot be sure that the forces exerted by denticles and denticle bands are representative of the forces exerted by an animal with active hydrostatic control.

      (3) In the ERISM setup the monochromator output is spatially filtered by focusing through a pinhole, while in the WARP setup, the LEDs are not.

      Yes, that’s correct. The LED light sources used in WARP have better spatial homogeneity than the tungsten filament used in ERISM, and so a pinhole is not required in WARP.

      (4) SV4 shows the interference image of a turning larva (presumably from one illumination wavelength) rather than a reconstruction of the displacement or stresses. Why?

      We felt that in this particular case the interference images provided a clearer representation of the behavioural sequence, showing both the small indentations generated by individual denticles and the larger indentations of the animal overall.

      Lines 49-50 "a lack of methods with sufficient spatiotemporal resolution for measuring GRFs in freely behaving animals has limited progress." This needs a discussion of what sufficient spatial and temporal resolutions would be and how existing methods fall short of these goals.

      We have now rewritten the introduction to include an overview of other alternative approaches and of what we see as the requirements here. See our response to the public comments.

      Figure caption 1B (line 789) refers to "concave areas of naked cuticle (black line) which generally do not interact with the substrate" While I think this might be supported by later WARP images, it's not clear how the technique of figure 1 measures interaction, which could e.g. be mediated by surface tension of a transparent fluid.

      The technique of Figure 1 provides qualitative information which as the reviewer points out is validated by WARP measurements later.

      Lines 184-189 "However, unexpectedly, we observed an additional force on the substrate when protopodia leave the substrate (SI) and when they are replanted (ST). To investigate whether this force was due to an active behaviour or due to shifting body mass, we plotted integrated displacement (i.e. displaced volume) against the contact area for each protopodium, combining data from multiple forwards waves (Figure 5B). Area is correlated with displaced volume for most time points, indicating that volume is a consequence of mass in a 2nd order polynomial relationship." I couldn't follow this argument at all.

      We have now reworded this section and explained our rationale. Also see our response to a similar critique in Reviewer 2’s public comments.
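      To make the quantities in the quoted passage concrete, the sketch below shows one way "integrated displacement (i.e. displaced volume)" and contact area could be computed from a single displacement map. It is an illustration under stated assumptions, not the authors' pipeline: the function name, the nanometre/micrometre units, the row-list map format and the contact threshold are all hypothetical choices, and volume is integrated only over pixels counted as contact.

```python
def volume_and_area(displacement_nm, pixel_size_um, threshold_nm):
    """Hypothetical sketch: displaced volume (um^3) and contact area (um^2).

    displacement_nm: list of rows; each value is the indentation depth in nm
    (positive = into the substrate). Pixels deeper than threshold_nm are
    treated as contact, and volume is summed over those pixels only.
    """
    pixel_area_um2 = pixel_size_um ** 2
    volume_um3 = 0.0
    contact_pixels = 0
    for row in displacement_nm:
        for depth_nm in row:
            if depth_nm > threshold_nm:
                contact_pixels += 1
                # depth converted nm -> um, multiplied by the pixel footprint
                volume_um3 += (depth_nm / 1000.0) * pixel_area_um2
    return volume_um3, contact_pixels * pixel_area_um2
```

      Repeating this per protopodium and per time point gives paired (area, volume) values; a second-order relationship like the one described in the quoted text could then be fitted with, for example, `numpy.polyfit(areas, volumes, 2)`.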

      Generally the authors might reconsider their use of acronyms. e.g. (244-246) "SI latencies were much more strongly correlated with wave duration across most segments than ST latencies. SIs scale with SwP and this could be mediated by proprioceptor activity in the periphery" is made more difficult to parse by the abbreviations.

      As we need to refer to these terms multiple times throughout the manuscript, we feel the use of acronyms is appropriate here.

      The video captions are inadequate. Please expand on them to explain clearly what is shown, and also describe in the methods how the data were acquired and processed. For instance, it seems that in SV3 a motion correction algorithm is applied so that the larva appears stationary even as it crawls forward. I think "Fourier filtered" means that the images were processed with a spatial high-pass filter; this should be explained and the parameters noted.

      We have revisited the video captions provided in the supplementary information document and conclude that these contain the important information. The modes of acquisition are described in the Methods: for Videos 1 and 2, see the section “Denticle band kinematic imaging”; for Videos 3 and 4, see the section on WARP. Supplementary Video 3 does not make use of motion correction; indeed, one can see the larva moving upwards/forwards in the field of view. We apologize for not explaining the Fourier filtering process for Video 3. We have now modified the video caption to read as follows:

      Video SV3. WARP imaging during forwards peristalses.

      Video showing high frame rate displacement maps produced by a freely behaving Drosophila larva. Displacement maps were Fourier filtered to make denticulated cuticle more readily visible and projected in 3D to show the effects of substrate interaction. Details of the Fourier filtering procedure were described elsewhere [Kronenberg et al, Nat Cell Biol 19, 864–872 (2017)].

      What were the reflectances of the bottom (10 nm Au/Cr) and top (15nm Au) metal layers at the wavelengths used? I imagine the bottom layer should be less than 38%, the top layer higher, and the product of the square of the bottom transmission and the top reflectance coefficients equal to the bottom reflectance (to make the two paths of the interferometer contribute equal intensity), but none of this is stated.

      The reflectance of the gold mirrors was studied in detail in prior work on ERISM. See Kronenberg et al, Nat Cell Biol 19, 864–872 (2017). We therefore refrained from adding a complete optical characterization of the ERISM sensors again here. In brief, we found that a reflectance >13% at each Au mirror is required for reliable ERISM measurements.
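      The reviewer's balance condition can be made explicit with a short calculation. In the sketch below, which assumes a lossless bottom mirror (T_bottom = 1 − R_bottom, whereas real thin gold films also absorb), equal contributions from the two interferometer paths require R_bottom = T_bottom² · R_top. Because R_top cannot exceed 1, R_bottom must satisfy R ≤ (1 − R)², which gives the roughly 38% upper bound the reviewer mentions. This is an idealisation for intuition only; the measured mirror properties are those characterised in Kronenberg et al. (2017).

```python
import math

def required_top_reflectance(r_bottom):
    """Top-mirror reflectance that balances the two interferometer paths.

    Uses the reviewer's condition R_bottom = T_bottom**2 * R_top and assumes
    a lossless bottom mirror, so T_bottom = 1 - R_bottom.
    """
    t_bottom = 1.0 - r_bottom
    return r_bottom / t_bottom ** 2

def max_bottom_reflectance():
    """Largest R_bottom for which balance is reachable with R_top <= 1.

    Setting R_top = 1 gives R = (1 - R)**2, i.e. R**2 - 3R + 1 = 0;
    the physically meaningful root is (3 - sqrt(5)) / 2.
    """
    return (3.0 - math.sqrt(5.0)) / 2.0
```

      For example, a bottom mirror reflecting 20% would need a top mirror reflecting 0.2 / 0.8² = 31.25% under this lossless model.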

      The description of the gold-coated elastomer as a microcavity is confusing to me. Does the light really make multiple round trips between the plates before returning to the detector? The loss of light on each round trip would depend on the reflectance and parallelism of the top and bottom mirrors. From the WARP calculation it appears that there is only one round trip: a π/2 phase shift results from the calculation for one round trip, 2π·2nL·(5 nm)/(630 nm)², with n = 1.4 and L = 8 µm; if there were two round trips, the phase shift would be π, etc. Would this be better described as a mostly common-path interferometer?

      The physics of our devices is best described within the framework of thin-film interference and (weak) microcavity optics. Indeed, light can make multiple round trips, though it is attenuated with each reflection. The complete calculation of the multiple round trips is only required to obtain quantitative information on the amount of light that is reflected. The spectral position of the reflectance minima can also be obtained by assuming a single round trip, which is what is done in the description of the WARP calculations.
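      A small numerical check may help reconcile the two views. The sketch first evaluates the reviewer's single-round-trip expression (about 1.77 rad, i.e. roughly 0.56π, for n = 1.4, L = 8 µm, λ = 630 nm and a 5 nm step). It then uses the Airy formula for an idealised lossless, symmetric Fabry-Perot cavity, which sums all round trips, and shows that the reflectance minimum stays at the single-round-trip position 2nL = mλ (640 nm here, m = 35) for both weak and strong mirrors. Only the depth and width of the dip depend on the mirror reflectance. The symmetric, lossless mirrors and the neglect of phase shifts on reflection are simplifying assumptions; ERISM mirrors are thin, absorbing and unequal gold films.

```python
import math

def phase_shift(n, thickness_nm, wavelength_nm, delta_wavelength_nm):
    """Linearised change of the round-trip phase 2*pi*2nL/lambda for a small
    wavelength step: 2*pi*2nL*d_lambda/lambda**2 (the reviewer's expression)."""
    return 2 * math.pi * 2 * n * thickness_nm * delta_wavelength_nm / wavelength_nm ** 2

def airy_reflectance(wavelength_nm, n, thickness_nm, mirror_r):
    """Reflectance of an idealised lossless, symmetric Fabry-Perot cavity,
    summing all round trips (Airy formula). Requires 0 <= mirror_r < 1."""
    delta = 4 * math.pi * n * thickness_nm / wavelength_nm  # round-trip phase
    f = 4 * mirror_r / (1 - mirror_r) ** 2                   # coefficient of finesse
    s = math.sin(delta / 2) ** 2
    return f * s / (1 + f * s)

def reflectance_minimum(n, thickness_nm, mirror_r, lo_nm, hi_nm, step_nm=0.01):
    """Grid search for the wavelength of minimum reflectance in [lo_nm, hi_nm]."""
    steps = int(round((hi_nm - lo_nm) / step_nm))
    grid = [lo_nm + i * step_nm for i in range(steps + 1)]
    return min(grid, key=lambda lam: airy_reflectance(lam, n, thickness_nm, mirror_r))
```

      Calling `reflectance_minimum(1.4, 8000, r, 630, 650)` with r = 0.2 or r = 0.6 returns a wavelength at 640 nm in both cases, which is why a single-round-trip treatment suffices for locating the minima.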

      Figure 2 e,f: the line fits appear to be dominated by the data points at 2 s. If these are removed, do the fits change? To support the argument that 2e shows a correlation and 2f does not, some kind of statistical test, ideally a hierarchical bootstrap, should be conducted to compare between the two measurements.

      If we remove the data points at 2 s, then the R^2 values for swing initiation latencies change as follows: A2: 0.35 to 0.005; A4: 0.78 to 0.31; A6: 0.61 to 0.01. The data in 2e,f are the averages of 3 waves in each animal, and so the data points at 2 s are not simply the result of single ‘rogue’ waves but rather averages of several trials. Further, if all individual waves are plotted, the overall trends are still visible.

      We don’t think it is appropriate to remove the data at 2 s from our analysis, but we take the point regarding statements about the presence or absence of correlation in a formal sense. We have therefore changed the wording in the description of 2e,f to refer simply to the fact that wave duration can ‘largely determine’ latencies in some instances but is less able to in others, as suggested by the R^2 (coefficient of determination) data. We have also adjusted our wording in the Discussion.
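      The sensitivity analysis reported above amounts to recomputing the coefficient of determination with and without the high-leverage points. A minimal sketch of that calculation follows, assuming an ordinary least-squares line with intercept (for which R² equals the squared Pearson correlation). The function names and the exclusion predicate are illustrative, not taken from the authors' analysis code.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares line y = a + b*x.

    Returns 0.0 when x or y has fewer than two distinct values (no line fits).
    """
    if len(set(x)) < 2 or len(set(y)) < 2:
        return 0.0
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)

def r_squared_excluding(x, y, exclude):
    """R^2 recomputed after dropping points where exclude(x_i, y_i) is True."""
    xs = [xi for xi, yi in zip(x, y) if not exclude(xi, yi)]
    ys = [yi for xi, yi in zip(x, y) if not exclude(xi, yi)]
    return r_squared(xs, ys)
```

      For instance, `r_squared_excluding(durations, latencies, lambda d, l: d >= 2.0)` would drop waves of 2 s or longer. A large drop in R² after removing one or two points indicates that those points carry much of the fit, which is the concern the reviewer raised.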

      Figure 4 - please provide in the main figure or as a supplement the full images (i.e. not cropped to the assumed shape of the larva)

      We do not feel that it is necessary or helpful to provide the full images given that the focus of the analysis is on dynamics of protopodia movements.

      Figure 5e top: single data points around wave duration 0.6s appear to dominate fit lines. Does removing these points alter the fits? To support the argument that 5e top shows a correlation and 5e bottom does not, some kind of statistical test, ideally a hierarchical bootstrap, should be conducted to compare between the two measurements.

      In Figure 5e, we are showing all waves analysed across animals. If we remove the data points at 0.6 s, A2 R^2 changes from 0.24 to 0.05, A4 R^2 changes from 0.48 to 0.11, and A6 R^2 changes from 0.69 to 0.34; however, we don’t feel it is appropriate to remove these data from our analysis. We take the point about needing to be cautious about making claims about correlation versus no correlation, and have now reworded the description of these results along the same lines as for Figure 4.
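      For readers who want to follow up on the reviewer's suggestion, here is one possible form of a hierarchical bootstrap. Animals are resampled with replacement, waves are then resampled within each chosen animal, and the difference between the swing-initiation and stance-termination R² values is recomputed for each replicate. If a percentile interval of that difference excludes zero, that would support a genuine difference between the two correlations. This is a hypothetical sketch: the data layout (a dict of animal id to (wave duration, SI latency, ST latency) tuples), the statistic and the percentile interval are assumptions, and the authors did not report this analysis.

```python
import random

def r_squared(x, y):
    """R^2 of an OLS line; 0.0 if x or y has fewer than two distinct values."""
    if len(set(x)) < 2 or len(set(y)) < 2:
        return 0.0
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)

def r2_difference(waves):
    """R^2(duration, SI latency) - R^2(duration, ST latency) for one resample.
    Each wave is a tuple (wave_duration, si_latency, st_latency)."""
    durations = [w[0] for w in waves]
    return (r_squared(durations, [w[1] for w in waves])
            - r_squared(durations, [w[2] for w in waves]))

def hierarchical_bootstrap(data, statistic, n_boot=1000, seed=0):
    """Two-level bootstrap over a dict mapping animal id -> list of waves."""
    rng = random.Random(seed)
    animals = list(data)
    replicates = []
    for _ in range(n_boot):
        resample = []
        for animal in rng.choices(animals, k=len(animals)):     # level 1: animals
            waves = data[animal]
            resample.extend(rng.choices(waves, k=len(waves)))  # level 2: waves
        replicates.append(statistic(resample))
    return replicates

def percentile_interval(replicates, alpha=0.05):
    """Simple percentile confidence interval from bootstrap replicates."""
    ordered = sorted(replicates)
    last = len(ordered) - 1
    return ordered[int(alpha / 2 * last)], ordered[int((1 - alpha / 2) * last)]
```

      Resampling at the animal level first keeps waves from the same larva together, so the interval reflects between-animal variability rather than treating every wave as independent.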

      It appears from the methods (467-489) that animals were kept wet for warp imaging but not for ERISM imaging. Please confirm or explain further the presence or absence of a water layer in these two sets of measurements, as this could affect the adhesion forces.

      In each case, the animals were transferred onto experimental substrates with a moistened paintbrush. We have added text explicitly stating this in the methods section.

      Kim et al. Nature Methods 2017 (10.1038/nmeth.4429) describes recording two images separated by less than 60 microseconds using a scientific CMOS camera with a frame rate of 200 Hz. This is accomplished by triggering a pulsed LED once at the end of one frame's capture window and then a second time at the beginning of the next frame's window (see Supplementary Figure 10). I'm not sure if this trick is widely known, but it's worth considering if the authors are running into a problem with movement between the two wavelength exposures in their WARP setup.

      Thank you for this tip. We will take this under consideration for future work.
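      The timing idea the reviewer describes can be illustrated with a simple schedule. The sketch assumes an idealised global-exposure model in which frame k integrates during [k·P, k·P + E] (P = frame period, E = exposure). It places the first LED pulse so that it ends when frame k stops integrating, and the second so that it starts when frame k+1 begins, so that the two wavelength images are separated by roughly the inter-frame gap plus one pulse width rather than by a full frame period. Real sCMOS rolling-shutter timing is more complex. The function name and the example numbers are hypothetical and do not describe either the Kim et al. hardware or the WARP setup.

```python
def straddled_pulses(frame_period_us, exposure_us, pulse_us, frame_index=0):
    """Two LED pulses straddling the boundary between frame k and frame k+1.

    Idealised model (an assumption): frame k integrates during
    [k * frame_period, k * frame_period + exposure] for every pixel.
    Returns ((a_start, a_end), (b_start, b_end), centre_to_centre_gap) in us.
    """
    if not 0 < pulse_us <= exposure_us <= frame_period_us:
        raise ValueError("need 0 < pulse <= exposure <= frame period")
    frame_start = frame_index * frame_period_us
    a_end = frame_start + exposure_us              # pulse A ends as frame k closes
    pulse_a = (a_end - pulse_us, a_end)
    b_start = frame_start + frame_period_us        # pulse B starts as frame k+1 opens
    pulse_b = (b_start, b_start + pulse_us)
    gap = (pulse_b[0] + pulse_b[1]) / 2 - (pulse_a[0] + pulse_a[1]) / 2
    return pulse_a, pulse_b, gap
```

      With a 200 Hz camera (5000 µs period), a 4970 µs exposure and 10 µs pulses, this schedule gives pulse centres 40 µs apart, even though the two images are read out in consecutive frames.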

      Is the setup compatible with optogenetics? (E.g., is the red light dim enough that it wouldn't activate CsChrimson, or could a longer-wavelength LED be used for interferometry?) If so, activation of the mooncrawler descending neuron (MDN) could be used to study backward crawling (or thermogenetic activation of MDN), e.g. to contrast the sites and order of "anchoring" between the two directions of crawling.

      The set-up is potentially compatible with optogenetics. We are in the process of exploring this in current ongoing work.

      Reviewer #2 (Recommendations For The Authors):

      Simplify/reduce the commentary about force measurements, and highlight the clear, qualitative descriptions of the novel locomotion patterns that they have observed. The microscopy and movements seem to matter more than the ground force estimations.

      We have addressed these issues in our responses to Reviewer 2’s public comments.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank the reviewers for their valuable feedback which has improved this work greatly from its original form, and are elated to have such glowing reviews of the revised work published alongside the revised preprint. Reviewer 3 raises some final salient points, which deserve a brief address here.

      Teeth: We thank the reviewer for clarifying their points. We do make the assumption that the ecological parameter space of toothed and beaked organisms will be comparable. Both are governed by the same set of physical principles and have the jaw bone as the most likely point of failure (teeth are harder than bone, and keratinous rhamphothecae are malleable and can be regrown with relative ease when deformed). Differences in stress/strain distribution between toothed and beaked organisms will occur but are already accounted for in our methods as we model both the teeth and rhamphotheca and will observe these different effects. We have added an explicit statement of this hypothesis to the Methods section of the manuscript.

      Cranial kinesis: In our opinion, it is a safe assumption that the lower jaws of extant birds and enantiornithines are comparable. We do not see why the acquisition of kinesis in the upper jaw would generally affect the functional role of or constraints on the lower jaw. One possibility we discussed is that a quickly-moving kinetic premaxilla could let the lower jaw move a shorter distance during effective prey capture and lower the selection for speed (i.e. allow jaw-closing MA to remain higher). While we have added this possibility to our call for the investigation of cranial kinesis, we consider it too speculative to begin altering interpretations of fossil taxa. All raw measurement data remains available so that, if evidence is found for cranial kinesis having predictable effects on our measured parameters, future researchers can re-analyse our data and update any ecological predictions accordingly.

      Organization: To our knowledge eLife format incorporates what one would think of as a Conclusions section into the Discussion. Our Discussion section currently contains 18 subheadings which should guide a reader to any specific topic of interest. The Discussion also progresses from a more narrow to broad focus which we and several colleagues find intuitive.

      We thank all three reviewers once again for their feedback that has improved this work and their kind words throughout the process.


      The following is the authors’ response to the original reviews.

      We thank all three reviewers for their detailed reviews, and generally agree with their feedback. To accompany the reviewed preprint of this manuscript, we wished to respond to comments from the reviewers so that they (and the public) will know what we are planning to incorporate in the revised manuscript we are currently preparing. If there are any comments on our plans in the meantime, please let us know.

      • Reviewer 1, on concerns regarding identification of ontogenetic stage and comparison of taxa from different ontogenetic stages: It is fair to say that enantiornithine ontogeny is still poorly understood, though we believe all current evidence points to each specimen used in this study being adequately mature for comparison to the extant birds used in the study. Stages of skeletal fusion are the standard method of assessing enantiornithine ontogeny (Hu and O'Connor 2017), and our comparison of histological work (Atterholt, Poust et al. 2021) to skeletal stages in Table S4 suggests a transition from juvenile to subadult in stage 0 or 1 and from subadult to adult within stage 3. Thus, the specimens we quantitatively examine in this study, all at stages 2 or 3 (Figure S10), are advanced subadults or adults. It is well known that many living animals considered “adults” would be considered subadults or even juveniles by a palaeontologist (Hone, Farke et al. 2016). So, even if some individuals in this study are not fully skeletally mature, they should have attained the morphology which they would possess for most of their lives, and thus the morphology which undergoes selective pressure. We will add this context to the “Bohaiornithid Ontogeny” section and thank the reviewer for seeking more detail on this point.

      • Reviewer 2, on need of a context figure: We have an artistic life reconstruction of a bohaiornithid in preparation, and can include that in the revised manuscript as a figure.

      • Reviewer 2, on raptor claw categories: We explain these categories in depth in a previous work (Miller, Pittman et al. 2023). However, we will now add a short summary of that explanation to this work so that this manuscript will become self-contained in this regard. In short, the “large raptor” category includes extant birds with records of regularly taking prey which cannot be encircled with the pes, while birds in the “small raptor” category have no such records. As Reviewer 2 points out, this does often follow phylogenetic lines, but not always. E.g. most owls specialise in taking small prey, but the great horned owl Bubo virginianus regularly takes mammals and birds larger than its pes (Artuso, Houston et al. 2020); conversely, we can only find reports of the common black hawk Buteogallus anthracinus taking prey small enough for the pes to encircle (Schnell 2020), despite other accipiters frequently taking large prey. In both cases these taxa plot in PCA nearer to other large or small raptors (respectively) than to their phylogenetic relatives.

      • Reviewer 3, on teeth vs beaks: We are not aware of any foods which are exclusive to toothed or beaked animals. There are some aspects of extant bird biology that may affect how an animal must adapt to a certain diet, on which we do comment, e.g. the discussion of alternatives to the crop and ventriculus for processing plant matter in the Bohaiornithid Ecology and Evolution section. For functional studies, e.g. FEA, we have included the rhamphotheca in toothless models, where it serves the same role as teeth: to be a feeding surface. In theory, it should not matter whether the feeding surface is hard or soft, as mechanical failure occurs in high stress/strain states regardless of the medium. If having teeth necessarily increased or decreased overall stress/strain relative to a beak (and from our work this does not appear to be the case), this would in turn necessarily limit dietary options. So, all models in our work should be directly comparable.

      As an additional note on this topic, we address tooth shape in bohaiornithids at the end of the Bohaiornithid Ecology and Evolution section. In the current version we specifically note that their tooth shape is likely controlled by phylogeny, though we will add a note in the upcoming version that the morphospace of bohaiornithid teeth overlaps that of many other clades with purportedly diverse diets, which is consistent with a hypothesis of diverse diets within the clade.

      • Reviewer 3, on cranial kinesis: Our FE models should be unaffected by cranial kinesis, as these are two-dimensional and model the akinetic lower jaw only. Some mediolateral kinesis may be relevant in the mandible in the form of “wishboning” in different taxa, but its prevalence in extant birds is currently unknown. The preservation of enantiornithines (two-dimensionally and typically in lateral view) limits the ability to capture any mediolateral function regardless.

      Our models of mechanical advantage do not account for any cranial kinesis. This is a necessary simplification. The nature of cranial kinesis in extant birds, and the role that it plays in feeding, is poorly understood. Cranial kinesis will increase gape, but we don’t yet know how or if it affects jaw closing force and speed (moreover, given the variation in quadrate and hinge morphology present in extant birds, this is also likely to be highly diverse). We have therefore modelled the extant birds’ jaw closing systems as having one akinetic out-lever (the jaw joint to the bite point), to match the situation in our fossil taxa. This is a common simplification that has been used previously with success (Corbin, Lowenberger et al. 2015, Olsen 2017). However, we acknowledge that this simplification may introduce some error. Unfortunately, until the mechanics of cranial kinesis, and the variation in the anatomy and performance of kinetic structures in extant birds, are better understood, we cannot determine exactly what that error looks like. We therefore have greater confidence in the inter-species comparability of this conservative, akinetic approach (in other words, we may not be making assumptions that are 100% accurate, but we are at least making the same assumption across all taxa, so the results should be comparable in their error). We will add a section in the Mechanical Advantage and Functional Indices discussion calling for further research into the mechanics of cranial kinesis, so that future mechanical advantage work in birds can take this matter into account.

      • Reviewer 3, on skull reconstruction: This issue is partly addressed in the Bohaiornithid Skull Reconstruction section, though we agree that adding more mentions of it in the MA and FEA Discussion sections and the Bohaiornithid Ecology and Evolution section will benefit the manuscript. Most notably, Shenqiornis and Sulcavis have similar ecological interpretations, but much of the Shenqiornis skull reconstruction uses Sulcavis bones. Longusunguis is the only other taxon which takes more than two bones from a different taxon, and in this case all but the quadrate are not used in any quantitative measurements. We have ensured that the skull reconstructions presented in Figure 2 show which portions of the skull come from which specimen, so that as new material is discovered and phylogenetic relationships are updated, it will be clear to future readers which parts of the reconstructions will need to be updated.

      • Reviewer 3, on data availability: All data including FEA models and raw measurement data are included in the same repository as the scripts, which we will make clear in the manuscript. Good catch on the data link being dead, we will publish it now.
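      The akinetic simplification described in the cranial kinesis response above treats the lower jaw as a single rigid lever. Under that model, jaw-closing mechanical advantage (MA) is the in-lever divided by the out-lever, with the out-lever running from the jaw joint to the bite point. The sketch below computes MA from 2D landmark coordinates. Taking the in-lever as the straight-line distance from the jaw joint to the adductor muscle insertion is an assumed simplification (studies differ in how they define it), and the function names are hypothetical.

```python
import math

def lever_length(joint, point):
    """Straight-line distance from the jaw joint to a 2D landmark."""
    return math.dist(joint, point)

def jaw_closing_ma(joint, muscle_insertion, bite_point):
    """Mechanical advantage of an akinetic lower jaw: in-lever / out-lever.

    Out-lever: jaw joint to bite point (a single lever, no kinesis).
    In-lever: jaw joint to adductor insertion (assumed simplification).
    """
    return lever_length(joint, muscle_insertion) / lever_length(joint, bite_point)
```

      For a rigid lever, the bite-point speed relative to the insertion speed is the reciprocal, 1/MA. This is the force-versus-speed trade-off behind the idea that a kinetic premaxilla might relax selection for speed and allow jaw-closing MA to remain higher.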

      As a final note, it was brought to our attention by another colleague that the original manuscript’s ancestral state reconstruction lacked an outgroup. An updated reconstruction using Sapeornis as an outgroup will be included in the revised manuscript. The addition of the outgroup does not change any conclusions of the manuscript.

      We once again thank our reviewers for their valuable feedback and will submit a revised version of this manuscript for publication shortly. Please let us know if you have any additional comments after reading our response that we can take onboard in our revision.

      References

      Artuso, C., C. S. Houston, D. G. Smith and C. Rohner (2020). Great Horned Owl (Bubo virginianus), version 1.0. Birds of the World. A. F. Poole. Ithaca, NY, USA, Cornell Lab of Ornithology.

      Atterholt, J., A. W. Poust, G. M. Erickson and J. K. O'Connor (2021). "Intraskeletal osteohistovariability reveals complex growth strategies in a Late Cretaceous enantiornithine." Frontiers in Earth Science 9: 640220.

      Corbin, C. E., L. K. Lowenberger and B. L. Gray (2015). "Linkage and trade‐off in trophic morphology and behavioural performance of birds." Functional Ecology 29(6): 808-815.

      Hone, D. W. E., A. A. Farke and M. J. Wedel (2016). "Ontogeny and the fossil record: what, if anything, is an adult dinosaur?" Biology letters 12(2): 20150947.

      Hu, H. and J. K. O'Connor (2017). "First species of Enantiornithes from Sihedang elucidates skeletal development in Early Cretaceous enantiornithines." Journal of Systematic Palaeontology 15(11): 909-926.

      Miller, C. V., M. Pittman, X. Wang, X. Zheng and J. A. Bright (2023). "Quantitative investigation of Mesozoic toothed birds (Pengornithidae) diet reveals earliest evidence of macrocarnivory in birds." iScience 26(3): 106211.

      Olsen, A. M. (2017). "Feeding ecology is the primary driver of beak shape diversification in waterfowl." Functional Ecology 31(10): 1985-1995.

      Schnell, J. H. (2020). Common Black Hawk (Buteogallus anthracinus), version 1.0. Birds of the World. A. F. Poole and F. B. Gill. Ithaca, NY, USA, Cornell Lab of Ornithology.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current work by Kulich et al. examines the dynamic relocalization of NGR1 (LAZY2), a member of the LAZY protein family which is key for auxin redistribution during gravitropic responses. After gravistimulation of the triple mutant ngr123 (lazy234), the PIN3-activating kinase D6PK is not polarized in the columella cells.

      Strengths:

      The authors show a thorough characterization of NGR1 relocalization dynamics after gravistimulation.

      Weaknesses:

      Genetically the relocalization of D6PK depends on the LAZY protein family, but some essential details are missing in this study. On the one hand, NGR1-GFP does not associate with the BFA compartments and maintains its association with the PM and amyloplasts. On the other hand, D6PK relies on GNOM, via vesicle trafficking sensitive to BFA, suggesting that D6PK follows a different relocalization route than NGR1 which is BFA-insensitive. Based on these observations, D6PK relocalization requires the LAZY proteins, but D6PK and NGR1 relocalize through independent routes. How can this be interpreted or reconciled?

      Response: Since we demonstrated that D6PK does not relocalize in the absence of NGR proteins, we conclude that NGR1 acts upstream of D6PK. The molecular mechanism driving this interaction is not fully understood; however, it is evident that NGR1 triggers the mobilization of D6PK. Despite previous investigations into D6PK mobility, the underlying mechanisms remain elusive. Notably, despite its sensitivity to BFA, D6PK does not localize to BFA bodies and does not undergo conventional endocytosis (https://doi.org/10.1016/j.devcel.2014.05.006). We fully acknowledge the importance and interest in gaining a better understanding of these processes, and it will be a focal point of our future research.

      Two other works (now published) provide valuable and fundamental findings related to the mechanism examined in the current manuscript and display complementary and similar results to the ones shown in the current manuscript. Given the similarities in the examined mechanisms, these preprints should be referenced, recognized, and discussed in the manuscript under review. It is assumed that the three projects were independently developed, but the results of these previous works should be addressed and taken into account at least during the discussion and when drawing any conclusions. This does not mean that this work is less relevant. On the contrary, some of the observations that seem to be redundant are more solid, and firm conclusions can now be drawn from them.

      Response: We have included and discussed these works in the revised Discussion.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript addresses what rapid molecular events underlie the earliest responses after gravity sensing via the sedimentation of starch-enriched amyloplasts in columella cells of the plant root cap. The LAZY or NEGATIVE GRAVITROPIC RESPONSE OF ROOTS (NGR) protein family is involved in this process and localizes to both the amyloplast and the plasma membrane (PM) of columella cells.

      The current manuscript complements and extends Nishimura et al., Science, 2023. Kulich and colleagues describe the role of the LZY2 protein, also called NGR1, during this process, imaging its fast relocation and addressing additional novel points such as molecular mechanisms underlying NGR1 plasma membrane association as well as revealing the requirement of NGR1/LZY2, 3,4 for the polar localization of the AGCVIII D6 protein kinase at the PM of columella cells, in which NGR1/LZY2 acts redundantly with LZY3 and LZY4.

      The authors initially monitored relocalization of functional NGR1-GFP in columella cells of the ngr1 ngr2 ngr3 triple mutant after 180-degree reorientation of the roots. Within 10-15 min, the NGR1-GFP signal disappeared from the upper PM after reorientation and reappeared at the lower PM of the reoriented cells in close proximity to the sedimented amyloplasts. Reorientation of NGR1-GFP occurred substantially faster than PIN3-GFP reorientation, at about the same time as or slightly later than a rise in a calcium sensor (GCaMP3) signal, and just preceding alterations in the D2-Venus auxin sensor. Reorientation of NGR1-GFP proved to be fast and not dependent on brefeldin A-sensitive ARF GEF-mediated vesicle trafficking, unlike the trafficking of PIN proteins such as PIN3, or of the AGCVIII D6 protein kinase. Strikingly, the PM association of NGR1-GFP was highly sensitive to pharmacological interference with sterol composition or concentration and to phosphatidylinositol 4-kinase inhibition, as well as to dithiothreitol (DTT) treatment interfering with thioester bond formation, e.g. during S-acylation. Indeed, combined mutation of a palmitoylation site and polybasic regions of NGR1 abolished its PM but not its amyloplast localization and rendered the protein non-functional during the gravitropic response, suggesting that NGR1 PM localization is essential for the gravitropic response. Targeting the protein to the PM via an artificially introduced N-terminal myristoylation site and a ROP2-derived polybasic region and geranylgeranylation site partially restored its functionality in the gravitropic response.

      Strengths:

      This timely work should be of broad interest to plant, cell and developmental biologists across the field as gravity sensing and signaling may well be of general interest. The point that NGR1 is rapidly responsive to gravistimulation, polarizes at the PM in the vicinity to amyloplast and that this is required for repolarization of D6 protein kinase, prior to PIN relocation is really compelling. The manuscript is generally well-written and accessible to a general readership. The figures are clear and of high quality, and the methods are sufficiently explained for reproduction of the experiments.

      Weaknesses:

      Statistical analysis has been performed for some figures but is lacking for most of the quantitative analyses in the figure legends.

      Response: We added this information to the figure legends.

      The title claims a bit more than what is actually shown in the manuscript: While auxin response reporter alterations are monitored, "rapid redirection of auxin fluxes" are not really directly addressed and, while D6PK can activate PIN proteins in other contexts, it is not explicitly shown in the manuscript that PIN3 is a target in the context of columella cells in vivo. A title such as "Rapid redirection of D6 protein kinase during Arabidopsis root gravitropism relies on plasma membrane translocation of NGR proteins" would reflect the results better.

      Response: We modified the title to "Rapid translocation of NGR proteins driving polarization of PIN-activating D6 protein kinase during root gravitropism".

      Fig. 4: The point that D6PK is transcytosed cannot be made here based on the data of these authors. They would have needed a photoswitchable version of NGR1 to show that the same molecules observed at the upper PM are translocated to the lower PM. Nishimura and colleagues actually did that for NGR4. However, this is a lot of work, and for NGR1 such a fusion might have too low a fluorescence intensity (as was the case for NGR3). So, I think a rewording would be sufficient, such as "NGR-dependent reorientation of D6PK plasma membrane localization", as this does not say from where it comes to the lower PM. Theoretically, the signal could also be amyloplast-derived or newly synthesized (or just folded) NGR1-GFP.

      Response: We fully agree and have rephrased the text, using "translocation" instead of "transcytosis".

      The authors propose a model in which the AGCVIII kinase D6PK, recruited in an NGR-dependent manner, activates PIN3 to drive auxin fluxes. However, alterations in auxin responses are observed prior to PIN3 reorientation. They should explain this discrepancy better and clearly state that this is a working hypothesis for the future rather than something already proven.

      Reviewer #3 (Public Review):

      The mechanism controlling plant gravity sensing has fascinated researchers for centuries. It has been clear for at least the past decade that starch-filled plastids (termed statoliths) in specialised gravity-sensing columella cells sense changes in root orientation, triggering an asymmetric auxin gradient that alters root growth direction. Nevertheless, exactly how statolith movement triggers PIN auxin efflux carrier activation and auxin gradient formation remained unclear until very recently. A series of new papers (in Science and Cell) and this manuscript report how LAZY proteins (also referred to as NEGATIVE GRAVITROPIC RESPONSE OF ROOTS; NGR) play a pivotal role in regulating root gravitropism. In terms of overall significance, their collective findings provide seminal insights into the very earliest steps of how plant roots sense gravity; these are arguably the most important papers about root gravitropism in the past decade.

      In the current manuscript, Kulich et al initially report (through creating a functional NGR1-GFP reporter) that "NGR1-GFP displayed a highly specific columella expression, which was most prominent at the PM and the statolith periphery." Is NGR1-GFP expressed in shoot tissues? If yes, is it in the starch sheath (the gravity-sensing equivalent of root columella cells)? The authors also note "NGR1-GFP signal from the PM was not evenly distributed, but rather polarized to the lower side of the columella cells in the vicinity of the sedimented statoliths (Fig. 1A)." and (when overexpressing NGR-GFP) "chloroplasts in the vicinity of the PM strongly correlated with NGR1 accumulating at the PM nearby, similar to the scenario in columella", suggesting that NGR1 does not require additional tissue-specific factors (i.e. trafficking proteins or lipids) to assist in its intracellular movement from plastid to PM.

      Response: Yes, NGR1, also called LAZY2, is expressed in the inner hypocotyl tissues, according to https://doi.org/10.1104/pp.17.00942. Unfortunately, we saw very little signal with our NGR1-GFP construct, possibly because of the weak NGR1-GFP signal and/or because NGR1 is expressed exclusively in the inner tissues.

      Next, the authors study the spatiotemporal dynamics of NGR1-GFP relocalisation relative to other early gravitropic signals and components: calcium, auxin, and PIN3. The temporal data presented in Figure 1 illustrate how the GCaMP calcium reporter (in panel E) revealed that "the first signaling event in the root gravitropic bending is the statolith removal from the top membrane, rather than its arrival at the bottom". It appeared that the auxin DII-VENUS reporter was also changing rapidly (panel G); was this detectable BEFORE statolith re-sedimentation?

      Response: In our data (Figure 1G), we observe that the increase in signal at the top side begins prior to starch sedimentation, in contrast to the bottom side, where the decrease starts only after starch grains land on the bottom membrane. While this observation aligns with our hypothesis and other data, we refrained from commenting on it because the differences between the first 2-3 timepoints are small and obscured by noise. These differences are small because the DII response relies on protein degradation and is relatively slow. Hence, for rapid tracking of the auxin response, we used auxin-induced calcium transients as a proxy, with NPA treatment serving as a negative control.

      Please can the authors explain their NPA result in Fig 1E? Why would treatment with the auxin transport inhibitor NPA block Ca signalling (unless the latter was dependent on the former)?

      Response: Auxin induces rapid calcium transients (e.g., http://dx.doi.org/10.1016/j.cub.2015.10.025). Consequently, when auxin reaches the bottom elongation zone approximately 5-6 minutes after rotation, we observe an increased GCaMP signal at this location. Notably, when we inhibit PIN function using NPA, the GCaMP signal persists, but the difference between the top and bottom diminishes. This supports interpreting the calcium transients at the bottom side as a readout of increased auxin accumulation resulting from auxin transport.

      They go on to note "This initial auxin asymmetry is mediated by PIN-dependent auxin transport, despite visible polarization of PIN3 can be detected only later" which suggests that PIN activity was being modified prior to PIN polarisation.

      In contrast to other proteins involved in the gravity response, like RLDs and PINs, NGR1 localization and gravity-induced polarization do not involve BFA-sensitive endocytic recycling by the ARF-GEF GNOM. This makes sense given that NGR1 is initially targeted to plastids, THEN to the PM. Does NGR1 contain a cleavable plastid targeting signal? The authors go on to elegantly demonstrate that NGR1 PM targeting relies on palmitoylation through imaging and mutagenesis-based transgenic ngr rescue assays.

      Response: Yes, there is a weakly conserved plastid targeting signal in NGR1. Although we also started investigating this, we quickly realized that two other groups had presented very comprehensive data on NGR plastid localization.

      Finally, the authors demonstrate that gravitropic-induced auxin gradient formation is initially dependent on PIN3 auxin efflux activation (prior to PIN3 re-localisation). This early PIN3 activation process is dependent on NGR1 re-targeting D6PK (a PIN3 activating kinase). This elegant molecular mechanism integrates all the regulatory components described in the paper into a comprehensive root gravity sensing model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Line 83: This construct fully rescued the agravitropic bending phenotype of the ngr1/2/3 triple mutant (see further).

      What does "see further" mean in this context?

      Response: It is a reference to the second part of the manuscript (Fig. 3, Supplementary Fig. S3, Fig. S4), where we extensively address complementation with wild-type and point-mutated versions of NGR. There we show that the construct we are using is functional. This does not prove, but strongly implies, that the GFP signal we obtain is relevant. We updated the text to point this out.

      Line 101: Timing of events during the gravitropic response

      When describing the equipment employed and the rotation applied to the samples: "the vertical stage microscope and minimized the time required for rotating the sample. 180° rotation..."

      The authors mention a travel time of 5 minutes first and later of 15 minutes for the relocalization of NGR1. Are these two different experiments? Were two different rotation angles applied? Could the authors please rephrase this part of the description to answer these questions and help the reader understand how the assay was performed?

      Response: We added this explanation to the text.

      Figure 1 E, F, and G.

      Could the authors please provide pictures and/or videos for the PIN3 localization dynamics, intracellular calcium transients, and auxin reporter DII-Venus? In other words, show the complementing images for Figure 1E, 1F, and 1G as the authors did for Figure 2D where authors presented the pictures and the corresponding quantification plots.

      Response: We wanted to avoid overcrowding the figure, but we would also like to show the videos. Therefore, we have added Supplementary Movie 3, which contains all these additional observations.

      Line 194: This implies the existence of posttranslational modifications such as S-acylation to associate with PM.

      Why is this specific modification suggested and examined, and not another? What are the criteria for selecting this kind of modification? Based on what premises? Could the authors elaborate on that? Could the authors please include references?

      Response: Thank you for this comment. We first checked prediction tools, which showed a very strongly conserved S-acylation site. We have now clarified this in the text and added other modifications as examples. Later on, we rule out myristoylation (which occurs on glycine residues) and prenylation (which occurs only at the C-terminal CAAX box).

      Line 255: NGR1 PM localization is synergistically mediated by polybasic regions and a palmitoylation site

      Similarly to the previous comment, how and why are these regions examined and analyzed? Likewise, why is the palmitoylation site selected? Please provide some background, criteria, and references.

      Response: Here, we clearly state that the prediction of the palmitoylation site is made based on the GPS lipid prediction tool.

      As for the polybasic regions, these can be seen upon manual inspection of the primary protein sequence; we identified them simply by examining the sequence. We rephrased the text to make this clearer.

      Reviewer #2 (Recommendations For The Authors):

      Please, proofread the manuscript for style and minor language errors.

      Statistical analysis has been performed for some figures but is lacking for most of the quantitative analyses in the figure legends. Where it has been performed, it is not stated how many ("n") roots, cells, or plasma membranes were analyzed for NGR1-GFP, and no information is given on whether the data are derived from a representative experiment or pooled from several experiments. This certainly requires revision in Fig. 1D-G, Fig. 2B-D, Fig. S2 B,E, Fig. 3B,D, F-H, Fig. S3 B,D, Fig. S4 E-H, and Fig. 4D.

      Response: Thank you, we added this information to the figure legends.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We appreciate the care and the detail shown by the Reviewers. Their comments have made our article more focused and more accessible to a general audience.

      We would like to begin with a comment about the last sentence of the “eLife assessment”. The evolution of metamorphosis in insects was a major triumph in animal evolution that subsequently impacted almost every aspect of plant and animal evolution in the terrestrial and freshwater aquatic biospheres. Unlike the metamorphoses of most other groups, whose evolution is lost in time, insect metamorphosis arose relatively recently (~400 mya), and insect orders have branched off at various points in this evolution and have persisted to modern times. Although these “relic” groups have also undergone millions of years of evolution and specialization, they still provide us with windows into how this progression may have come about. The study of these groups provides a unique opportunity to explore the mechanisms that underlie major life history shifts and should be of interest to anyone interested in evolution, not just entomologists.

      Reviewer #1 (Public Review):

      Summary:

      This paper provides strong evidence for the roles of JH in an ametabolous insect species. In particular, it demonstrates that:

      • JH shifts embryogenesis from a growth mode to a differentiation mode and is responsible for terminal differentiation during embryogenesis. This, and other JH roles, are first suggested as correlations, based on the timing of JH peaks, but then experimentally demonstrated using JH antagonists and rescue thereof with JH mimic. This is a robust approach and the experimental results are very convincing.

      • JH redirects ecdysone-induced molting to direct formation of a more mature cuticle

      • Kr-h1 is downstream of JH in Thermobia, as it is in other insects, and is a likely mediator of many JH effects

      • The results support the proposed model that an ancestral role of JH in promoting and maintaining differentiation was coopted during insect radiations to drive the evolution of metamorphosis. However, alternate evolutionary scenarios should also be considered.

      Strengths:

      Overall, this is a beautiful, in-depth study. The paper is well-written and clear. The background places the work in a broad context and shows its importance in understanding fundamental questions about insect biology. The researchers are leaders in the field, and a strength of this manuscript is their use of a variety of different approaches (enzymatic assays, gene expression, agonists & antagonists, analysis of morphology using different types of microscopy and detection, and more) to attack their research questions. The experimental data are clearly presented and carefully executed with appropriate controls and attention to detail. The 'multi-pronged' approach provides support for the conclusions from different angles, strengthening them. In sum, the data presented are convincing and the conclusions about experimental outcomes are well-justified based on the results obtained.

      Weaknesses:

      This paper provides more detail than is likely needed for readers outside the field but also provides sufficient depth for those in the field. This is both a strength and a weakness. I would suggest the authors shorten some aspects of their text to make it more accessible to a broader audience. In particular, the discussion is very long and accompanied by two model figures. The discussion could be tightened up and much of the text used for a separate review article (perhaps along with Figure 11) that would bring more attention to the proposed evolution of JH roles.

      We appreciate the comments about the strengths and weaknesses of the paper. To deal with the weaknesses, we have condensed some of the Results to make them less cumbersome and the Discussion has been completely revised, keeping a sharp focus on the actions of JH in Thermobia embryos and how these actions relate to the status quo functions of JH in insects with metamorphosis. As part of the revision of the Discussion, we have replaced Figures 10 and 11.

      Reviewer #1 (Recommendations For The Authors):

      In keeping with my public review, this paper is very strong and I have very few suggestions for improvement. They are:

      (1) Thermobia are extant insects and are not ancestral insects. It is likely that they retain features found in an insect ancestor. However, these insects have been evolving for a very long time, and for any one feature, many changes may have occurred, both gain and loss of gene function and morphology. Further, even for morphological features present in an extant species that are the same as an ancestor, genetic pathways regulating this feature may have changed over time (see for examples papers from the Haag and Pick labs). Although I realize this is a small, possibly almost semantic point, I feel it is important to be precise here. For example, in the title, "before" is speculative as there could have been a different role in the ancestor with the role in embryogenesis arising in lineages leading to Thermobia; similarly in the abstract, "this ancestral role of JH' is an overstatement since we cannot actually measure the ancestral role.

      Since the title has already been cited in a Perspectives review, we decided to keep the title as is.

      (2) I don't understand the results in Met and myo in Fig. 3B. Perhaps include them in the explanation of Fig.3 and not after the description of Fig. 4 and explain them in more detail (or perhaps not include them at all?). I don't really understand the statistical analysis of these panels either.

      We have revised the figure legends to explain the statistics.

      (3) Another point regarding language: talking about the embryo being "able" to go through a developmental stage implies decision-making. I would suggest dropping that wording (e.g., in the description of Fig. 5C). Similarly, in explaining Fig. 6B, it would be more correct to say "JH treatment no longer inhibited" than, as written, "could no longer inhibit" (implying 'no matter how hard it tried, it still couldn't do it').

      We have removed the “can’t” wording. Figure 6 has been revised.

      Reviewer #2 (Public Review):

      The authors have studied in detail the embryogenesis of the ametabolan insect Thermobia domestica. They have also measured the levels of the two most important hormones in insect development: juvenile hormone (JH) and ecdysteroids. The work then focuses on JH, whose occurrence is concentrated in the final part (between 70 and 100%) of embryo development. The authors then used a precocene compound (7-ethoxyprecocene, or 7EP) to destroy the JH-producing tissues in the embryo of the firebrat T. domestica, which revealed that this hormone is critically involved in the last steps of embryogenesis. The 7EP-treated embryos failed to resorb the extraembryonic fluid and did not hatch. More detailed observations showed that processes like the maturational growth of the eye, the lengthening of the foregut and posterior displacement of the midgut, and the detachment of the E2 cuticle were impaired after the 7EP treatment. Importantly, a treatment with a JH mimic subsequent to the 7EP treatment restored the correct maturation of both the eye and the gut. It is worth noting that the timing of JH mimic application was essential for correcting the defects triggered by the treatment with 7EP.

      This is a relevant result in itself since the role of JH in insect embryogenesis is a controversial topic. It seems to have an important role in hemimetabolan embryogenesis, but not so much in holometabolans. Intriguingly, it appears important for hatching, an observation made in hemimetabolan and in holometabolan embryos. Knowing that this role was already present in ametabolans is relevant from an evolutionary point of view, and knowing exactly why embryos do not hatch in the absence of JH, is relevant from the point of view of developmental biology.

      The unique and intriguing aspect of juvenile hormone is its status quo action in the control of metamorphosis. Our reason for dealing with an insect group that branched off from the line of insects that eventually evolved metamorphosis was to gain insight into the ancestral functions of this hormone. Our data from Thermobia, as well as those from grasshoppers and crickets, indicate that the developmental actions of JH were originally confined to embryogenesis, where it promoted the terminal differentiation of the embryo. Its actions in promoting differentiation also included suppressing morphogenesis. This latter function was not pronounced during embryogenesis because JH only appeared after morphogenesis was essentially completed. However, it was a preadaptation that proved useful in more derived insects that delayed aspects of morphogenesis into the postembryonic realm. JH was then used postembryonically to inhibit morphogenesis until late in juvenile growth, when JH disappears and this inhibition is released.

      Then, the authors describe a series of experiments applying the JH mimic in early embryogenesis, before the natural peak of JH occurs, and its effects on embryo development. Observations were made under different doses of JHm and under different temporal windows of treatment. Higher doses triggered more severe effects, as expected, and different windows of application produced different effects. The most used combination was 1 ng JHm applied 1.5 days AEL, checking the effects 3 days later. Of note, 1.5 days AEL is about 15% of embryonic development, whereas the natural peak of JH occurs at around 85% of embryonic development. In general, the ectopic application of JHm triggered a diversity of effects, generally leading to an arrest of development. Intriguingly, however, a number of embryos treated with 1 ng of JHm at 1.5 days AEL showed a precocious formation of myofibrils in the longitudinal muscles. Also, a number of embryos treated in the same way showed enhanced chitin deposition in the E1 procuticle and showed an advancement of at least a day in the deposition of the E2 cuticle.

      While the experiments and observations are done with great care and are very exhaustive, I am not sure that the results reveal genuine JH functions. The effects triggered by a significant pulse of ectopic JHm when the embryo is at 15% of development will depend on the context: the transcriptome existing at that time, especially the cocktail of transcription factors. This explains why different application times produce different effects. This also explains why the timing of JHm application was essential for correcting the effects of 7EP treatment. In this reasoning, we must consider that the context at 85% of development, when JH peaks under natural conditions and plays its genuine functions, must be very different from the context at 15% of development, when the JHm was applied in most of the experiments. In summary, I believe that the observations after the application of JHm reveal effects of the ectopic JHm, but not necessarily functions of JH. If so, then the subsequent inferences made from the premise that these ectopic treatments with JHm revealed JH functions are uncertain and should be interpreted with caution.

      We disagree with the reviewer. An analogous situation would be in exploring gene function in which both gain-of-function and loss-of-function experiments often provide complementary insights into how a gene functions. We see JH effects only when its receptor, Met, is present and JH can induce its main effector protein, Kr-h1. The latter gives us confidence that we are looking at bona fide JH effects. We have also kept in mind, though, that the nature of the responding tissues is changing through time. Nevertheless, we see a consistent pattern of responses in the embryo and these can be related to its postembryonic effects in metamorphic insects.

      Those inferences affect not only the "JH and the progressive nature of embryonic molts" section, but also, the "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories" section, and the entire "Discussion". In addition to inferences built on uncertain functions, the sections mentioned, especially the Discussion, I think suffer from too many poorly justified speculations. I love speculation in science, it is necessary and fruitful. But it must be practiced within limits of reasonableness, especially when expressed in a formal journal.

      We have tried to dial back the speculation.

      Finally, in the section "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories", the bridge connecting the observations on the Thermobia embryo to the evolution of the modified hemimetabolan and holometabolan life cycles is not clear.

      Our Figure 12 should put this into perspective.

      Reviewer #2 (Recommendations For The Authors):

      Main points

      (1) Please, reduce the level of overinterpretation of ectopic treatment experiments with JHm, since the resulting observations represent effects, but not necessarily functions of JH.

      We have revised this section to indicate that the “effects” of ectopic treatments provide insights into the function of JH. Using a genetic analogy, both “loss-of-function” and “gain-of-function” experiments provide insights into a given gene. (see response to Public Comments)

      (2) Especially in the sections "JH and the progressive nature of embryonic molts" and "Modifications in JH function during the evolution of hemimetabolous and holometabolous life histories", and the entire "Discussion", please keep the level of speculation within reasonable limits, avoiding especially the inference of conclusions on the basis of speculation, itself based on previous speculation.

      We have toned down some of the speculation and provided reasons why it is worth suggesting.

      (3) Please revisit the argued roles of myoglianin in the story, in light of its effects as an inhibitor of JH production, repressing the expression of JHAMT, as has been reliably demonstrated in hemimetabolan species (DOI: 10.1073/pnas.1600612113 and DOI: 10.1096/fj.201801511R).

      Our appreciation to the reviewer. We are more explicit about the relationship between JH and myo.

      Minor points

      (4) Please keep the consistency of the scientific binomial nomenclature for the species mentioned. For example, read "Manduca sexta" (in italics) at the first mention, and then "M. sexta" (in italics) in successive mentions (instead of reading "Manduca" on page 17, and then "Manduca sexta" on page 18, for example). The same for "Drosophila" ("Drosophila melanogaster" first, and then "D. melanogaster"), "Thermobia" ("Thermobia domestica" first, and then "T. domestica"), etc. In the figure legends, I recommend using the complete name: Thermobia domestica, in the main heading.

      Where there is no possibility of confusion, we intend to use Thermobia, rather than T. domestica, etc. We think that it is easier for a non-specialist to read and it is commonly done in endocrine papers.

      (5) There is no purpose in evolution and biological processes. Thus, I suggest avoiding expressions that have a teleological aftertaste. For example (capitals are mine), on p. 3 "appears to have been extended into postembryonic life where it acts TO antagonize morphogenic and allow the maintenance of a juvenile state".

      We have tried to avoid teleological wording.

      (6) The title "The embryonic role of juvenile hormone in the firebrat, Thermobia domestica, reveals its function before its involvement in metamorphosis" contains a redundancy ("role" and "function"), and an apparent obviousness ("before its involvement in metamorphosis"). I suggest a more straightforward title. Something like "Juvenile hormone plays developmental functions in the embryo of the firebrat Thermobia domestica, which predate its status quo action in metamorphosis".

      As noted above, we are retaining the title since it has already been cited.

      (7) Page 2. "The transition from larva to adult then occurred through a transitional stage, the pupa, thereby providing the three-part life history diagnostic of the "complete metamorphosis" exhibited by holometabolous insects (reviews: Jindra, 2019; Truman & Riddiford, 2002, 2019)". I suggest adding the reference ISBN: 9780128130209, as the most comprehensive and recent review on complete metamorphosis.

      Done

      (8) Page 3. "These severe developmental effects suggest that the developmental role of JH in insects was initially CONFINED to the embryonic domain" (capitals are mine). This appears contradictory with the observations of Watson, 1967, on the relationships between the apparition of scales and JH, mentioned shortly before by the authors.

      This is explained in the Discussion. Although JH can suppress the appearance of scales in the J4 stage, we have not been able to show that scale appearance is caused by changes in the juvenile JH titer.

      (9) Page 4. "we measured JH III levels during Thermobia embryogenesis at daily intervals starting at 5 d AEL". Why not before, like in the case of ecdysteroids? The authors might perhaps argue that the levels of Kr-h1 expression are consistently low from the very beginning, according to Fernandez-Nicolas et al, 2022 (reference cited later in the manuscript).

      (10) Page 4. "Ecdysteroid titers through embryogenesis and the early juvenile instars were measured using the enzyme immunoassay method (Porcheron et al., 1989) that is optimized for detecting 20-hydroxyecdysone (20E)". The antibody generated by Porcheron (and now sold by Cayman) recognizes ecdysone and 20-hydroxyecdysone alike. But that's not relevant here. I would refer to "ecdysteroids" when mentioning measurements. Also in figure 2B (and "juvenile hormone III" without the formula, in Panel A, for harmonization). And I would not expand on specifications, like those at the beginning of page 5, or towards the end of page

      We thank the reviewer for this important correction.

      (12) ("the fact that we detected only a slight rise in ecdysteroids at this time (Fig 2B) is likely due to the assay that we used being designed to detect 20E rather than ecdysone").

      Omitted.

      (11) Page 5. "Low levels of Kr-h1 transcripts were present at 12 hr after egg deposition, but then were not detected until about 6 d AEL when JH-III first appeared". There is a very precise Kr-h1 pattern in Fernandez-Nicolas et al. 2023 (reference mentioned later in the manuscript).

      (12) Page 5. "notably myoglianin (myo), have become prominent as agents that promote the competence and execution of metamorphosis in holometabolous and hemimetabolous insects (He et al., 2020; Awasaki et al., 2011)". See my note 3 above.

      The myoglianin issue has been revised.

      (13) Page 5. "a drug that suppresses JH production". Rather, "a drug that destroys the JH producing tissues". By the way, do the authors know when the CA are formed in T. domestica embryo development?

      We prefer to keep our original wording. There have been some cases in which precocene has blocked JH production but did not kill the CA cells. We do not have observations that show that 7EP kills the CA cells in Thermobia embryos.

      (14) Page 5. "subsequent treatment with a JHm". I would say here that the JHm is pyriproxyfen, not on page 6 or page 7. Thus, to be consistent, after the first mention of "pyriproxyfen (JHm)" on page 5, I'd consistently use the abbreviation "JHm".

      (15) Page 9. "Limb loss in such embryos was often STOCHASTIC, i.e., in a given embryo some limbs were completely lost while others were maintained in a reduced state" (capitals are mine). The meaning of "stochastic" is random, involving a random variable; it is a concept usually associated with probability theory and related fields. I suggest using the less specialized word "variable", since ascertaining that the values are really stochastic would require specific mathematical approaches.

      We are still using stochastic because the loss is random.

      (16) Page 10. "9E). Indeed, the JH treatment redirects the molt to be more like that to the J2 stage, rather than to the E2 (= J1) stage". Probably too assertive given the evidence available (see my points 1 and 2 above).

      We do not see a problem with our conclusion. In response to the JHm treatment, the embryo produced a smooth, rather than a “pebbly”, cuticle, failed to make the J1-specific egg tooth, and attempted to make cuticular lenses (a J2 feature). This ability of premature JH exposure to cause embryos to “skip” a stage is also seen in locusts (Truman & Riddiford, 1999) and crickets (Erezyilmaz et al., 2004).

      (17) Page 11. "early JHM treatment", read "early JHm treatment".

      Corrected

      (18) Page 11. "likely. A target of JH, and likely Kr-h1, in Thermobia is myoglianin...". Please see my notes 1, 2, and especially 3, above.

      This has been revised

      (19) Page 13. "the locust, Locusta americana (Aboulafia-Baginshy et al.,1984)". Please read "the locust, Locusta migratoria (Aboulafia-Baginshy et al.,1984)".

      Corrected

      (20) Page 13 "Acheta domesticus" three times. The correct name now is "Acheta domestica", after harmonizing the declension of the specific name with the generic one. See additionally my note 4 above.

      Acheta domesticus has been used in hundreds (thousands?) of papers since it was originally named by Linnaeus. We will continue to use it.

      (21) Page 15, "(also called the vermiform larva (Bernays, 1971) redirects embryonic development to form an embryo with proportions, cuticular pigmentation, cuticular sculpturing and bristles characteristic of a nymph, while pronymph modifications, such as the cuticular surface sculpturing (Bernays, 1971)". The reference "Bernays, 1971" should be "Bergot et al., 1971".

      There was a mistake in the references. The Bernays reference was omitted from the revised Discussion.

      (22) Page 16. "Since JH also induces Kr-h1 in embryos of many insects, including Thermobia". I'm not sure that this has been studied in many insects. In any case, any reference would be useful.

      (23) Page 17. "Tribolium casteneum". Please read "Tribolium castaneum".

      Changed

      (24) Page 17. "...results in a permanent larva that continues to molt well after it has surpassed its critical weight (He et al., 2019)". The paper of He et al., 2019 is preceded by two key papers that previously demonstrated (in hemimetabolan insects) that myoglianin is a determining factor in the preparation for metamorphosis: DOI: 10.1073/pnas.1600612113 and DOI: 10.1096/fj.201801511R. See my note 3 above.

      Corrected in revision

      (25) Page 18. "These persisting embryonic primordia join the wing primordia in delaying their morphogenesis into postembryonic life". This reader does not understand this sentence.

      Made clearer in the revision.

      (26) Page 18. "is first possible in the commercial silkworm (Daimon et al., 2015)". Please mention the scientific Latin name of the species, Bombyx mori.

      (27) Page 19. "The functioning of farnesol derivatives in growth versus differentiation control extends deep into the eukaryotes.../... this capacity was eventually exploited by the insects to provide the hormonal system that regulates their metamorphosis". This information appears quite out of place.

      We have retained this point.

      (28) Page 21. Heading "Hormones". I suggest using the heading "Bioactive compounds", as neither pyriproxyfen nor 7-ethoxyprecocene are hormones.

      Done

      (29) Page 29, legend of figure 1. "Photomicrographs" is somewhat redundant. The technical word is "micrographs". "Thermobia domestica" appears in the explanation of panel C, but this is not necessary, as the name appears in the main heading of the legend.

      Done

      (30) Page 30, legend of figure 2. Panel B, see my comment 10 above. Why is embryonic age expressed in % embryo development in panel C (and in days in panels A and B)?

      All have been converted to days AEL

      (31) Page 35, legend of figure 5. "Photomicrograph": see my note 29 above.

      Done

      (32) Page 40, figure 10. In panel A, the indication of the properties of JH is misleading. The arrow going to promoting differentiation and maturation is OK, but the repression sign that indicates suppression of morphogenetic growth and cell determination seems to suggest that JH has retroactive effects. In panel B, I suggest labeling "Flies" instead of "Higher Diptera", which is an old-fashioned term. In any case, see my general comments 1 and 2, above, about speculation.

      Figure has been completely revised

      (33) Figure 11. See my general comments 1 and 2, above, about speculation.

      Figure has been revised

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors use inhibitors and mimetics of juvenile hormone (JH) to demonstrate that JH has a key role in late embryonic development in Thermobia, specifically in gut and eye development but also resorption of the extraembryonic fluid and hatching. They then exogenously apply JH early in development (when it is not normally present) to examine the biological effects of JH at these stages. This causes a plethora of defects including developmental arrest, deposition of chitin, limb development, and enhanced muscle differentiation. The authors interpret these early effects on development as JH being important for the shift from morphogenetic growth to differentiation - a role that they speculate may have facilitated the evolution of metamorphosis (hemi- and holo-metaboly). This paper will be of interest to insect evo-devo researchers, particularly those with interests in the evolution of metamorphosis.

      Strengths:

      The experiments are generally conducted very well with appropriate controls and the authors have included a very detailed analysis of the phenotypes.

      The manuscript significantly advances our understanding of Thermobia development and the role of JH in Thermobia development.

      The authors interpret this data to present some hypotheses regarding the role of JH in the evolution of metamorphosis, some aspects of which can be addressed by future studies.

      Weaknesses:

      The results are based on using inhibitors and mimetics of JH and there was no attempt to discern immediate effects of JH from downstream effects. The authors show, for instance, that the transcription of myoglianin is responsive to JH levels; it would have been interesting to see whether any of the phenotypic effects are due to myoglianin upregulation/suppression (using RNAi, for example). These kinds of experiments will be necessary to fully work out if and how the JH regulatory network has been co-opted into metamorphosis.

      We agree completely; this should be a feature of future work.

      The results generally support the authors' conclusions. However, the discussion contains a lot of speculation and some far-reaching conclusions are made about the role of JH and how it became co-opted into controlling metamorphosis. There are some interesting hypotheses presented and the authors' speculations are consistent with the data presented. However, it is difficult to make evolutionary inferences from a single data point: although Thermobia is a basally branching insect, the lineage giving rise to Thermobia diverged from the lineages giving rise to the holo- and hemimetabolous insects approx. 400 mya, and it is possible that the effects of JH seen in Thermobia reflect lineage-specific effects rather than the 'ancestral state'. The authors ignore the possibility that there has been substantial rewiring of the networks that are JH responsive across these 400 my. I would encourage the authors to temper some of the discussion of these hypotheses and include some of the limitations of their inferences regarding the role of JH in the evolution of metamorphosis in their discussion.

      We have tried to be less all-encompassing in the Discussion. The strongest comparisons can be made between ametabolous and hemimetabolous insects and we have focused most of the Discussion on the role of JH in that transition. We still include some discussion of holometabolous insects because the ancestral embryonic functions of JH may be somehow related to the unusual reappearance of JH in the prepupal period. We have reduced this discussion to only a few sentences.

      Reviewer #3 (Recommendations For The Authors):

      (1) The overall manuscript is very long (especially the discussion), and the main messages of the manuscript get lost in some of the details. I would suggest that the authors move some of the results to the supplementary material (e.g. it might be possible to put a lot of the detail of Thermobia embryogenesis into the supplementary text if the authors feel it is appropriate). The discussion contains a lot of speculation and I suggest the authors make this more concise. One example: At the moment there is a large section on the modification in JH function during the evolution of holo- and hemimetabolous life history strategies. There are some interesting ideas in this section and the authors do a good job of integrating their findings with the literature, but I would encourage the authors to limit the bulk of their discussion to the specific things that their results demonstrate. E.g. The first half of p17 contains too much detail, and the focus should be on the relationship with Thermobia (as at the bottom of p17).

      Section has been revised and is more focused

      (2) I would also suggest a thorough proofread of the manuscript. I have highlighted some of the errors/points of confusion that I found in the list below, but this list is unlikely to be exhaustive.

      We appreciate the reviewer catching the errors. Hopefully the final version is better proofed.

      (3) It might be me, but I found the wording in the second half of the abstract a bit confusing. Particularly the statement about the redeployment of morphogen systems - could this be stated more clearly?

      Abstract has been revised.

      (4) Introduction

      a. "powered flight" rather than 'power flight'

      Done

      b. 'brought about a hemimetabolous lifecycle' implies causality which hasn't been shown and directionality to evolution - suggest 'facilitated the evolution of a hemi...". Similar comment for 'subsequent step to complete metamorphosis'.

      c. Bottom of p2 - unclear whether you are referring to hemi- holo- or both

      d. Suggest removing sentence beginning "besides its effects..." as the relevance of the role of JH in caste isn't clear.

      Kept sentence but removed initial clause

      e. State that Thermobia is a member of the Zygentoma.

      Done

      f. Throughout - full species names on first usage only, T. domestica on subsequent usages.

      We will continue to use genus names for the reason given above.

      Gene names (e.g. kr-h1) should be in italics.

      g. 'antagonise morphogens'? rather than 'antagonise morphoentic'.

      Done

      (5) Results

      a. Unclear why drawings are provided rather than embryonic images in Fig. 1A

      We think that the points can be made better with diagrams.

      b. Top of p4, is 'slot' the correct word?

      Corrected

      c. Unclear why JHIII wasn't measured before 5 days AEL, especially given that many of the manipulative experiments are at earlier time points than this. I appreciate that, based on kr-h1 levels, JHIII is also likely to be low.

      d. Reference for the late embryonic peak of 20E being responsible for the J2 cuticle?

      Clarified that this is an assumption

      e. Clarify "some endocrine related transcripts": why were these ones in particular picked? Kr-h1 is a good transcriptional proxy for JH and Met is the JH-receptor, why myoglianin and not some of the other transcriptional proxies of neuroendocrine signalling?

      Hopefully, the choice is clearer.

      f. Fig 2C: rather than % embryo development, please represent the gene expression data in days (to be consistent with your other figures).

      It is now consistent with other parts of figure.

      g. In Fig. 3 the authors use t-tests; because there are three groups, some correction for multiple testing is needed (e.g. Bonferroni). Can the authors add this to the relevant methods section?

      We think that pair-wise comparisons are appropriate.
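      As an illustration of the correction the reviewer proposes (not the analysis used in the manuscript), a Bonferroni adjustment multiplies each raw p-value by the number of comparisons m and caps the result at 1. The minimal Python sketch below assumes three pairwise t-tests among three groups; the p-values are hypothetical placeholders, not data from Fig. 3.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjust raw p-values: multiply by the number of tests m, cap at 1.0."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons among three groups
# (e.g. control vs. treatment A, control vs. B, A vs. B); illustrative only.
raw = [0.01, 0.02, 0.40]
alpha = 0.05
adjusted = bonferroni(raw)
for p_raw, p_adj in zip(raw, adjusted):
    print(f"raw p = {p_raw:.2f}, adjusted p = {p_adj:.2f}, significant at {alpha}: {p_adj < alpha}")
```

      With three comparisons, a raw p of 0.02 that would pass at alpha = 0.05 no longer does after adjustment, which is the practical consequence the reviewer is pointing to.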

      h. Fig. 3 legend: you note that you treat stage 2 juveniles with 7EP - I couldn't tell what AEL this corresponded to.

      This is after hatching so AEL does not apply.

      i. Top of p7 'deformities' rather than 'derangements'?

      Done

      j. Regarding the dosage effects of embryonic abnormalities - it would be good to include these in the supp material, as it convinces the reader that the effects you have seen aren't just due to toxicity.

      It is not clear what the objection is.

      k. Bottom of p7 'problematic' not 'problematical'

      Done

      l. P8 Why are the clusters of Its important? - provide a bit more interpretation for the reader here.

      This is clear in the revised version.

      m. P9 Why is the modulation of transcription of kr-h1, met, and myo important in this context?

      Explained

      n. P9 'fig. 7F'? there is no Fig. 5F

      Thanks for catching the typo.

      o. Fig. 7B add to the legend which treatment the dark and light points correspond to.

      We think it is obvious from the labeling on Fig 7B.

      (6) Discussion:

      a. What do we know about how terminal differentiation is controlled in non-insect arthropods? Most of the discussion is focused on insects (which makes sense as JH is an insect-specific molecule), but if the authors are arguing the ancestral role of JH it would be useful to know how their findings relate to non-insect arthropods.

      We have not been able to find any information about systemic signals being involved in non-insect arthropods.

      b. There is no Fig. 5E (are they referring to 7E?)

      Yes, it should have been Fig. 7E.

      c. Is myoglianin a direct target of JH in other species?

      Other reports are in postembryonic stages and show that myoglianin suppresses JH production. Our paper is the first examination in embryos and we find that the opposite is true – i.e., that JH treatment suppresses myoglianin production. We suspect that these two signaling systems are mutually inhibitory. It would be interesting to see whether treatment of a post-critical weight larva with JH (which would induce a supernumerary larval molt) would also suppress myoglianin production (as we see in Thermobia embryos).

      d. P12 What is the evidence that JH interacts with the first 20E peak to alter the embryonic cuticle?

      We are not sure what the issue is. The experimental fact is that treatment with JH before the E1 ecdysteroid peak causes the production of an altered E1 cuticle. We are faced with the question of why this molt is sensitive to JH when the latter will not appear until 3 or 4 days later. A possible answer is that the ecdysone response pathway has a component that has inherent JH sensitivity. The mosquito data suggest that Taiman provides another link between JH and ecdysone action.

      e. Top of p13 - this paragraph can be cut down substantially. Although this is evidence that JH can alter ecdysteroids, it is in a species that is 400 my derived from the target species. Is it likely to be the exact same mechanism? I would encourage the authors to distil and retain the most important points.

      This paragraph has been shortened and focused.

      f. Bottom of p13 - what does this study add to this knowledge?

      The response of Thermobia embryos to JH treatment is qualitatively the same as seen in other short germband embryos. This similarity supports the assumption that the same responses would have been seen in their last common ancestor.

      g. P19 the last paragraph in the conclusions is really peripherally relevant to the paper and is a bit of a stretch, I would encourage the authors to leave this section out.

      We agree that it is a stretch. JH and its precursor MF are the only sesquiterpene hormones. How did they come to acquire this function? We think it is worth pointing out that farnesol metabolites have been associated with promoting differentiation in various eukaryotes. An ancient feature of these molecules in promoting (maintaining?) differentiation may have been exploited by the insects to develop a unique class of hormones. It is worth putting the idea out to be considered.

      h. P19 "conclusions" rather than 'concluding speculations'.

      Changed as suggested.

      Methods:

      It is standard practice to include at least two genes as reference genes for RT-qPCR analysis (https://doi.org/10.1186/gb-2002-3-7-research0034, https://doi.org/10.1373/clinchem.2008.112797) If there are large-scale differences in the tissues being compared (e.g. as there are here during development) then more than two reference genes may be required and a reference gene study (such as https://doi.org/10.3390%2Fgenes12010021) is appropriate. Have the authors confirmed that rp49 is stably expressed during the stages of Thermobia development that they assay here?

      We have explained our choice in the Methods.
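      The reviewer's recommendation can be illustrated with the geometric-averaging normalization described in the first cited reference. The minimal Python sketch below assumes the same amplification efficiency of 2 (perfect doubling per cycle) for every gene and uses hypothetical Ct values; it is not the authors' pipeline, and it does not test whether rp49 itself is stably expressed.

```python
import math


def normalized_expression(ct_target: float, ct_refs: list[float], efficiency: float = 2.0) -> float:
    """Target-gene quantity divided by the geometric mean of reference-gene quantities.

    Each gene's relative quantity is taken as efficiency ** (-Ct), assuming the
    same amplification efficiency for every gene.
    """
    target_quantity = efficiency ** (-ct_target)
    ref_quantities = [efficiency ** (-ct) for ct in ct_refs]
    geo_mean = math.prod(ref_quantities) ** (1.0 / len(ref_quantities))
    return target_quantity / geo_mean


# Hypothetical Ct values: one target gene and two reference genes.
print(normalized_expression(25.0, [20.0, 22.0]))
```

      With equal efficiencies this reduces to 2^(mean(Ct_ref) - Ct_target), so averaging over two or more reference genes dampens the effect of a developmental shift in any single one.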

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes a new method for sequence-based remote homology detection. Such methods are essential for the annotation of uncharacterized proteins and for studies of protein evolution.

      Strengths:

      The main strength and novelty of the proposed approach lies in the idea of combining state-of-the-art sequence-based (HHpred and HMMER) and structure-based (Foldseek) homology detection methods with recent developments in the field of protein language models (the ESM2 model was used). The authors show that features extracted from high-dimensional, information-rich ESM2 sequence embeddings can be suitable for efficient use with the aforementioned tools.

      The reduced features take the form of amino acid occurrence probability matrices estimated from ESM2 masked-token predictions, or structural descriptors predicted by a modified variant of the ESM2 model. However, we believe that these should not be called "embeddings" or "representations". This is because they don't come directly from any layer of these networks, but rather from their final predictions.

      We agree that there is some room for discussion about whether the amino acid probabilities returned by pre-trained ESM-2 and the 3Di sequences returned by ESM-2 3B 3Di can be properly referred to as “embeddings”. The term “embedding” doesn’t have a formal definition, other than some kind of alternative vector representation of the input data which, preferably, makes the input data more suitable for some downstream task. In that simple sense of the word “embedding”, amino acid probabilities and 3Di sequences output by our models are, indeed, types of embeddings. We posed the question on Twitter (https://twitter.com/TrichomeDoctor/status/1715051012162220340) and nobody responded, so we are left to conclude that the community is largely ambivalent about the precise definition of “embedding”.

      We’ve added language in our introduction to make it more clear that this is our working definition of an “embedding”, and why that definition can apply to profile HMMs and 3Di sequences.

      The benchmarks presented suggest that the approach improves sensitivity even at very low sequence identities <20%. The method is also expected to be faster because it does not require the computation of multiple sequence alignments (MSAs) for profile calculation or structure prediction.

      Weaknesses:

      The benchmarking of the method is very limited and lacks comparison with other methods. Without additional benchmarks, it is impossible to say whether the proposed approach really allows remote homology detection and how much improvement the discussed method brings over tools that are currently considered state-of-the-art.

      We thank the reviewer for the comment. To address the question, we’ve expanded the results by adding a new benchmark and added a new figure, Figure 4. In this new content, we use the SCOPe40 benchmark, originally proposed in the Foldseek paper (van Kempen et al., 2023), to compare our best method, ESM-2 3B 3Di coupled to Foldseek, with several other recent methods. We find our method to be competitive with the other methods.

      We are hesitant to claim that any of our proposed methods are state-of-the-art because of the lack of a widely accepted standard benchmark for remote homology detection, and because of the rapid pace of advancement of the field in recent years, with many groups finding innovative uses of pLMs and other neural-network models for protein annotation and homology detection.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a number of exploratory applications of current protein representations for remote homology search. They first fine-tune a language model to predict structural alphabets from sequence and demonstrate using these predicted structural alphabets for fast remote homology search both on their own and by building HMM profiles from them. They also demonstrate the use of residue-level language model amino acid predicted probabilities to build HMM profiles. These three implementations are compared to traditional profile-based remote homology search.

      Strengths:

      • Predicting structural alphabets from a sequence is novel and valuable, with another approach (ProstT5) also released in the same time frame further demonstrating its application for the remote homology search task.

      • Using these new representations in established and battle-tested workflows such as MMSeqs, HMMER, and HHBlits is a great way to allow researchers to have access to the state-of-the-art methods for their task.

      • Given the exponential growth of data in a number of protein resources, approaches that allow for the preparation of searchable datasets and enable fast search is of high relevance.

      Weaknesses:

      • The authors fine-tuned ESM-2 3B to predict 3Di sequences and presented the fine-tuned model ESM-2 3B 3Di with a claimed accuracy of 64% compared to a test set of 3Di sequences derived from AlphaFold2 predicted structures. However, the description of this test set is missing, and I would expect repeating some of the benchmarking efforts described in the Foldseek manuscript as this accuracy value is hard to interpret on its own.

      The preparation of training and test sets is described in the methods under the heading “Fine tuning ESM-2 3B to convert amino acid sequences into 3Di sequences”. Furthermore, there is code in our github repository to reproduce the splits, and the entire model training process: https://github.com/seanrjohnson/esmologs#train-esm-2-3b-3di-starting-from-the-esm-2-3bpre-trained-weights

      We didn’t include the training/validation/test splits in the Zenodo repository because they are very large: train 33,924,764; validation 1,884,709; test 1,884,710 sequences, times 2 because there are both amino acid and 3Di sequences. It comes out to about 30 GB total, and is easily rebuilt from the same sources we built it from.

      We’ve added the following sentence to the main text to clarify:

      “Training and test sets were derived from a random split of the Foldseek AlphaFold2 UniProt50 dataset (Jumper et al., 2021; van Kempen et al., 2023; Varadi et al., 2022), a reduced-redundancy subset of the UniProt AlphaFold2 structures (see Methods for details).”

      To address the concern about comparing to Foldseek using the same benchmark, we’ve expanded the results section and added a new figure, Figure 4 using the SCOPe40 benchmark originally presented in the Foldseek paper, and subsequently in the ProstT5 paper to compare Foldseek with ESM-2 3B 3Di to Foldseek with ProstT5, AlphaFold2, and experimental structures.

      • Given the availability of predicted structure data in AFDB, I would expect to see a comparison between the searches of predicted 3Di sequences and the "true" 3Di sequences derived from these predicted structures. This comparison would substantiate the innovation claimed in the manuscript, demonstrating the potential of conducting new searches solely based on sequence data on a structural database.

      See response above. We’ve now benchmarked against both ProstT5 and AF2.

      • The profile HMMs built from predicted 3Di appear to perform sub-optimally, and those from the ESM-2 3B predicted probabilities also don't seem to improve traditional HMM results significantly. The HHBlits results depicted in lines 5 and 6 in the figure are not discussed at all, and a comparison with traditional HHBlits is missing. With these results and presentation, the advantages of pLM profile-based searches are not clear, and more justification over traditional methods is needed.

      We thank the reviewer for pointing out the lack of clarity in the discussion of lines 5 and 6.

      We’ve re-written that section of the discussion, and reformatted Figure 3 to enhance clarity.

      We agree that a comparison to traditional HHBlits could be interesting, but we don’t expect to see stronger performance from the pLM-predicted profiles than from traditional HHBlits, just as we don’t see stronger performance from pLM-hmmscan or pLM-Foldseek than from the traditional variants. We think that the advantages of pLM-based amino acid HMM searches are primarily speed. There are many variables that can influence speed of generating an MSA and HMM profile, but in general we expect that it will be much slower than generating an HMM profile from a pLM.

      We don’t know why making profiles of 3Di sequences doesn’t improve search sensitivity, we just think it’s an interesting result that is worth presenting to the community. Perhaps someone can figure out how to make it work better.

      • Figure 3 and its associated text are hard to follow due to the abundance of colors and abbreviations used. One figure attempting to explain multiple distinct points adds to the confusion. Suggestion: Splitting the figure into two panels comparing (A) Foldseek-derived searches (lines 7-10) and (B) language-model derived searches (line 3-6) to traditional methods could enhance clarity. Different scatter markers could also help follow the plots more easily.

      We thank the reviewer for this helpful comment. We’ve reformatted Figure 3 as suggested, and we think it is much easier to read now.

      • The justification for using Foldseek without amino acids (3Di-only mode) is not clear. Its utility should be described, or it should be omitted for clarity.

      To us, the use of 3Di-only mode is of great theoretical interest. From our perspective, this is one of our most significant results. Previous methods, such as pLM-BLAST and related methods, have made use of very large positional embeddings to achieve sensitive remote homology search. We show that with the right embedding, you don’t need very many bits per position to get dramatically improved search sensitivity from Smith-Waterman, compared to amino acid searches. We also doubt that predicted 3Di sequences are the optimal small encoding for remote homology detection. This result and observation opens up an exciting avenue for future research in developing small, learned positional embeddings that are optimal for remote homology detection and amenable to SIMD-optimized pre-filtering and Smith-Waterman alignment steps.

      We’ve expanded the discussion, explaining why we are excited about this result.
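      To make the point about small per-position encodings concrete: a 20-letter alphabet such as 3Di carries at most log2(20), about 4.3 bits per position, and sequences over it can be compared with ordinary Smith-Waterman local alignment. The minimal Python sketch below is illustrative only: it uses a toy match/mismatch score and a linear gap penalty in place of Foldseek's 3Di substitution matrix and affine gaps, and the input strings are hypothetical.

```python
import math


def bits_per_position(alphabet_size: int) -> float:
    """Maximum information per symbol for an alphabet of the given size."""
    return math.log2(alphabet_size)


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score between a and b (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap  # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            curr[j] = max(0, diag, up, left)  # flooring at 0 makes the alignment local
            best = max(best, curr[j])
        prev = curr
    return best


print(f"{bits_per_position(20):.2f} bits per 3Di position")
# Hypothetical 3Di-like strings sharing the core "DPVVLV".
print(smith_waterman_score("AADPVVLVQQ", "CDPVVLVC"))
```

      Because each position is a single small symbol, the inner loop compares one-byte characters, which is the property that makes such encodings amenable to SIMD-vectorized alignment.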

      • Figure 2 is not described, unclear what to read from it.

      It's just showing that ESM-2-derived amino acid probabilities closely resemble amino acid frequencies in MSAs. We think it gives readers some visual intuition about why predicted profile HMMs perform as well as they do. We’ve added some additional explanation of it in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The paper would mainly benefit from a more comprehensive benchmark:

      We suggest that the authors extend the benchmark by including the reference methods (HHpred and Foldseek) run with their original representations, i.e., MSAs obtained with 2-3 iterations of hhblits (for HHpred) and experimental or predicted structures (for Foldseek). HHpred profile-profile comparisons and Foldseek structure-structure comparisons would be important reference points for assessing the applicability of the proposed approach in distant homology detection. It is also essential to compare the method with other emerging tools such as EBA (DOI: 10.1101/2022.12.13.520313), pLM-BLAST (DOI: 10.1101/2022.11.24.517862), DEDAL (DOI: 10.1038/s41592-022-01700-2), etc.

      We also suggest using an evolutionary-oriented database for the benchmark, such as ECOD or CATH (these databases classify protein domains with known structures, which is important in the context of including Foldseek in the benchmark). We ran a cursory benchmark using the ECOD database and generated HH-suite .hhm files (using the single_seq_to_hmm.py and hhsearch_multiple.py scripts). Precision and recall appear to be significantly lower compared to "vanilla" hhsearch runs with MSA-derived profiles. It would also be interesting to see benchmarks for speed and alignment quality.

      The pLM-based methods for homology detection are an emerging field, and it would be important to evaluate them in the context of distinguishing between homology and analogy. In particular, the predicted Foldseek representations may be more likely to capture structural similarity than homology. This could be investigated, for example, using the ECOD classification (do structurally similar proteins from different homology groups produce significant matches?) and/or resources such as MALISAM that catalog examples of analogy.

      We’ve added the SCOPe40 benchmark, which we think at least partially addresses these comments, adding a comparison to pLM-BLAST, ProstT5, and AF2 followed by Foldseek. The question of analogy vs. homology is an interesting one. It could be argued that the SCOPe40 benchmark addresses this in the difference between Superfamily (distant homology) and Fold (analogy, or very distant homology).

      Our focus is on remote homology detection applications rather than alignment quality, so we don’t benchmark alignment quality, although we agree that those benchmarks would be interesting.

      Page 2, lines 60-67. This paragraph would benefit from additional citations and explanations to support the superiority of the proposed approach. The fact that flattened embeddings are not suitable for annotating multidomain proteins seems obvious. Also, the claim that "current search implementations are slow compared to other methods" should be supported (tools such as EBA or pLM-BLAST have been shown to be faster than standard MSA-based methods). Also, as we mentioned in the main review, we believe that the generated pseudo-profiles and fine-tuned ESM2 predictions should not be called "smaller positional embeddings".

      Discriminating subdomains was a major limitation of the influential and widely-cited PfamN paper (Bileschi et al., 2022), we’ve added a citation to that paper in that paragraph for readers interested in diving deeper.

      To address the question of speed, we’ve included data preparation and search benchmarks as part of our presentation of the SCOPe40 benchmark.

      Finally, we were not sure why exactly every 7th residue is masked in a single forward pass. Traditionally, pseudo-log likelihoods are generated by masking every single token and predicting probabilities from logits given the full context - e.g. https://arxiv.org/pdf/1910.14659.pdf. Since this procedure is crucial in the next steps of the pipeline, it would be important to either experiment with this hyperparameter or explain the logic used to choose the mask spacing.

      We’ve added discussion of the masking distance to the Methods section.
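      The trade-off the reviewer raises can be sketched as bookkeeping over mask positions. Assuming (our assumption, not a statement of the authors' exact procedure) that "every 7th residue" means pass k masks all positions i with i mod 7 = k, seven forward passes mask each residue exactly once, whereas one-token-at-a-time pseudo-log-likelihood scoring needs one pass per residue. The minimal Python sketch below only computes the mask groups; it does not call ESM-2.

```python
def mask_groups(seq_len: int, spacing: int = 7) -> list[list[int]]:
    """Split positions 0..seq_len-1 into groups of every `spacing`-th residue.

    Group k holds the positions i with i % spacing == k, so masking one group per
    forward pass covers every residue exactly once in min(spacing, seq_len) passes.
    """
    return [list(range(k, seq_len, spacing)) for k in range(min(spacing, seq_len))]


seq_len = 20  # hypothetical sequence length
groups = mask_groups(seq_len)
print(f"spaced masking: {len(groups)} passes; one-at-a-time masking: {seq_len} passes")
print(groups[0])  # positions masked together in the first pass
```

      A spacing equal to the sequence length recovers one-token-at-a-time masking; smaller spacings need fewer passes but hide more of the surrounding context from each prediction.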

      Reviewer #2 (Recommendations For The Authors):

      • While the code and data for the benchmark are available, the generation of searchable databases using the methods described for a popular resource such as Pfam, AFDB, SCOP/CATH which can be used by the community would greatly boost the impact of this work.

      3Di sequences predicted by ESM-2 3B 3Di can easily be used as queries against any Foldseek database, such as PDB, AFDB, etc. We’ve added Figure 4E to demonstrate this possibility, and added some related discussion.

      • Minor: In line 114, the text should likely read "compare lines 7 and 8" instead of "compare lines 6 and 7."

      We’ve clarified the discussion of Figure 3.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Redhardt and colleagues describe a structure of the voltage- and Ca2+-activated Slo1 channel in complex with an auxiliary subunit, γ1. In complex with γ1, Slo1 adopts an open state that closely resembles previous open-state structures. Of γ1, only the single membrane-spanning helix, which binds to the periphery of the Slo1 VSD, is resolved. There, it establishes several interactions with Slo1 that the authors propose may favor adoption of the open state, potentially explaining how γ1 can shift the I-V profile of Slo1 so that it is activated at more negative membrane potentials. The interactions described fit well with existing mutagenesis analyses.

      While this report provides a first glimpse of how γ1 can bind to Slo1, its impact will be minimal. It describes a single structural snapshot, and no functional analyses are presented. Additional analyses would be helpful in understanding how γ1 can regulate Slo1 channels.

      We thank the reviewer for their honest judgment. We agree that validating the structure by biochemical and/or functional data would have significantly strengthened the manuscript. However, we are convinced that our structural data alone already provides significant novel understanding of the assembly of the Slo1-γ1 complex and regulation of Slo1 by γ1. Thus, we feel that publication of this manuscript is justified by the high importance of Slo channels and our data will have an impact in the field.

      Major comments: 1. The authors propose several models for how γ1 regulates Slo1, yet none of them are experimentally evaluated. For example, on page 8, it is written that "we propose that the combination of three different principles, namely shape complementarity, covalent anchoring and lowering the resting state potential by a positively charged intracellular stretch, act in concert to stabilize an active VSD conformation in the Slo1-γ1 complex." This is a testable hypothesis and one that should be experimentally evaluated to better understand regulation by γ1.

      We agree with the reviewer that experimental validation of this hypothesis would have been an asset. Nevertheless, we think that our structural data, in the context of previous functional data (e.g., from Li et al. 2015, 2016) and in comparison with the other two manuscripts on the same topic that were published while this manuscript was under review, allow us to draw conclusions about the mechanism of γ1-mediated activation of Slo1. We have, however, toned down some of the earlier statements and changed parts of our interpretation in light of the novel findings by Yamanouchi et al. and Kallure et al.

      The authors' analysis of the extracellular domain of γ1 is incomplete. The only presented structure was determined with C4 symmetry imposed, in which the extracellular domains were largely lost. The authors propose that these domains are dynamic and that this dynamism would enable simultaneous binding of both γ and β subunits, as occurs in cells. A more thorough analysis of the dynamics as well as of potential asymmetric conformations should be performed to better understand how these domains interact with Slo1.

      We completely agree with the reviewer that a thorough analysis of the extracellular domain is important and thank the reviewer for their valuable suggestions. We had attempted such an analysis from the outset, but were not successful. More specifically, we attempted reconstructions with lower symmetry (C2 and C1), either from the start or by symmetry relaxation after an initial C4 reconstruction. We also tested different masking and signal subtraction strategies in combination with different global and local refinements, as well as symmetry expansion and 3D classification. Unfortunately, none of these strategies led to a better resolved LRR module.

      We now think that in comparison with Kallure et al. and Yamanouchi et al., the ice in our sample was thinner, which allowed us to reach higher resolution in the core particle (Slo1 and γ1 TM helix), but at the cost of the γ1 LRRs being denatured or at least distorted by the air-water interface.

      The refinement statistics suggest that the model was incompletely refined. This reviewer was not provided with the map or models, but the validation report lists a clashscore of 9 and 5.7% of the rotamers as outliers, both of which are high for the reported resolution of the structure. It is also strange that the Q-score varied between different γ1 protomers. Why are the four protomers not identical when the map is 4-fold symmetric? The authors should carefully inspect their model to ensure that it is as correct as possible.

      We thank the reviewer for pointing this out, and while the values for clashscores and rotamers were not outside the range of values typically found in many other cryo-EM structures, we agree that there was still some room for improvement. We have worked on this and could lower the values to a clashscore of 7.0 and 1.8 % rotamer outliers.

      Differences in Q-score are also not uncommon: while the map is indeed C4-symmetric, NCS restraints during model refinement do not completely prevent small deviations between the protomers. We have now further minimized these differences.

      Reviewer #1 (Significance (Required)):

      The impact of this report is limited. Functional analyses will be necessary to uncover precisely how gamma subunits regulate Slo1 channels.

      We thank the reviewer for this honest statement, but respectfully disagree. While additional functional analyses would certainly have boosted the impact, we are certain that our structural data and their interpretation will be very valuable for the field, because they provide (as stated by Reviewer 3) new insights into the regulation of Slo channel activity by the γ subunits and suggest (as stated by Reviewer 2) a novel mechanism of activation of voltage-gated ion channels.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This study presents a high-resolution cryo-EM study of a voltage-gated Ca++-dependent K+ channel in the presence of a gamma1 subunit. Analysis of the structure and sequence alignments suggest a novel mechanism of activation of voltage-gated ion channels.

      Major comments: The major issue in this paper is that it is only a structural biology paper. There is no structure-function relationship study and there are no functional studies of mutants that could validate (or not) the inferred underlying mechanism. Even though the authors have identified good candidates for mutations (e.g. p. 6), they have not attempted to validate their importance experimentally. As a result, reading their discussion is somewhat frustrating and full of assumptions, as indicated by sentences (p. 7) like

      "a possible mechanism... might be... which would make... more likely".

      "... which might act ... seems important... might indicate... might lower... likely most pronounced... could be responsible..."

      "... might play an important role... does not allow a certain conclusion..."

      We completely agree with the reviewer that the paper would have been much stronger had we been able to perform biochemical or functional assays testing mutations in the binding interface. However, this would unfortunately have been beyond the scope of the project. We are nevertheless confident that our structural data will be of value for the field, also in the context of the two structure-function papers published since, which confirm and validate our data and provide the link to function.

      Minor comments which could be confidently addressed: The Introduction contains no description of the state of the art in the field concerning the available structures in the same or similar systems. Hence, it is difficult for people outside the field to judge whether the novelty is incremental or significant.

      We have adjusted the introduction to explicitly mention previously published structural data on the Slo channels.

      References 10 and 42 (eLife) lack some details.

      We have adjusted said references accordingly.

      Reviewer #2 (Significance (Required)):


      Significance (general assessment): As it turns out, at least two papers in exactly the same field have just appeared. One is in Molecular Cell, by a Japanese group; it is much more developed and contains functional tests and structure-function relationships, in addition to beautiful structures (available online in early December): https://www.sciencedirect.com/science/article/pii/S1097276523009218

      The other is on bioRxiv, deposited yesterday: https://www.biorxiv.org/content/biorxiv/early/2023/12/20/2023.12.20.572542.full.pdf

      Advances with respect to known results: See above. As a result of these new papers in Mol Cell and bioRxiv, I think the authors should consider submitting their article elsewhere, perhaps for a more specialized audience.

      We agree with the reviewer that, in light of the other two publications, which were both published some time after we deposited our preprint on bioRxiv and while the manuscript was under review, the uniqueness of our data is somewhat reduced. However, since our data are in broad agreement with these two other publications, while we report a structure at significantly higher resolution and from a different species (indeed the first Slo1 structure from rabbit, a model organism of BK channel characterization in the last decades), we are confident that our data are still very valuable for the field and qualify for publication in one of the affiliate journals of Review Commons. After all, the fact that three papers reporting very similar data were published within a few weeks (plus another preprint reporting structures of a Slo channel, but unrelated to γ subunits) illustrates the importance of understanding the regulation of this essential ion channel and the impact of all structural data enhancing this understanding; independent confirmation by three different labs is very valuable to the community.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      "This manuscript by Redhardt et al. presents the cryo-EM structure of the Slo K+ channel from rabbits in conjunction with its auxiliary subunit, γ1, and proposes a mechanistic model for regulating channel activation. The Slo channel, also known as the large-conductance calcium-activated potassium channel or BK channel, is an ion channel type found in various cell membranes, including neurons, muscle cells, and other tissue types. Its key features encompass Ca2+ activation, voltage dependence, and regulation by auxiliary subunits. Different auxiliary subunits have been shown to modulate channel functions distinctly; notably, the γ1 subunit enables channel activation at lower voltages compared to the wild-type channel. This manuscript offers a structural-functional framework that enhances our comprehension of how Slo channels are regulated by auxiliary subunits, such as gamma and beta subunits. While the structure of Slo channels in complex with the beta subunit is understood, the binding and interaction of the gamma subunit with the channels remain elusive due to the absence of corresponding structures. Along these lines, the presented structure here indeed provides new insights into the regulation of Slo channel activity by the gamma subunit. However, there are some important questions below that should be addressed."

      1. In Figure 1D panel, the calcium ions appear to be indistinct, likely due to the figure's low resolution. The authors are recommended to enhance the figure quality and consider a better positioning to effectively illustrate the ions.

      We have adjusted the coloring of the calcium ions in Fig. 1D to increase their visibility.

      It would be beneficial for readers if the authors provided a detailed methodology explaining how they arrived at the 7% and 11% coexpression conditions that aid complex formation. Additionally, it would be informative to know the observed shift in the size exclusion chromatography (SEC) profile of Slo1-γ1 compared to apo Slo1.

      We have arrived at these concentrations of the respective viruses by empirically testing ranges between 3 % and 15 %. We have now added a sentence to the manuscript to explain this.

      Is there any rationale behind initially purifying using strep affinity followed by His affinity?

      The idea behind using a dual-affinity protocol is to ensure that all purified complexes contain at least one copy of Slo1 and one copy of γ1. Using the Strep tag first allows most contaminants to be removed in the first step, owing to its higher specificity compared to the His tag. We have added a sentence to the methods section to explain this.

      Regarding the Slo1 tetramer with gamma subunit binding, are there other classes where one, two, or three gamma subunits are bound to Slo1? Or is there only one class where all protomers of Slo1 are occupied by the gamma subunit? How do these classes appear when refined in C1 symmetry? Are there classes displaying C1 or C2 symmetry, or is the four-fold symmetry preserved across all refined classes?

      We exclusively observe complexes with four γ1 subunits. This is in agreement with the other two recent publications reporting Slo1-γ1 complex structures, but could in principle be an artifact of overexpression. Even when we refine the particles in C1, C4 symmetry is retained and we do not observe any classes with C2 or C1 symmetry.

      The authors utilized nearly 1.9 million particles to reconstruct the final class, resulting in a high resolution. Is such a large number of particles truly necessary to achieve high resolution in this context?

      The large number of particles is not strictly necessary, i.e. we could obtain similar quality by using fewer particles. In the end, we have now further classified down to ~827k particles, which very slightly improved the resolution and quality of the map.

      While the authors mention that F273 of γ1 forms pi-stacking interactions, it remains unclear with which components of the channel these interactions occur.

      F273 forms (slightly distorted) T-shaped stacking interactions with F164 in S2 and F187 in S3. We have now changed the sentence in the manuscript to name the residues that line the hydrophobic pocket, making it clearer which elements contribute to the interaction with F273.

      The authors propose that the disulfide bond between the γ subunit and Slo1 could play a crucial role in their interaction. Was a covalent linkage observed in SDS-PAGE analysis? Furthermore, how would this interaction be affected if either cysteine C253 of γ1 or C141 of the channel were mutated or neutralized?

      We ran all our SDS-PAGE experiments under reducing conditions, thus destroying any disulfide bridges that might have been present in the complex. We have now, however, obtained a slightly better defined reconstruction (as pointed out in our answer to point 5 raised by this reviewer) in which we no longer see clearly continuous density between the two cysteine side chains. Thus, we have removed the cystine bond from the final model and have adjusted the text and figures accordingly. We still think it might be more than coincidence that those two side chains come into such close proximity, though, and still discuss the possibility of a cystine bridge in the manuscript.

      The authors state that "The presence of several immobile positive charges on the intracellular side in close proximity to the VSD as in the case of the Slo1-γ1 complex is likely to locally lower the resting state potential and repulse the gating charges, thereby reducing the energy to overcome for the VSD to transition to the active conformation." The authors need to elaborate a little more here, as it is not clear what they mean by repulsion of the gating charges.

      We have expanded our description of the proposed repulsive effect of the positive charges in the manuscript and in addition also discuss the additional role of the charges in stabilizing the Ca2+-bound conformation of the gating ring as proposed by Yamanouchi et al.

      Probably beyond this study, but I was wondering whether beta and gamma subunits could together assemble as heteromers to form a cage-like structure with contributions from both.

      We agree with the reviewer that this is an interesting question which we have also thought about and one which should be tested, but as the reviewer already mentioned, this would go beyond the present study and should be subject to an independent follow-up investigation.

      Are there any specific lipids observed within the structure that could potentially contribute to the functional conformation or stability of the complex?

      Given the high resolution of our structure, we observe a number of ordered lipid and detergent molecules, most of which were located at similar positions as in previous structures of Slo channels. Besides those molecules clustering in the deep cleft between neighboring voltage-sensor domains, we also observe lipid densities close to the binding site of γ1 on the distal side of the VSD. However, as their relevance for γ1 binding is unclear, we don’t discuss them in the manuscript. In general, of course, we agree with the reviewer that lipids can have a large impact on the function of membrane proteins.

      It would be interesting to see, if the kink in the gamma subunit is entirely neutralized through mutations of proline and glycine, how these alterations might impact the assembly of the mutated gamma subunit with the channel. The authors should provide insights into whether this mutated form of the gamma subunit assembles effectively with the channel and whether there are functional consequences associated with this alteration.

      As shown by Kallure et al., substituting P270 in the kink by serine (the native residue at this position in γ3) strongly diminished the ability of γ1 to associate with Slo1 in vitro, demonstrating the importance of the kink and providing a rationale for the observed differences in the potency of the TM helices of γ1 and γ3 in Slo1 activation.

      It would be generally beneficial for the authors to provide functional insights that can support the physiological relevance of this kink in the gamma subunit. Understanding the potential consequences of this mutation and its implications for the assembly and function of the channel complex will offer valuable insights into the physiological role of the kink.

      We absolutely agree with the reviewer that functional insights on the relevance of the kink would be very valuable, but we think that the available experimental data together with the natural sequence differences in γ1-γ4 and the correlation with their physiological activity are very clear indications that the kink is relevant. However, future follow-up studies that prove this beyond any doubt would be valuable.

      Is it known whether binding of a beta or gamma subunit affects the subsequent binding of beta and gamma subunits to the channel? If so, this should be discussed briefly in the Discussion.

      This is, to the best of our knowledge, not known. The only existing data that suggests co-presence of beta and gamma subunits on Slo1, reported in Gonzalez-Perez et al., 2015, stems from electrophysiological experiments and does not reveal anything about hierarchy and temporal order of binding events.

      Reviewer #3 (Significance (Required)):

      The Slo channel, also known as the large-conductance calcium-activated potassium channel or BK channel, is an ion channel type found in various cell membranes, including neurons, muscle cells, and other tissue types. Its key features encompass Ca2+ activation, voltage dependence, and regulation by auxiliary subunits. Different auxiliary subunits have been shown to modulate channel functions distinctly; notably, the γ1 subunit enables channel activation at lower voltages compared to the wild-type channel. This manuscript offers a structural-functional framework that enhances our comprehension of how Slo channels are regulated by auxiliary subunits, such as gamma and beta subunits. While the structure of Slo channels in complex with the beta subunit is understood, the binding and interaction of the gamma subunit with the channels remain elusive due to the absence of corresponding structures. Along these lines, the presented structure here indeed provides new insights into the regulation of Slo channel activity by the gamma subunit.

      We thank the reviewer for this positive assessment of the data and agree that our structural data, also when regarded together with the complementary manuscripts by Kallure et al. and Yamanouchi et al., provides significant new insight into the assembly and activity of γ subunits.

    1. We may characterize this process with reference to the changes which it brings about in the familiar instinctual dispositions of human beings, to satisfy which is, after all, the economic task of our lives. A few of these instincts are used up in such a manner that something appears in their place which, in an individual, we describe as a character-trait.

      You can see this across generations. For example, boomers and Gen Z do not think the same way, because society and its norms have changed. It is more normal for Gen Z to stay with their parents as long as they can because of factors that have changed since the boomer generation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The current manuscript focuses on adenine phosphoribosyltransferase (Aprt) and how loss of its function affects the nervous system. It places this in the context of Lesch-Nyhan disease, a rare hereditary disease linked to hypoxanthine-guanine phosphoribosyltransferase (HGPRT). Since HGPRT appears to be absent in Drosophila, the study focuses on Aprt and shows that aprt mutants have a decreased life-span and altered uric acid levels (the latter can be attenuated by allopurinol treatment). Moreover, aprt mutants show defects in locomotor reactivity behaviors. A comparable phenotype is observed when aprt is knocked down specifically in dopaminergic cells. Interestingly, glia-specific knock-down also caused a similar behavioral defect, which could not be rescued by re-expressing UAS-aprt, whereas neuronal re-expression did rescue the mutant phenotype. Moreover, aprt mutants, as well as pan-neuronal and pan-neuronal plus glial aprt RNAi, showed sleep defects. Immunostainings indicate that dopamine levels are increased, UPLC shows that adenosine levels are reduced, and PCR shows that Ent2 levels are increased (but not AdoR levels). Moreover, aprt mutants display seizure-like behaviors, which can be partly rescued by purine feeding (adenosine and N6-methyladenosine). Finally, expression of the human HGPRT also causes locomotor defects.

      The authors provide a wide range of genetic experimental data to assess behavior and some molecular assessment on how the defects may emerge. It is clearly written, and the arguments follow the experimental evidence that is provided. The findings provide a new example of how manipulating specific genes in the fruit fly allows the study of fundamental molecular processes that are linked to a human disease.

      We thank the reviewer for his clear understanding and positive assessment of our work.

      Reviewer #2 (Public Review):

      The manuscript by Petitgas et al. demonstrates that loss of function for the only enzyme responsible for the purine salvage pathway in fruit-flies reproduces the metabolic and neurologic phenotypes of human patients with Lesch-Nyhan disease (LND). LND is caused by mutations in the enzyme HGPRT, but this enzyme does not exist in fruit-flies, which instead only have Aprt for purine recycling. They demonstrate that mutants lacking the Aprt enzyme accumulate uric acid, which, as in humans, can be rescued by feeding flies allopurinol, and have decreased longevity, locomotion and sleep impairments, and seizures, with striking resemblance to HGPRT loss of function in humans. They demonstrate that loss of function, either throughout development or specifically in the adult, and either ubiquitously or in all neurons, dopaminergic neurons, mushroom body neurons or glia, can reproduce the phenotypes (although knock-down in glia does not affect sleep). They show that the phenotypes can be rescued by over-expressing a wild-type form of the Aprt gene in neurons. They identify a decrease in adenosine levels as the cause underlying these phenotypes, as adenosine is a neurotransmitter functioning via the purinergic adenosine receptor in neurons. In fact, feeding flies throughout development and in the adult with either adenosine or m6A could prevent seizures. They also demonstrate that loss of adenosine caused a secondary up-regulation of ENT nucleoside transporters and of dopamine levels, which could explain the phenotypes of decreased sleep and hyperactivity at night. Finally, they provide the remarkable finding that over-expression of the human mutant HGPRT gene, but not its wild-type form, in neurons impaired locomotion and induced seizures. This means that the human mutant enzyme does not simply lack enzymatic activity, but is toxic to neurons in some gain-of-function form. Altogether, these are very important and fundamental findings that convincingly demonstrate the establishment of a Drosophila model for the scientific community to investigate LND, to carry out drug testing screens and to find cures.

      We thank the reviewer for his clear understanding and positive assessment of our work.

      The experiments are conducted with great rigour, using appropriate and exhaustive controls, and on the whole the evidence does convincingly or compellingly support the claims. The exception is an instance where the authors mention 'data not shown'; here the data should either be provided or the claims removed: "feeding flies with adenosine or m6A did not rescue the SING phenotype of Aprt mutants (data not shown)". It is important to show these data (see below).

      As recommended by the reviewer, these results are now shown in the new Figure S15.

      Sleep is defined here as a failure of flies to cross a beam for more than 5 minutes. However, lack of movement does not necessarily mean the flies are asleep, as they could be unmotivated to move (which could reflect abnormal dopamine levels) or engaged in incessant grooming instead. These differences are important for future investigation into the neural circuits affected by LND.

      We agree that the method we used could overestimate sleep duration because flies that don't move do not necessarily sleep either, as it is the case with brain-dopamine deficient flies (Riemensperger et al., PNAS 2011). To address this issue, we have recorded video data showing that after 5 min of inactivity, wild-type and Aprt5 mutant flies are less sensitive to stimulation, indicating that they were indeed asleep. This is now shown in the new Figure S10 and mentioned on page 17, lines 338-339 in the main text. In addition, in this work we report that Aprt mutant flies have a nocturnal insomnia phenotype. Sleep overestimation is not, therefore, an issue that could challenge these results.

      The authors claim that, based on BLAST genome searches, there are no HPRT1 (encoding HGPRT) homologues in Drosophila. However, such a claim would require structure-based searches that take into account structural conservation despite high sequence divergence, which may not be detected by regular BLAST.

      To reinforce our conclusions about the lack of a homologue of the human HPRT1 gene in Drosophila, we have now added a Results section about the evolution of HGPRT proteins on pages 6-7, lines 122-150, and two phylogenetic analyses as new Figures S2 and S3, with more details in the legends. We have also carried out structural similarity searches against the RCSB PDB repository. The structural analysis did not identify any relevant similarity with HGPRT 3D structures in Insecta (mentioned in lines 146-150). We hope these new analyses address the Reviewer's concerns. Furthermore, as shown in Table S2, no enzymatic HGPRT activity could be detected in extracts of wild-type Drosophila. A protein that was structurally similar to human HGPRT but had a divergent sequence could not be involved in purine recycling without expressing HGPRT-like activity. In contrast, enzymatic Aprt activity could be easily detected in this organism (Figure S4 and Table S1).

      This work raises important questions that still need resolving. For example, the link between uric acid accumulation, reduced adenosine levels, increased dopamine and behavioural neurologic consequences remain unresolved. It is important that they show that restoring uric acid levels does not rescue locomotion nor seizure phenotypes, as this means that this is not the cause of the neurologic phenotypes.

      We agree with the reviewer about the potential importance of our results and the need to resolve the exact origin of the neurological phenotypes. This would need to be addressed in further studies in our opinion. The fact that allopurinol treatment did not improve the locomotor ability of Aprt5 mutant flies is now shown in Figure 1D, E to emphasize this result. Results showing that allopurinol does not rescue the bang-sensitivity phenotype of Aprt-deficient mutants are shown in Figure S14.

      Instead, their data indicate adenosine deficiency is the cause. However, one weakness is that for the manipulations they test some behaviours but not all. The authors could attempt to improve the link between mechanism and behaviour by testing whether over-expression of Aprt in neurons or glia, throughout development or in the adult, and feeding with adenosine and m6A can rescue each of the behavioural phenotypes handled: lifespan, SING, sleep and seizures. The authors could also attempt to knock-down dopamine levels concomitantly with feeding with adenosine or m6A to see if this rescues the phenotypes of SING and sleep.

      The reviewer is right. However, carrying out all these experiments properly with enough repeats will require about two more years of work. Because of that, they could not be included in the revision of the present article. Here we show that Aprt overexpression in neurons, but not in glia, rescues the SING phenotype of Aprt5 mutants (Figure 2B and 2E). We have also added in the revised article the new result that Aprt overexpression reduces transcript levels of DTH1, which codes for the neural form of the dopamine-synthesizing enzyme tyrosine hydroxylase (new Figure 5F).

      Visualising the neural circuits that express the adenosine receptor could reveal why the deficit in adenosine can affect distinct behaviours differentially, and which neurologic phenotypes are primary and which secondary consequences of the mutations. This would allow them to carry out epistasis analysis by knocking-down AdoR in specific circuits, whilst at the same time feeding Aprt mutants with Adenosine.

      Deciphering the specific circuits involved in the various effects of adenosine would indeed be extremely interesting. Unfortunately, very little is currently known about the neural circuits that express AdoR in flies. No antibody is available to detect this receptor in situ, and, to our knowledge, no AdoR allele coding for a tagged form of the receptor has been engineered yet.

      The revelation that the mutant form of human HGPRT has toxic effects is very intriguing and important and it invites the community to investigate this further into the future.

      To conclude, this is a fundamental piece of work that opens the opportunity for the broader scientific community to use Drosophila to investigate LND.

      We sincerely thank the reviewer for his thoughtful and positive comments on our work.

      Reviewer #3 (Public Review):

      The study attempts to develop a Drosophila model for the human disease of LND. The issue here, and the main weakness of this study, is that Drosophila does not express the enzyme, HGPRT, which when mutated causes LND. The authors, instead, mutate the functionally-related Drosophila Aprt enzyme. However, it is unknown whether Aprt is also a structural homologue. Because of this, it will likely not be possible to identify pharmacological compounds that rescue HGPRT activity via a direct interaction (unless modelling predicts high conservation of substrate binding pocket between the two enzymes, etc).

      As stated in our Provisional Responses prior to revision of the Reviewed Preprint, the enzymes APRT and HGPRT are actually known to be functionally and structurally related. We apologize for not providing this information in the original submission. This point is now made clearer in the revised article on page 39, lines 785-792. Indeed, both human APRT and HGPRT belong to the type I PRTase family, identified by a conserved binding motif for phosphoribosyl pyrophosphate (PRPP), the substrate that donates phosphoribosyl to purines. This binding motif is only found in PRTases from the nucleotide synthesis and salvage pathways (see: Sinha and Smith (2001) Curr Opin Struct Biol 11(6):733-9, doi: 10.1016/s0959-440x(01)00274-3). The purine substrates adenine, hypoxanthine and guanine share the same chemical skeleton, and APRT can bind hypoxanthine, indicating that APRT and HGPRT also share similarities in their substrate binding sites (Ozeir et al. (2019) J Biol Chem. 294(32):11980-11991, doi: 10.1074/jbc.RA119.009087). Moreover, Drosophila Aprt and human APRT are closely related, as the amino acid sequences of APRT proteins have been highly conserved throughout evolution (see Figure S5B in our paper).

      An additional weakness is that the study does not identify a molecule that may act as a lead compound for further development for treating LND. Rather, the various rescues reported are selective for only a subset of the disease-associated phenotypes. Thus, whilst informative, this first section of the study does not meet the study ambitions.

      In this study, we identify adenosine and N6-methyladenosine as rescuers of the epileptic behavior in Aprt mutant flies (shown in Figure 7E, F). Interestingly, the same molecules have been found to rescue the viability of fibroblasts and neural stem cells derived from iPSCs of LND patients, in which de novo purine synthesis was prevented (discussed on page 38, lines 747-753). This suggests that the Drosophila model reported here could help to identify new genetic targets and pharmacological compounds capable of rescuing the effects of HGPRT mutations in humans.

      The second approach adopted is to express a 'humanised mutated' form of HGPRT in Drosophila, which holds more promise for the development of a pharmacological screen. In particular, the locomotor defect is recapitulated but the seizure-like activity, whilst reported as being recapitulated, is debatable. A recovery time of 2.3 seconds is very much less than timings for typical seizure mutants. Nevertheless, the SING behaviour could be sufficient to screen against. However, this is not explored.

      We agree with the reviewer that it would be very interesting to do a pharmacological screen in this second LND model. However, we did not have the possibility to carry out such a screen yet.

      In summary, this is a largely descriptive study reporting the behavioural effects of an Aprt loss-of-function mutation. RNAi KD and rescue expression studies suggest that a mix of neuronal (particularly dopaminergic and possibly adenosinergic signalling pathways) and glial mechanisms are involved in the behavioural phenotypes affecting locomotion, sleep and seizure. There is insufficient evidence to have confidence that the Aprt fly model will prove valuable for understanding / treating LND.

      Here we report many common phenotypes between the Aprt fly model and the symptoms of LND patients (reduced longevity, locomotor problems, sleep defects, overproduction of uric acid that is rescued by allopurinol treatment…). Moreover, APRT and HGPRT enzymes are both functional and structural homologues, as explained in our answers. We also found that the same drugs can rescue the seizure-like phenotype in Aprt-deficient flies and the viability of LND fibroblasts and neural stem cells, derived from iPSCs of LND patients, in which de novo purine synthesis is prevented (Figure 7E, F). In many respects, our results therefore suggest that Aprt mutant flies could be useful to better understand LND, and potentially to screen for new therapeutic compounds.

      From the Reviewing Editor:

      (1) How are the pathways of purine catabolism different between flies and mammals? How does the absence of HGPRT and presence of only AGPRT affect purine catabolism? When did HGPRT appear in evolution?

      Purine catabolism is quite similar in flies and mammals, except for the lack of urate oxidase in primates, as described in Figure S1. We added text to the revised article about purine anabolism/catabolism pathways on lines 123-126 (see our detailed response to Reviewer 1’s Recommendations below). HGPRT is present in Bacteria, Archaea and Eukaryota, and in nearly all animal phyla. However, BLAST searches indicate that HGPRT homologues cannot be found in most insect species, including Drosophila. To reinforce our conclusions about the lack of a homologue of the human HPRT1 gene in Drosophila melanogaster, we have now added a Results section about the evolution of HGPRT proteins on pages 6-7, lines 122-150, and two phylogenetic analyses as new Figures S2 and S3 with details in the legends.

      In addition to BLAST a structural based modelling method should be used to establish the loss of HGPRT in Drosophila.

      In agreement with the phylogenetic analyses, we have confirmed that no HGPRT enzymatic activity can be detected in wild-type Drosophila extracts (Table S2). To complete these observations, as recommended by reviewer #2, we have carried out 3D structure-based searches in the RCSB Protein Data Bank. This enabled us to compare human HGPRT with all currently available protein structures. We found no Drosophila protein with a divergent sequence showing relevant structural similarity to human HGPRT. In contrast, this search identified proteins similar to human HGPRT in many other species of Eukaryota, Archaea and Bacteria. This is now mentioned on page 7, lines 146-150 in the revised article.

      (2) Of the three biochemical changes reported the change in dopamine levels should be validated by other methods given the unreliable nature of IHC.

      As recommended by Reviewer #1, we have added the results of new experiments carried out by RT-qPCR and Western blotting, which confirm the effect of the Aprt mutation on brain dopamine levels. In addition, we added the consistent result that Aprt overexpression reduces transcript levels of DTH1. The results are shown in the new panels E to H of Figure 5 and mentioned in the text on page 20, lines 385-389.

      (3) As suggested by reviewer 2 it would be helpful to clearly identify which of the three biochemical changes (DA, uric acid, adenosine) are responsible for the numerous behaviours tested. This is important because it is relevant for developing any therapeutic strategy arising from this study.

      We agree that it would be very interesting to decipher the relationship between the different behaviors observed in mutant flies and the biochemical changes (dopamine, uric acid or adenosine). However, this would require a large number of new experiments and would probably double the size of our paper, which already includes many original data. In our opinion, such a detailed study should logically be the subject of another article.

      (4) There is concern regarding the robustness of the seizure data. Reviewer 3 has suggestions on how to address this.

      See our answers to Reviewer 3’s recommendations below.

      (5) Editorial corrections and changes suggested by reviewers 2 and 3 need to be addressed.

      As indicated in our answers, we have taken into account and when possible addressed the corrections and changes suggested by the reviewers.

      (6) It is recommended that the authors tone down the relevance of this model for LND, particularly in the abstract. The focus should be on stating what is actually delivered.

      As recommended by the reviewing editor, and to take into account the reserved comments of reviewer #3, we have toned down our affirmation that our new fly models are relevant for LND in the last sentences of the Abstract and Discussion, and also added a question mark to the subtitle of the Discussion on line 777. As mentioned in our provisional responses to the Public Reviews, we would like to emphasize, however, that reviewers #1 and #2 expressed more confidence than reviewer #3 in the potential usefulness of our work. Reviewer #1 indeed stated that: “The findings provide a new example of how manipulating specific genes in the fruit fly allows the study of fundamental molecular processes that are linked to a human disease”, and reviewer #2 further wrote: "Altogether, these are very important and fundamental findings that convincingly demonstrate the establishment of a Drosophila model for the scientific community to investigate LND, to carry out drug testing screens and find cures”, and added: “To conclude, this is a fundamental piece of work that opens the opportunity for the broader scientific community to use Drosophila to investigate LND”.

      Reviewer #1 (Recommendations For The Authors):

      • An important prerequisite for the current study is that there appears to be no HGPRT "activity" in Drosophila. It is initially stated that there was previously no "HGPRT activity observed" in two papers from the 1970s. It would be important to corroborate this notion and provide some background on the purine anabolism/catabolism pathways. How shared or divergent are these pathways between Drosophila and mammals?

      In agreement with the pioneering studies of Becker (1974a, b), we have confirmed in this work that no HGPRT enzymatic activity can be detected in wild-type Drosophila extracts, as mentioned in Results on page 6, lines 127-130 and reported in Table S2. Purine catabolism is quite similar in flies and mammals, except for the lack of urate oxidase in primates, as shown in Figure S1. All the enzymes involved in purine anabolism, catabolism or recycling in humans have been conserved between Drosophila and humans, with the notable exception of HPRT1.

      If there is no HGPRT gene, but only the APRT ortholog, what would this mean for the metabolites?

      Our enzymatic assays on Drosophila extracts indicated that hypoxanthine and guanine cannot be recycled into IMP and GMP, respectively, contrary to adenine, which can be converted into AMP in flies. In the absence of HGPRT activity, GMP and IMP could be produced by de novo purine synthesis or, alternatively, synthesized from AMP, which can be converted into IMP by the enzyme AMPD; IMP can then be converted into GMP by the enzymes IMPDH and GMPS. These metabolic pathways are depicted in Figure S1A.

      Is the lack of HGPRT specific for Drosophila, insects (generally in invertebrates)? I feel clarifying this would provide more insight into the motivation of the experimental approach.

      As suggested by the Reviewer and the Reviewing Editor, we have addressed the evolution of HGPRT proteins more precisely in the revision. We have added a section on this subject in Results on pages 6-7, lines 122-150, and two phylogenetic analyses as Figures S2 and S3 with details in the legends. A phylogenetic analysis was carried out a few years ago by Giorgio Matassi, who is now a co-author of this paper. The most striking result was the great impact of horizontal gene transfer on the evolution of HGPRT in Insects (Figures S2 and S3). Our analysis of the phyletic distribution of HGPRT proteins revealed their striking rareness in Insecta and, in particular, their absence in Drosophilidae. The PSI-BLAST search did, however, detect a significant hit in Drosophila immigrans (accession KAH8256851.1). Yet this sequence is 100% identical to the HGPRT of the gammaproteobacterium Serratia marcescens. Indeed, a phylogenetic analysis showed that D. immigrans HGPRT clusters with the Serratia genus (see Figure S3). This can be interpreted either as contamination of the sequenced sample or as a very recent horizontal gene transfer event. The second scenario is more likely, because the corresponding nucleotide sequences differ by 5 synonymous substitutions (out of 534 positions). A powerful approach to understanding the "origin" of the D. immigrans protein would be to analyze whether horizontal gene transfer has affected its chromosomal neighbours. This approach, proposed previously by G. Matassi (BMC Evol Biol, 2017, 17:2, doi: 10.1186/s12862-016-0850-6), is highly demanding in terms of computing time and would require an ad hoc study. We hope that these new analyses address the Reviewer's concerns.

      • On the mechanistic side on how the behavioral defects may arise, the authors show that dopaminergic neurons (and glia cells) are involved. One interesting finding is that dopamine immunostainings suggest increased dopamine levels. However, immunostainings are notorious for artifacts and do not provide a strong quantitative assessment. I feel it would be helpful to have an alternative technique to corroborate this finding.

      We agree with the reviewer, and we have added the results of further confirmatory experiments in the four new panels E-H of Figure 5, showing that: 1) the transcript levels of DTH1 (encoding the neuronal isoform of the dopamine-synthesizing enzyme tyrosine hydroxylase in Drosophila) are increased in Aprt5 mutants compared to wild-type flies (new Figure 5E); 2) consistent with this, DTH1 transcript levels were, in contrast, decreased when Aprt was overexpressed ubiquitously in flies (new Figure 5F); 3) Western blot experiments showed that DTH1 protein levels are also increased in Aprt5 mutant flies compared to controls (new Figure 5G-H).

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in the public review, the behavioural phenotypes of decreased lifespan, SING, sleep and seizures could be tested for all manipulations: feeding with allopurinol, adenosine and m6A, and combining this with knock-down dopamine levels in PAMs or MBs. This could help dissect the relationship between mutations in Aprt and behaviour.

      We thank the reviewer for these suggestions, and, indeed, we would have liked to do all these experiments. However, as mentioned in our responses to the Public Reviews, carrying out these experiments properly with sufficient repeats would require about two more years of work. We have already accumulated a large amount of data, so we have decided to publish our results at this stage in order to make our new fly models available to the scientific community. We are giving careful and due consideration to these experimental proposals and we hope to continue our investigation on this topic in the future.

      It would also be helpful to find out which neurons and glia express AdoR. Perhaps there are already tools available the authors could test or at least check with the scRNAseq Fly Atlas (public Scope database).

      Following the reviewer’s recommendation, we have checked the scRNAseq Fly Atlas for AdoR expression in the brain, compared to that of ple (encoding tyrosine hydroxylase) and Eaat1 (encoding the astrocytic glutamate transporter). As shown in the image below, the results are not very informative. AdoR appears to be expressed in rather widespread subsets of neurons and glial cells, which partly overlap with ple and Eaat1 expression. Further work would be required to identify more precisely the neurons and glial cells expressing AdoR in the brain.

      Author response image 1.

      Page 7, line 161: use of the word 'normalise'. In "We tried to normalise uric acid content in flies...", it would be best to use 'rescue' instead, as normalisation has a different meaning in science.

      We modified this word as suggested.

      Page 9, line 203: 'genomic deficiencies that cover': the genetic term is 'uncover', as a deficiency for a locus reveals a phenotype; thus one says 'a gene uncovered by deficiency xx'.

      Thank you for this helpful remark. We corrected this in line 221.

      Page 10, lines 206-208: 'allopurinol treatment did not improve the locomotor activity...". These are important observations that should be best presented within the main manuscript Figure 1.

      As recommended, we have transferred the graphs of Figure S5 to new panels D and E of Figure 1.

      Figure 4: please indicate genotypes in the figure, where no information is given that these are UAS-Aprt-RNAi experiments.

      We added the complete genotype in Figure 4G, and also in Figure S12C and D. Thank you for noting that.

      Page 25 line 491: "None of these drugs was able to rescue the SING defects (data not shown)". Either provide the data or remove this claim.

      We have added these data in the new Figure S15.

      Statistical analyses: details are provided in the methods, but the name of test and multiple comparisons corrections should be also provided in the legends.

      Thank you very much for the careful proofreading. This was an oversight and we have added the information in all legends of the revised article.

      Reviewer #3 (Recommendations For The Authors):

      This is a difficult manuscript to appreciate. The abstract and introduction suggest that the aim of the study is to identify novel treatments for a human disease (LND) by developing a Drosophila model. Much of the results section, however, focuses on describing the consequences of the Aprt mutation for purine metabolism. To my mind, a rewrite to focus on the latter would be beneficial. The potential applicability to LND would best be restricted to the discussion.

      We apologize for not making our goals clearer. Our purpose was to find out whether purine recycling deficiency could lead to metabolic and neurobehavioral disturbances in Drosophila, as is the case in human LND patients when HGPRT is mutated. Interestingly, we observed that mutation of the only purine recycling enzyme in flies, Aprt, did induce defects partly comparable to those of LND in humans, including overproduction of uric acid that is rescued by allopurinol treatment, reduced longevity, and various neurobehavioral phenotypes including bang-sensitive seizures, sleep defects and locomotor impairments. We also identified adenosine and N6-methyladenosine as rescuers of the epileptic behavior in these mutants. These drugs were also identified as therapeutic candidates in screens based on iPSCs from LND patients. This suggests that Aprt deficiency in Drosophila could be used as a model to better understand this disease and find new therapeutic targets.

      Regardless of the above comment, the concluding sentence of the abstract is inappropriate. This study does not show that Drosophila can be used to identify a cure for LND.

      We agree with the Reviewer that the last sentence of the abstract was too affirmative. As also recommended by the reviewing editor, we have modified this sentence in the abstract and other sentences in the text in order to tone down the affirmation that our new fly models are relevant for LND. See our answers to the Reviewing Editor above for details.

      Indeed, I would challenge the premise that screening against a functional, but unknown if structural, homologue (Aprt) will ever provide an exploitable opportunity. To meet this statement, this study needs to identify a treatment that rescues all of the behavioural phenotypes associated with the Aprt mutation, in addition to rescuing the influences of the mis-expression of mutated HGPRT.

      APRT and HGPRT are both functionally and structurally related. Both human APRT and HGPRT belong to the type I PRTase family, identified by a conserved binding motif for phosphoribosyl pyrophosphate (PRPP), the substrate that donates phosphoribosyl to purines. This binding motif is only found in PRTases from the nucleotide synthesis and salvage pathways (see: Sinha and Smith (2001) Curr Opin Struct Biol 11(6):733-9, doi: 10.1016/s0959-440x(01)00274-3). The purine substrates adenine, hypoxanthine and guanine share the same chemical skeleton, and APRT can bind hypoxanthine, indicating that APRT and HGPRT also share similarities in their substrate binding sites (Ozeir et al. (2019) J Biol Chem. 294(32):11980-11991, doi: 10.1074/jbc.RA119.009087). This point has been made clearer in the Discussion on page 39, lines 785-792. Finally, Drosophila Aprt and human APRT are closely related, as the amino acid sequences of APRTs have been highly conserved throughout evolution (shown in Figure S5B).

      With respect to expression of the mutated HGPRT: the short seizure recovery time of 2.3 seconds is not very convincing evidence of a seizure phenotype. This is far below the timings reported for typical BS mutations. Because of this, the authors should run a positive control (e.g. one of the well-established BS mutations: parabss, eas or jus) to validate their assay. Moreover, was the seizure induced by the Aprt mutation (17.3 secs, again a low value) rescued by prior exposure to an antiepileptic? Could this behaviour instead be related to the SING locomotor phenotype?

      The assay we used to test for bang-sensitivity has been validated in previous articles from different laboratories. We agree that the recovery times we observed were shorter than those of the BS mutations mentioned by the reviewer. However, we can cite another Drosophila BS mutant, porin, that shows similarly short recovery times (2.5 and 6 sec, depending on the porin allele tested; Graham et al. J Biol Chem. 2010, doi: 10.1074/jbc.M109.080317). This is now mentioned on page 36, lines 717-720. In addition, the BS phenotype we observed in Aprt mutants was robust and highly significant compared to control flies (Figure 7). We did not try to rescue this phenotype by exposing the flies to an antiepileptic, but we do not think that it is related to the SING phenotype. Indeed, providing adenosine or N6-methyladenosine to Aprt5 mutant flies rescued the BS phenotype (Figure 7E, F) but did not rescue the locomotor defects (new Figure S15). Moreover, the SING performances of Aprt5 mutant flies at 8 and 30 d a. E. are decreased in a nearly identical way (Figure 1C), whereas we observed an effect on BS behavior at 30 d a. E., which implies that the SING and BS behaviors are most likely unrelated.

      Line 731 states that 'Aprt mutants show a typical BS phenotype'. Whilst accurate to some extent (e.g. the behaviour depicted in the supp videos), it should be made clear that the recovery time is uncharacteristically short and thus differs from typical BS mutations.

      We have corrected the sentence in the revised article to mention that (page 36, lines 717-718).

      Line 732 stating that BS phenotype is often linked to neuronal activity - what other links would there be? Even if via glia or other tissues the final effect is via neurons.

      We have modified this sentence (page 36, line 720).

      The introduction and, particularly, the discussion are overly long and, in the case of the latter, repetitive of the results text. Pruning to make the paper more concise would be very beneficial. Removal of the extensive speculation about how DA and adenosine may interact would help in this regard (line 688 onwards). Indeed, in many places the discussion morphs into a review.

      We agree with the reviewer on this point, and have therefore done our best to shorten the Introduction and Discussion, which are now 24% and 21% shorter, respectively, in the revised article compared to the original submission.

      The applicability of using Drosophila Aprt mutations to screen for compounds that may treat LND is predicated on some degree of similarity in either enzyme structure or metabolic pathways. A discussion of the relevance of studying Aprt therefore needs to be included. Given the authors' insights, where should potential new drugs be targeted?

      As stated above, we now mention in the article that APRT and HGPRT share similarities in their structure. In addition, the metabolic pathways between humans and Drosophila have been largely conserved (shown in Figure S1B).

    1. Before we talk about public criticism and shaming and adults, let’s look at the role of shame in childhood. In at least some views about shame and childhood [1], shame and guilt hold different roles in childhood development: Shame is the feeling that “I am bad,” and the natural response to shame is for the individual to hide, or the community to ostracize the person. Guilt is the feeling that “This specific action I did was bad.” The natural response to feeling guilt is for the guilty person to want to repair the harm of their action. In this view, a good parent might see their child doing something bad or dangerous, and tell them to stop. The child may feel shame (they might not be developmentally able to separate their identity from the momentary rejection). The parent may then comfort the child to let the child know that they are not being rejected as a person, it was just their action that was a problem. The child’s relationship with the parent is repaired, and over time the child will learn to feel guilt instead of shame and seek to repair harm instead of hide.

      I think what's important is that supporting children's emotional well-being, social growth, and moral knowledge during childhood requires an awareness of, and an appropriate response to, the subtle differences between shame and guilt. This awareness makes it possible for adults to help kids respond constructively to errors and setbacks, laying the foundation for them to grow into emotionally strong, socially proficient, and morally aware adults.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The coupling between cell polarity and cell cycle progression is an important aspect of symmetric and asymmetric cell division. Although there are several examples of cell cycle kinases phosphorylating polarity proteins, it has been difficult to assess the importance of these on cell division due to the strong and pleiotropic effects of manipulating these kinases. Here, the authors generate an analogue-sensitive allele of cdk1 in flies to tackle this question in neuroblasts (NBs) and sensory organ precursors (SOPs), two well characterised examples of asymmetric cell divisions. They show that partial Cdk1 inhibition (which still allows cell cycle progression) does not block Bazooka (PARD3 in mammals) polarization in NBs, but prevents coalescence of the Baz crescent, which has previously been shown to be an actomyosin-based process. They further identify a Cdk1 consensus site on Baz (S180) for which they generate a phospho-specific antibody, allowing them to show that this site is specifically phosphorylated in dividing NBs and SOPs. Although mutations at this site do not recapitulate the effect of Cdk1 on Baz coalescence, they do delay Miranda polarization in NBs and affect lateral inhibition and asymmetric cell division of SOPs. Finally, the authors show that human PARD3 can also be phosphorylated by Cyclin B/Cdk1 in vitro.

      Major comments:

      • Figure 2A: it would be good to show that polarization of Baz::GFP in consecutive divisions is maintained in cdk1as2 animals in the absence of 1-NA-PP1.

      We now show in Fig S2B a panel with two consecutive divisions of a cdk1as2 neuroblast in the absence of 1-NA-PP1, followed by a third division in the presence of 1-NA-PP1. The neuroblast shows high levels of Baz polarization in the first two divisions.

      • The interpretation of the observed SOP phenotypes is complicated by the uneven expression of the pnr-GAL4 driver and the fact that it is expressed in epithelial cells rather than just SOPs. The authors could express their control and mutant Baz constructs under the control of neurP72-GAL4. It is not likely they would be able to deplete endogenous Baz as they have done in NBs, as neurP72-GAL4 is expressed too late to deplete most proteins before SOP division, but they could at least look at localization of the mutants and any possible gain-of-function phenotypes.

      Following this suggestion, we have recombined Neur-GAL4 with UAS-delta RNAi to attempt to deplete both endogenous Baz::mScarlet and Delta while expressing our Baz::GFP constructs specifically in SOPs. Baz::mScarlet depletion was surprisingly efficient considering, as the reviewer points out, the late timing of Neur-GAL4 expression. However, the adult flies did not present any sensory organs transformations, perhaps because Delta might not be as efficiently depleted. We can at least rule out dominant-negative effects.

      We thank the reviewer for his constructive feedback. As suggested, we have now extensively analysed the localisation of the Baz-S180 mutants in SOPs and found significant defects. We describe these observations in a new Figure 6. Briefly, we observed that the Baz phosphomutants have localisation defects during the pIIa cell division but not the pI cell division. We also observed a very surprising mosaicism of expression of our UASz-driven constructs within the SOP lineage that allowed us to make a few interesting observations which should be of interest to SOP specialists. Briefly, mosaic expression of Baz::GFP within the SOP lineage allowed us to analyse the relative contributions of pIIa and pIIb/pIIIb to different Baz cortical pools and revealed an unexpected cell non-autonomous mechanism controlling pIIb division orientation. We describe these findings in a new associated supplemental figure.

      The authors speculate that Baz phosphorylation during lateral inhibition may be the reason for the observed excess specification of SOPs in the S180 mutants. This could easily be tested by looking at their antibody staining at earlier stages in the notum.

      Following this suggestion (also coming from Reviewer #2), we have stained nota at around 8 h APF. We observed that patches of cells of the early notum display a strong Baz-pS180 phospho-signal. These patches partially overlap with the Delta-positive stripes in which lateral inhibition occurs (as described, for example, in Corson et al., 2017), consistent with the possibility that Baz-S180 phosphorylation does somehow regulate lateral inhibition.

      These new experiments clearly show that Baz can be phosphorylated on S180 in cells that do not divide asymmetrically. This led us to change the title.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Cell polarization in dividing cells, including stem cells, is typically coupled such that polarity can inform the architecture, orientation, and/or asymmetry during cell division. In Drosophila neural stem cells (neuroblasts/SOP), Par polarity is coupled to the cell cycle, but the nature of this coupling remains unclear. In this work, Loyer and colleagues report on impacts of CDK1 inhibition on Bazooka/Par3 localization and basal fate determinant localization. They provide evidence for a novel phosphorylation site that appears unique to asymmetrically dividing cells and may be involved in regulation of asymmetric division. Finally, they show that CDK1 can, at least in principle, phosphorylate human Par3 in vitro.

      Overall, the major claims of the abstract appear supported by the experimental work; however, we think the title overstates the overall conclusions that can be drawn from the work.

      Major comments:

      • The major claim of the paper is the role of specific phosphorylation of S180 in asymmetrically dividing cells in polarization and sensory organ formation, which relies heavily on interpretation of S180A/D phosphomutants. The experiments are carefully performed and quantified, and are consistent with the conclusions drawn. However, we wondered if it is possible that the phenotypes are not linked to phosphorylation (the authors acknowledge this in the Discussion)? In other words, could the A/D mutants simply be weak Baz mutants? This could potentially explain the extra-SOP phenotype if Baz function is generally altered, especially given that it is difficult to rationalise a role for SOP-specific phosphorylation in the processes that specify SOP cells in the precursor epithelial cells. The authors speculate that these early precursors may also exhibit phosphorylation, but this isn't examined. Chasing this down seems key to support the core titular claim of the paper. Following this suggestion (also made by Reviewer #1), we have stained nota at around 8 h APF. We observed that patches of cells in the early notum display a strong Baz-pS180 phospho-signal. These patches partially overlap with the Delta-positive stripes in which lateral inhibition occurs (as described, for example, in Corson et al., 2017). This result is presented in Fig. 5H. As would be the case for any phosphomutant, this does not strictly rule out that S180A and S180D could simply be weak Baz mutants, but it strongly supports the possibility that the lateral inhibition defects observed in these mutants result from defective Baz-S180 phosphorylation.

      • Implicit in the core message of the paper is the elucidation of CDK1 regulation of polarity and specifically Baz. However, the connection between CDK1 and S180 (and Baz regulation overall) is relatively tenuous in this work. First, the S180A mutant does not phenocopy CDK1 inhibition with respect to basal determinant phenotypes, though obviously CDK1 may be more pleiotropic. Second, whether the CDK1 inhibition phenotype is linked to any effect on Baz/PAR behaviour is not really explored. Third, they do not test whether S180 phosphorylation is CDK1-dependent. We fully agree with these comments. We cannot think of any way of addressing the first two points, which would require fully inhibiting CDK1 while somehow maintaining neuroblasts in mitosis to examine how this impacts Baz localisation. We tried to arrest neuroblasts in mitosis and block the proteasome because, at least in HeLa cells, this leads to persistence of mitosis when CDK1 is inhibited (Skoufias et al., 2007). However, neuroblasts arrested in mitosis by proteasome inhibition slipped out of mitosis.

      However, concerning the third point, we now provide evidence showing that, at least in vitro, Drosophila Baz-S180 is phosphorylated by CDK1 (see below).

      The method for quantifying domain signal only references prior work and should be described in this work. From our search of the cited reference, it appears to be peak signal intensity at a user-specified point on the cortex. While this does not undermine the core findings as presented, it may not capture additional features that may be informative (domain size, fluorescence distribution, total signal etc.). For example, domain coalescence would imply smaller, brighter domains but similar total protein amounts, which appears to be the case from the images but isn't quantified per se. We now describe our method for quantifying the average signal intensity in the middle of the Baz crescents. We agree that quantifying additional features to check whether they are affected by partial CDK1 inhibition would be interesting. However, doing so requires determining exactly where Baz crescents start and end. As Baz crescent edges in neuroblasts often end in a gradient rather than a sharp edge (Hannaford et al., 2018), we are not confident that we could do so reliably in every case with the image quality of our dataset: we prioritised limiting photobleaching to accurately quantify the levels of endogenously expressed Baz over obtaining very sharp, high-contrast images. This is further complicated by the fact that, depending on the depth of neuroblasts within the tissue and the orientation of their division relative to the imaging plane, the signal intensity of Baz crescents is quite variable, which prevents a simple thresholding approach to determine the limits of crescents based on signal intensity. In short, accurately determining the size of crescents is very challenging.

      The phosphospecific antibody signal is relatively weak, leading to relatively low signal to noise, which could compromise the ability to detect phospho-S180 in non-asymmetrically dividing cells, or generally in cells in which Baz is not polarised and thus the signal would be diffuse around the cell rather than concentrated. Similar caveats could also apply to the lack of signal in interphase cells, where Baz may be less enriched at the cortex and not polarized. We are inclined to believe the authors' conclusions, particularly given their examination of multiple cell types and tissues. However, it is a potential caveat, as phospho-S180 may be most visible in polarised cells where Baz is asymmetrically enriched. We thank the reviewer for pointing this out. Given that Baz levels at the adherens junctions of neuroepithelial cells are similar, we are confident that Baz-S180 is phosphorylated in dividing neuroblasts but not in non-mitotic epithelial cells, which is at least consistent with our new finding that CDK1 phosphorylates Baz-S180 in vitro. However, we agree that we cannot strictly rule out that Baz-S180 is phosphorylated, but below the detection threshold, in mitotic neuroepithelial cells, as cortical Baz levels decrease in these cells.

      We have also gathered new data showing that, in the early notum, phosphorylated Baz-S180 is detected in epithelial cells that are not dividing asymmetrically, definitively ruling out the notion that Baz-S180 phosphorylation is strictly ACD-specific. We have changed the title of the paper accordingly, toned down the mention of apparently ACD-specific Baz-S180 phosphorylation in the abstract, and now describe and discuss the fact that the apparent ACD-specificity of Baz-S180 phosphorylation is context-specific.

      Examination of in vitro phosphorylation of human Par3D (Figure 6) seems out of place and does not add much. It is human, not Bazooka. They reveal 30 sites, 18 of which in both replicates, but most are not obvious CDK sites and the S180 equivalent site is missing. None of these sites is validated in vivo, at least in this work.

      We fully agree with these comments. We initially attempted to purify both full length Baz and human PARD3 but only managed to purify small amounts of PARD3, which is why our analysis was limited to human PARD3. To circumvent these difficulties, we instead purified a smaller N-terminal fragment of Baz and PARD3, which was successful for both proteins and gave us much higher quantities of sample for analysis. Using two different approaches (Western blot with our phospho-specific antibody on Baz and targeted mass spectrometry on Baz and PARD3), we now show in a new Figure 7 that CDK1 phosphorylates Baz-S180 and PARD3-S187 in vitro.

      Minor comments:
      • Figure 1: Uses metaphase-arrested cells, presumably colcemid, but colcemid is only noted in Figure 2. We now mention Colcemid in the legend of Figure 1.
      • Figure 2A: Scale bar is truncated. We have corrected this.
      • Figure 2A: Example images of control neuroblasts could be useful to readers. We now show control neuroblasts in Figure 2A.
      • Figure 2G' vs H': Because G' has two panels and H' has only one, we often confused the PKC and Mira box plots when comparing to Numb. Perhaps Mira could be in a separate sub-panel or be more closely juxtaposed with Numb? The quantification of the Mira signal is now right next to Numb.
      • Whereas both Numb/Mira were examined in CDK1(as), only Mira is reported for the S180A/D experiments. Is there a Numb phenotype as well?

      We actually co-stained Numb and Miranda in the dataset that we analysed in the S180A/D experiments shown in Fig. 4E, F. We did not analyse Numb localisation in the first version we submitted because of a penetration issue with the Numb antibody: the Numb signal fades extremely quickly as we image deeper into the tissue, causing large differences in signal intensity even within a single cell. This prevents us from performing any meaningful quantitative measurement of the Numb signal like the one we performed in Fig. 2H, K, for which we did not encounter this issue. All our subsequent immunostainings with this antibody have had the same problem, even when using Triton concentrations of up to 4% for permeabilization.

      Nonetheless, following the reviewer’s question, we have performed a simple qualitative analysis of Numb localisation in this experiment. We observed that Numb localised to the basal pole in most cases in controls and Baz phosphomutants, but localised uniformly at the cortex in half of the cases in which Miranda showed very low levels of polarisation in metaphase in BazS180D mutants. This Numb localisation defect suggests a loss of function of the PAR complex, whereas, intriguingly, the Miranda localisation defect suggests a gain of function of the PAR complex. These new observations are described in Fig. 4G-H’.

      • The discussion of the notch / Baz phenotypes (Figure 5) is rather complicated and a bit difficult to follow. We agree with this, we have rewritten this part. This is further simplified by our new observation that Baz-S180 is phosphorylated in the early notum during lateral inhibition.

      • Figure 5A: captions should indicate that RFP RNAi is depleting Baz. We have modified the figure accordingly.

      • Box plots are used, but not described, i.e. outliers seem to be marked, but criteria unclear. Mean vs median, etc. We now describe box plots in the legend of the first figure in which they are used (Fig. 2A’) and in the Materials and Methods.
      • Some grammatical mistakes:
      • Title: neuroblast (no 's'),
      • Page 1: Cell fate difference(s?) in the resulting daughter cells
      • Page 4: (As) CDK1 inhibition with 10 μM 1-NA-PP1 prevents neuroblasts from cycling and causes metaphase- arrested neuroblasts to slip out of mitosis. (Reword)
      • Page 6: increased levels of basal fate(no 's') determinants

      We have corrected these mistakes.

      Reviewer #2 (Significance (Required)):

      The links between cell cycle and cell polarity are clearly important and remain poorly understood. Hence, the work addresses key conceptual/mechanistic questions relevant to our fundamental understanding of stem cell biology and regulation of polarity and asymmetric cell division. In our opinion, there are clearly some interesting observations in the manuscript, the experiments are performed carefully, and the data are generally well described. That said, overall, the work seems somewhat premature.

      The direct impact of CDK1 on Baz behaviour remains somewhat unclear. The authors do a good job of limiting the concentration of inhibitor to decouple effects of cell cycle progression from CDK1 levels per se, but this does potentially impact the strength of the phenotypes they can detect and hence the observed phenotypes are relatively minor. Note that driving cells out of mitosis with stronger CDK1 inhibition clearly impacts Baz localization, so the 'real' effect of CDK1 inhibition on Baz could be stronger than reported here. It is also unclear whether the phenotypes observed are directly linked to CDK1 regulation of PAR polarity or an indirect effect of cell cycle control of other processes. The authors' suggestion that it could be related to defects in cortical actin organization, which is known to be cell cycle controlled, seems most likely, but neither this nor other models are explored further. We agree, but we are not aware of any experiment that would allow us to test the effect of full CDK1 inhibition on membrane-bound Baz in mitotic neuroblasts. As mentioned above in our response to Reviewer #1, we tried to arrest neuroblasts in mitosis and block the proteasome because, at least in HeLa cells, this leads to persistence of mitosis when CDK1 is inhibited (Skoufias et al., 2007). However, neuroblasts arrested in mitosis by proteasome inhibition, Colcemid, or both slipped out of mitosis upon inhibition of CDK1.

      We agree it would be interesting to study how CDK1 affects the actomyosin network in neuroblasts but feel that this is somewhat beyond the scope of the manuscript.

      Using phosphospecific antibodies, they report on a novel putative CDK1 phosphorylation site, but aside from looking like a consensus CDK1 site, whether this site is CDK1 dependent is not examined. Notably, the corresponding phosphomutants have modest effects and don't obviously account for the CDK1 inhibition phenotype, leaving it somewhat unclear whether it is under cell cycle regulation. We now provide a new Figure 7 to address this point. As mentioned above, using two different approaches (Western blot with our phospho-specific antibody on Baz, and targeted mass spectrometry on Baz and PARD3), we now show in a new Figure 7 that CDK1 phosphorylates Baz-S180 and PARD3-S187 in vitro. Again, we cannot identify any experiment that would allow us to test whether Baz-S180 is a direct target of CDK1 in vivo. The fact that we now report significant defects in Baz localisation in pIIa divisions strongly suggests functional relevance, and CDK1 seems a plausible kinase based on the new in vitro results.

      The observation that S180 phosphorylation appears unique to asymmetrically dividing cells is very curious, but this observation is not followed up extensively. Again, phenotypes of phosphomutants are quite modest, and while one can propose models to rationalise the phenotypes observed, these models are not fully explored. As mentioned above, we now show that Baz-S180 phosphorylation is not strictly ACD-specific and have changed the title accordingly. We also have new data showing that the S180 phosphomutants of Baz have localisation defects in mitotic pIIa divisions (new Figure 6). Therefore, this phosphorylation event can be linked to Baz’s cortical localisation and, interestingly, shows context dependency.

      The findings that human Par3D can be phosphorylated by CDK1 in vitro do not add much, particularly as they obtain a very large number of putative sites raising questions of specificity, the sites are not validated, and an S180-equivalent site was not identified. We agree that this was a weakness, which we feel we have now addressed. We paste here the answer already provided above when replying to reviewer #1.

      We initially attempted to purify both full-length Baz and human PARD3 but only managed to purify small amounts of PARD3, which is why our phosphoproteomic analysis was limited to human PARD3. To circumvent these difficulties, we instead purified a smaller N-terminal fragment of Baz and PARD3, which was successful for both proteins and gave us much higher quantities of sample for analysis. Using two different approaches (Western blot with our phospho-specific antibody on Baz, and phosphoproteomics on Baz and PARD3 using mass spectrometry), we now show in a new Figure 7 that CDK1 phosphorylates Baz-S180 and PARD3-S187 in vitro.

      References

      CORSON, F., COUTURIER, L., ROUAULT, H., MAZOUNI, K. & SCHWEISGUTH, F. 2017. Self-organized Notch dynamics generate stereotyped sensory organ patterns in Drosophila. Science, 356.

      HANNAFORD, M. R., RAMAT, A., LOYER, N. & JANUSCHKE, J. 2018. aPKC-mediated displacement and actomyosin-mediated retention polarize Miranda in Drosophila neuroblasts. eLife, 7, 166.

      SKOUFIAS, D. A., INDORATO, R. L., LACROIX, F., PANOPOULOS, A. & MARGOLIS, R. L. 2007. Mitosis persists in the absence of Cdk1 activity when proteolysis or protein phosphatase activity is suppressed. J Cell Biol, 179, 671-85.